[OpenAFS] Qu re tuning timeouts for failover between RO replicas

Thomas M. Payerle payerle@umd.edu
Thu, 2 Dec 2010 13:56:07 -0500 (EST)


I am looking for a way to tune the timeout before failing over to another
AFS server for replicated volumes, but cannot seem to find any suitable
runtime parameters to tweak.  Do any such parameters exist?

We have some replicated web servers serving data from replicated RO volumes.
If one of the servers hosting one of those volumes goes down, httpds which
were pointing to that server's copy of the volume seem to get badly wedged.
I think this is because enough requests come in during the time it takes
for the AFS client on the web host to realize the AFS server is down and
move on to a replica that all available Apache threads are used up, and
Apache just gets very unhappy.
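
As a rough back-of-the-envelope sketch of that reasoning (all numbers are
hypothetical, not measurements from our servers, and the function names are
made up for illustration): if requests touching the dead server's volume
stall until the client fails over, the worker pool fills once the stalled
arrival rate times the elapsed time reaches the worker limit.

```python
import math

def seconds_until_pool_exhausted(max_workers, stalled_requests_per_sec):
    """Seconds until stalled requests occupy every worker.

    Assumes each stalled request holds one worker and none complete
    until the AFS client fails over (a simplifying assumption).
    """
    if stalled_requests_per_sec <= 0:
        return math.inf
    return max_workers / stalled_requests_per_sec

def pool_wedges(max_workers, stalled_requests_per_sec, failover_timeout_sec):
    """True if the pool fills before the client gives up on the dead server."""
    fill_time = seconds_until_pool_exhausted(max_workers, stalled_requests_per_sec)
    return fill_time <= failover_timeout_sec

# Hypothetical figures: 256 workers, 10 stalled requests per second.
print(pool_wedges(256, 10, 60))  # pool fills in 25.6 s, before a 60 s timeout: True
print(pool_wedges(256, 10, 15))  # a 15 s timeout expires first: False
```

Under these assumed figures, the failover timeout would need to drop below
roughly 25 seconds to keep the pool from filling.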

I would like to reduce that timeout if possible; all traffic should be going
over fairly wide pipes inside or between our local data centers, so I expect
we can safely lower the timeouts.

Is this doable without recompiling AFS?  We are running 1.4.11.

Tom Payerle
OIT-TSS-DCS				payerle@umd.edu
University of Maryland			(301) 405-6135
College Park, MD 20742-4111