[OpenAFS-devel] decreasing timeout delays for afs recovery

Neulinger, Nathan nneul@umr.edu
Tue, 29 May 2001 15:32:19 -0500


Has anyone tried significantly decreasing the timeouts for failed AFS
accesses? As things stand, on a system such as a web server serving data
out of AFS, by the time the box recovers from an AFS server going away it's
too late, because hundreds of hung requests have already brought the box
down. (This is especially bad with CGI, when you've got a few hundred perl
processes hung.)

Basically, I'd like to see something that would decrease the delay between
a server going down, or becoming inaccessible, and the client giving up on
the request, so that it can carry on with requests to servers that haven't
failed.

This also impacts the use of replicas. Right now, our replicas are almost
useless, because our servers are toast by the time the AFS clients on them
have timed out and decided they can't talk to the fileserver they were
trying to reach.

In the case of volumes with replicas, I'd almost like to see the clients
time out after 10-15 seconds. Maybe a bit longer, but definitely not the few
minutes that it seems to take at times currently.
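
To make the behavior I'm after concrete, here is a rough sketch in Python.
It is only an illustration, not how the AFS cache manager is implemented:
read_with_failover, the fetch callable, and the server names are all
hypothetical, and the 15-second default is just the figure suggested above.

```python
from typing import Callable, Sequence


def read_with_failover(
    replicas: Sequence[str],
    fetch: Callable[[str, float], bytes],
    per_server_timeout: float = 15.0,
) -> bytes:
    """Try each replica in order, giving up on each after a short timeout.

    `fetch(server, timeout)` is a hypothetical callable that returns the data
    or raises OSError (TimeoutError is a subclass) if the server is down or
    does not answer within `timeout` seconds.
    """
    errors = []
    for server in replicas:
        try:
            return fetch(server, per_server_timeout)
        except OSError as exc:
            # This replica failed or timed out; remember why and move on.
            errors.append((server, exc))
    raise OSError(f"all replicas failed: {errors}")
```

With a short per-server timeout, the worst-case wait is roughly the timeout
times the number of replicas tried, rather than several minutes stuck on the
first dead fileserver.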

Any thoughts?

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       EMail:  nneul@umr.edu
University of Missouri - Rolla         Phone: (573) 341-4841
Computing Services                       Fax: (573) 341-4216