[OpenAFS] Transarc AFS 3.6 2.5 server release AAARRRGGGHHH!!!!

Nathan Neulinger nneul@umr.edu
Mon, 18 Dec 2000 18:26:27 -0600


Today we did a major AFS server upgrade that has been delayed for far
too long. Unfortunately, thanks to an apparently undocumented new
feature in the 2.5 patch release (I might have missed it, but it sucks
regardless), we wound up having problems for about 8 hours with NT
clients unable to talk to the server.

Apparently the 2.5 code has a new delta in it that causes servers to
stop talking to clients completely when they think that the clients are
flooding them with requests. There is no way to tell that this is
happening, no way to shut it off, and no way to get a list of affected
stations.

In our case, almost 1500 NT stations with the AFS client had extremely
sporadic and unstable access to AFS. What would happen is - the fs
checkservers output would include all of the servers running 2.5, or
some selection of them, i.e. the client considered those servers down.

So - if you're thinking of upgrading to 3.6 2.5, think twice, or at
least be very cautious about it. Shutting down all your clients ahead of
time, and SLOWLY bringing them back up might help, but that's hardly an
option with 1500 stations.

The end result - we wound up backing down to 3.6 2.3 (a huge upgrade
from the 3.4a 5.53 we were running), which fixed the problem.

(To openafs gatekeepers - if you get a delta from transarc/ibm [yeah
right!] that includes this, I _STRONGLY_ suggest that you say 'no
thanks!' or at the very least, make the feature optional.)

BTW - this doesn't seem to affect unix clients. It only affected the NT
clients running 3.5 or 3.6; it didn't seem to matter which, although the
particular behavior differed between 3.5 and 3.6.

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       EMail:  nneul@umr.edu
University of Missouri - Rolla         Phone: (573) 341-4841
CIS - Systems Programming                Fax: (573) 341-4216