[OpenAFS] fileserver crashes

Matthew Cocker matt@cs.auckland.ac.nz
Wed, 13 Oct 2004 11:11:10 +1300


I originally started this thread asking if there was anything special 
about signal 6 BosLog entries, which it seems was not what I needed to ask.

So here goes again.

Since the 25th Sept, 10 of our 15 OpenAFS 1.2.11 fileservers have 
crashed or otherwise stopped serving files (some of them 4 or 5 times). 
This comes after running since the beginning of the year with only about 
2 other crashes in total.

So I have started to look at what has changed since then and for any 
patterns in the logs (our Unix admins are back today so we may be able to 
get the core dumps Derrick has asked for).

Changes since Sept 1st

i) We have upgraded to the Windows 1.3.71 client in one of our locations 
(9 of the 11 fileservers there have had problems). In the second 
location they are running an early version of the Windows client and 
only 1 of the 4 fileservers has had problems, but they have a lot fewer 
clients there.

ii) We have had two broadcast storm events on the lab client networks 
(misconfigured switches) which took out large segments of the network. 
The server network was unaffected, but client PCs definitely lost 
network connectivity. The AFS fileserver failures seem to have happened 
a lot more often after these events.

Apart from these changes/events I cannot see anything else that has 
changed since the start of the year.

Any help or suggestions would be welcome, as we have been pushing for AFS 
to be the new central university student filesystem because we have had 
such a good run with it. These crashes have chosen the worst possible 
time to start happening, as we are in the final stages of the decision 
and the decision makers are demanding I explain what is happening. 
Unfortunately, at the moment I have little to offer them.