[OpenAFS-devel] scary site-wide afs outage...

Neulinger, Nathan nneul@umr.edu
Fri, 26 Oct 2001 08:11:33 -0500


We just had a very bizarre outage last night with a symptom we have never
seen before... ALL of our file servers on a particular subnet quit
responding, even to themselves, and all at the same time (about 12:34pm
central). volserver and bos were both responding fine, and giving back
useful info, but the file servers were not answering.
xstat_fs_test and afsmonitor got no answer from those file servers
either.

We wound up doing bos restarts on everything, which caused them to start
working again. File servers are mostly Red Hat 6.2 on kernel 2.2.19, running
an OpenAFS snapshot from back in April.

We've never seen anything like this. As near as we can tell, nothing else
odd happened at that time; out of the blue, all of the file servers just up
and stopped talking.

Unfortunately, I don't really have any useful debug info, just figured I'd
pass this along in case anyone else might have seen anything like it. I'm
wondering if it may have been a DoS attack of some sort. The interesting
thing is that two servers in our test cell on that same subnet continued to
respond, but no one ever accesses them.

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       EMail:  nneul@umr.edu
University of Missouri - Rolla         Phone: (573) 341-4841
Computing Services                       Fax: (573) 341-4216