[OpenAFS] AFS server hangs after weekly bos restart

Eric Chris Garrison ecgarris@iupui.edu
Wed, 10 Dec 2008 06:16:17 -0500


A couple of months ago, I upgraded our OpenAFS servers to 1.4.7.  Three 
weeks ago, a problem began where the main metadata server (1st of 3) 
stops responding to AFS requests properly and, within a couple of hours, 
all clients become unable to get files, vos commands stop responding, 
etc.  If the machine is rebooted, the problem goes away until the next 
weekly bos restart.  Just restarting openafs-server does not fix the 
problem, however.

Oddly, when I did a manual "bos restart <server> -all" it didn't 
reproduce the problem.   I was thinking that this meant the problem 
wasn't the bos restart at all... but when I changed the day on which the 
bos restart happened, the problem changed days with it.

Sorry for the vagueness, but no one has been online to observe this 
starting; we're just doing forensics on the aftermath.

I'd appreciate any suggestions on why this might be happening and things 
to check.

Thank you,

Eric Chris Garrison             | Principal Mass Storage Specialist
ecgarris@iupui.edu              | Indiana University - Research Storage