[OpenAFS] Bos server failure

Jeffrey Hutzelman jhutz@cmu.edu
Tue, 16 Nov 2004 00:36:15 -0500


On Monday, November 08, 2004 09:49:32 -0500 Steve Devine <sdevine@msu.edu> 
wrote:

> All,
> Odd problem here.
> I have a file server where the bos server has stopped. It still seems to
> be serving files ok. vlserver is not running either. The last time this
> happened I restarted bosserver and this started a chain reaction of
> Salvaging that never actually stopped. The best solution is a reboot but
> if the server will hold off till tonight I would like to wait until then
> .. classes are in full swing right now.
>
> So here is my question .. can I remove SALVAGE.fs and start bosserver
> thereby avoiding the salvage routine? Or can I kill the fileserver
> processes one by one and then restart bosserver?
>
> Or is this an invitation to more headaches?

Don't do that.  If the bosserver is dead and you try to start a new one, 
then it will not know about the other servers that are still running, and 
will attempt to start new ones.  This is why you were getting the "endless
salvage": each new fileserver would start up, find that its port was already
in use by the old one, and exit, and the bosserver would then run the
salvager and try again.

The safest course of action is to manually kill off any of the remaining 
servers that normally run under the bosserver's control, then restart the 
bosserver (possibly after a reboot).  As long as the other services are 
still working, it is safe to wait arbitrarily long before doing this.

Note that while most of the servers will shut down cleanly if sent SIGTERM, 
the fileserver is special and needs to be sent SIGQUIT to make it shut down.
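
As a rough illustration of the signal rule above, here is a minimal Python
sketch.  It only encodes "SIGQUIT for the fileserver, SIGTERM for everything
else"; the function names, the PIDs, and the way leftover processes are
listed are all hypothetical, and you would still need to find the real PIDs
yourself (for example with ps) before sending anything.

```python
import os
import signal


def shutdown_signal(process_name: str) -> signal.Signals:
    """Pick the signal for one leftover server process.

    Assumption: the process shows up under the name "fileserver".
    Every other server gets SIGTERM, per the note above.
    """
    if process_name == "fileserver":
        return signal.SIGQUIT
    return signal.SIGTERM


def plan_shutdown(processes):
    """Turn (pid, name) pairs into (pid, signal) pairs; sends nothing."""
    return [(pid, shutdown_signal(name)) for pid, name in processes]


def send_signals(plan):
    """Actually deliver the signals. Call this only after checking the plan."""
    for pid, sig in plan:
        os.kill(pid, sig)


if __name__ == "__main__":
    # Hypothetical PIDs and process names, for illustration only.
    leftover = [(1234, "volserver"), (1250, "fileserver")]
    for pid, sig in plan_shutdown(leftover):
        print(pid, sig.name)
```

Keeping the planning step separate from send_signals() means the list of
(pid, signal) pairs can be reviewed before anything is killed.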

-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
   Sr. Research Systems Programmer
   School of Computer Science - Research Computing Facility
   Carnegie Mellon University - Pittsburgh, PA