[OpenAFS] Bos server failure
Steve Devine
sdevine@msu.edu
Tue, 16 Nov 2004 09:46:26 -0500
Jeffrey Hutzelman wrote:
>
>
> On Monday, November 08, 2004 09:49:32 -0500 Steve Devine
> <sdevine@msu.edu> wrote:
>
>> All,
>> Odd problem here.
>> I have a file server where the bos server has stopped. It still seems to
>> be serving files ok. vlserver is not running either. The last time this
>> happened I restarted bosserver and this started a chain reaction of
>> Salvaging that never actually stopped. The best solution is a reboot but
>> if the server will hold off till tonight I would like to wait until
>> then; classes are in full swing right now.
>>
>> So here is my question: can I remove SALVAGE.fs and start bosserver,
>> thereby avoiding the salvage routine? Or can I kill the fileserver
>> processes one by one and then restart bosserver?
>>
>> Or is this an invitation to more headaches?
>
>
> Don't do that. If the bosserver is dead and you try to start a new
> one, then it will not know about the other servers that are still
> running, and will attempt to start new ones. This is why you were
> getting the "endless salvage" - the fileserver would start up, find
> out that its port is unavailable, and exit.
>
> The safest course of action is to manually kill off any of the
> remaining servers that normally run under the bosserver's control,
> then restart the bosserver (possibly after a reboot). As long as the
> other services are still working, it is safe to wait arbitrarily long
> before doing this.
>
> Note that while most of the servers will shut down cleanly if sent
> SIGTERM, the fileserver is special and needs to be sent SIGQUIT to
> make it shut down.
>
> -- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
> Sr. Research Systems Programmer
> School of Computer Science - Research Computing Facility
> Carnegie Mellon University - Pittsburgh, PA
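
To illustrate the signal advice in the quoted message, here is a minimal
Python sketch of picking the signal for each leftover server process. The
helper names shutdown_signal and stop_process are hypothetical and not part
of OpenAFS; the sketch assumes a POSIX host and that the PID of each
orphaned process has already been looked up by hand.

```python
import os
import signal


def shutdown_signal(process_name: str) -> signal.Signals:
    """Pick the signal for stopping an orphaned AFS server by hand.

    Per the advice above: the fileserver needs SIGQUIT to shut down,
    while most other servers shut down cleanly on SIGTERM.
    """
    if process_name == "fileserver":
        return signal.SIGQUIT
    return signal.SIGTERM


def stop_process(pid: int, process_name: str) -> None:
    """Send the appropriate shutdown signal to one process (hypothetical helper)."""
    os.kill(pid, shutdown_signal(process_name))
```

Only after every such process has exited would the bosserver be restarted,
so that it does not try to start servers whose ports are still in use.
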
Jeffrey,
Thanks for the reply; that is exactly what I did. I waited till the
next morning and rebooted the servers at 5 am. They all had to run the
salvager, but they did return. I also upgraded them to version 1.2.13,
so now we are in 'wait and see' mode. I also set the startup script
to allow core files, so if they do fail we will have a little more to
work with.
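
For anyone wanting to do the same, a rough sketch of what "allow core
files" amounts to: raising the core-size resource limit before the servers
are started. In a shell startup script this is usually done with
"ulimit -c unlimited"; the Python version below is only an illustration,
and the function name allow_core_files is an assumption, not something
taken from the actual startup script.

```python
import resource


def allow_core_files():
    """Raise the soft core-file size limit up to the hard limit.

    Child processes started afterwards inherit the new limit, so a
    crashing server can leave a core file behind.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit as far as the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

If the hard limit itself is 0, this changes nothing and the limit has to be
raised by root before the servers start.
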
/sd
--
Steve Devine
Storage Systems
Academic Computing & Network Services
Michigan State University
301 Computer Center
East Lansing, MI 48824-1042
1-517-355-4500 (x242)
Baseball is ninety percent mental; the other half is physical.
- Yogi Berra