[OpenAFS] FS exited on signal 6

Jeffrey Hutzelman jhutz@cmu.edu
Mon, 11 Oct 2004 10:42:55 -0400

On Sunday, October 10, 2004 23:11:51 -0400 Derrick J Brashear 
<shadow@dementia.org> wrote:

> On Mon, 11 Oct 2004, Matthew Cocker wrote:
>> And why is a signal 6 entry in boslog followed by automatic salvage and
>> restart of the FS

Signal 6 is SIGABRT.  A process that receives this signal dies, unless it 
has made special provisions to do otherwise (which would be pointless in 
this case, since the reason for the SIGABRT is that some internal 
inconsistency was detected).  Under normal circumstances, such a process 
will also drop a core file; however, a resource limit may be in effect 
which prevents that.
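
As a rough illustration (a sketch only, not anything taken from the
fileserver or bosserver sources), the following Python snippet starts a
child process that calls abort() and then reports how it ended.  The helper
name run_and_describe is made up for this example.  The child sets its core
size limit to 0 first, which is exactly the kind of resource limit that
suppresses the core file.

```python
import resource
import signal
import subprocess
import sys


def run_and_describe(argv):
    """Run argv and describe how it ended.

    Hypothetical helper for illustration. It relies on the subprocess
    convention that a return code of -N means "killed by signal N".
    """
    rc = subprocess.run(argv).returncode
    if rc < 0:
        return "killed by " + signal.Signals(-rc).name
    return "exited with status %d" % rc


# A child that aborts itself, as a program does when an internal
# consistency check fails. RLIMIT_CORE is lowered to 0 in the child
# first, so this demonstration leaves no core file behind.
child = [
    sys.executable,
    "-c",
    "import os, resource;"
    "resource.setrlimit(resource.RLIMIT_CORE, (0, 0));"
    "os.abort()",
]

if __name__ == "__main__":
    print("SIGABRT is signal number", int(signal.SIGABRT))
    print("child was", run_and_describe(child))
    # The limit in effect for this process; a soft limit of 0 means
    # no core file would be written.
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    print("core size limit (soft, hard):", soft, hard)
```

On a typical Unix system the child is reported as killed by SIGABRT.  The
limit that matters for the fileserver is whatever it inherits from the
environment in which the bosserver was started.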

The automatic salvage happens because the fileserver may leave its volumes 
in an inconsistent state when it terminates abnormally, so an abnormal 
termination is always followed by an automatic salvage.  It is equivalent 
to automatically running fsck on a filesystem that was not unmounted 
cleanly.

The automatic restart happens because that's what the bosserver is for: it 
monitors the server processes it manages and restarts any that exit 
unexpectedly.

>> , nearly always followed within 24 hours with another
>> total  lock up where as manually started salvages seem to keep things
>> happier for  longer?
> luck?

Yes, this is likely entirely luck, or else the correlation is not as strong 
as is suggested.  It is pretty likely that running the salvager is not 
relevant, and it's actually the fileserver restart that has an effect 
(remember, a whole-partition or whole-server salvage requires shutting down 
the fileserver for the duration).

-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
   Sr. Research Systems Programmer
   School of Computer Science - Research Computing Facility
   Carnegie Mellon University - Pittsburgh, PA