[OpenAFS] bos killed fileserver before it was shut down cleanly.

Russ Allbery rra@stanford.edu
Sat, 09 Oct 2010 10:35:19 -0700


Anders Magnusson <ragge@ltu.se> writes:

> I noticed an annoying thing yesterday: if fileserver takes more than
> 30*60 seconds to shut down, it is killed by bos, even though it is still
> offlining volumes.  (More annoying: fileserver fails to handle SIGKILL
> correctly and segfaults as a side effect.)

> This is for 1.4.12.1; I haven't looked at 1.5, but I do not think it
> ever should force fileserver to die while it's doing its work.  No idea
> how to implement this though without a major rewrite.

The problem is that it's also not uncommon for the fileserver to stall
completely or nearly completely when shutting down, so unless bos kills
it, your fileserver is going to be down for hours and hours.  That's the
reason for the eventual kill.  At some point, it becomes faster to
salvage than to wait for the fileserver.

I could certainly see making the timeout an option, though, so you can
choose not to ever kill your fileserver if you want to manage that
manually.
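
To illustrate what an optional timeout could look like, here is a rough
sketch in Python.  It is not bos code and does not reflect how bos is
actually implemented; the function name shutdown_with_timeout, the
poll_interval parameter, and the choice of SIGTERM as the "please shut
down" signal are all assumptions made for the example.  The point is
only that a timeout of None means "never force-kill".

```python
import signal
import subprocess
import sys
import time


def shutdown_with_timeout(proc, timeout=None, poll_interval=0.05):
    """Ask proc to stop; force-kill it only if timeout seconds elapse.

    Hypothetical helper for illustration, not part of bos or OpenAFS.
    timeout=None means wait indefinitely and never force-kill.
    Returns "clean" if the process exited on its own, "killed" otherwise.
    """
    # Polite request first (SIGTERM is an assumption for this sketch).
    proc.send_signal(signal.SIGTERM)
    deadline = None if timeout is None else time.monotonic() + timeout
    while proc.poll() is None:
        if deadline is not None and time.monotonic() >= deadline:
            # The timeout expired while the process was still running.
            proc.kill()
            proc.wait()
            return "killed"
        time.sleep(poll_interval)
    return "clean"


if __name__ == "__main__":
    # A child that exits as soon as it receives SIGTERM.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    print(shutdown_with_timeout(child, timeout=5))
```

With a finite timeout this mirrors the current trade-off: a stalled
process gets killed eventually.  With timeout=None the administrator
takes on the job of deciding when to give up and kill it manually.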

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>