[OpenAFS] Re: bos killed fileserver before it was shut down cleanly.

Sat, 9 Oct 2010 14:00:32 -0500

On Sat, 09 Oct 2010 10:35:19 -0700
Russ Allbery <rra@stanford.edu> wrote:

> Anders Magnusson <ragge@ltu.se> writes:
> 
> > I noticed an annoying thing yesterday; if fileserver takes more than
> > 30*60 seconds to shutdown, it is killed by bos, even though it is still
> > offlining volumes.  (more annoying; fileserver fails to handle SIGKILL
> > correctly and segfaults as a side effect).

Erm, we shouldn't be able to handle KILL at all, let alone incorrectly.
If you have a core that says it died from SEGV, can you give the
backtrace?

> > This is for 1.4.12.1, I haven't looked at 1.5, but I do not think it
> > ever should force fileserver to die while it's doing its work.  No
> > idea how to implement this though without a major rewrite.
> 
> The problem is that it's also not uncommon for the fileserver to
> completely or nearly completely stall when shutting down, so unless
> bos kills it your fileserver is going to be down for hours and hours.
> That's the reason for the eventual kill.  At some point, it becomes
> faster to salvage than to wait for the fileserver.

If we had a way for the fileserver to communicate some kind of keepalive
or estimate on the number of attached volumes, this could be avoided.
But "offlining volumes" and "deadlocked" are indistinguishable from the
bosserver's point of view, and the bosserver has no idea how long it
should take. 1.5 still has the same timeout, but if you use DAFS it's
not as much of a problem.
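
To make the keepalive idea a bit more concrete, here is a rough sketch
of the decision the supervising side could make if the fileserver
reported how many volumes were still attached. Everything in it
(struct shutdown_progress, progress_report, should_kill, the stall
limit) is made up for illustration; nothing like this exists in
bosserver or the fileserver today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical progress record kept by the supervising process. */
struct shutdown_progress {
    int vols_remaining;     /* attached volumes left at the last report */
    time_t last_progress;   /* last time vols_remaining went down */
};

/* Record a report from the fileserver. Only a drop in the number of
 * remaining volumes counts as progress. */
static void
progress_report(struct shutdown_progress *p, int vols_remaining, time_t now)
{
    assert(p != NULL);
    if (vols_remaining < p->vols_remaining) {
        p->vols_remaining = vols_remaining;
        p->last_progress = now;
    }
}

/* Kill only if there has been no progress for more than stall_limit
 * seconds, instead of after a fixed total time. */
static bool
should_kill(const struct shutdown_progress *p, time_t now, double stall_limit)
{
    assert(p != NULL);
    return difftime(now, p->last_progress) > stall_limit;
}
```

With something like this, a fileserver that is slowly offlining volumes
keeps resetting the clock, while one that is deadlocked stops reporting
and gets killed once the stall limit passes.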

As a workaround, if you want to restart the fileserver, you can send it
a SIGQUIT yourself. If you want to stop it, you may be able to replace
the binary with a shell script that sleeps and then sends the SIGQUIT,
or something similar. Not that I'm recommending that, but it's
possible :)

> I could certainly see making the timeout an option, though, so you can
> choose not to ever kill your fileserver if you want to manage that
> manually.

If you want to change it for now, you can just alter FSSDTIME in
src/bozo/bnode.p.h and build.
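
Roughly, the local change would look something like the following. This
is a sketch, not a verbatim copy of the header; I'm assuming FSSDTIME is
a plain macro in seconds, and the 120 is only an example value.

```c
/* src/bozo/bnode.p.h (local edit, illustrative only) */
#define FSSDTIME (120 * 60)   /* seconds; stock value is 30*60 */
```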

For a real option, I would think a new optional field in BosConfig
would do... obviously new 'bos' support for such a thing would take
longer, because of the RPCs. It may make sense to try to move this to
the fileserver itself, though, since the fileserver has more knowledge
to estimate how long a shutdown should take. It can be very simple and
unlikely to fail or get stuck: just create a thread, sleep for N
seconds and assert. We already do this for ShutDownAndCore(PANIC)
calls.
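
A minimal sketch of that watchdog, using plain pthreads. The function
names are hypothetical and this is not the code the fileserver uses for
ShutDownAndCore(PANIC); it only shows the "create a thread, sleep,
abort" shape.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper names; none of these come from the OpenAFS tree. */

/* Thread body: sleep for the full timeout, then abort the process. */
static void *
shutdown_watchdog(void *arg)
{
    unsigned int total = (unsigned int)(uintptr_t)arg;
    unsigned int left = total;

    /* sleep() can return early on a signal; keep going until done. */
    while (left > 0)
        left = sleep(left);

    fprintf(stderr, "shutdown not finished after %u seconds, aborting\n",
            total);
    abort();    /* like a failed assert: raises SIGABRT */
    return NULL;    /* not reached */
}

/* Start the watchdog as a detached thread.
 * Returns 0 on success, or a pthread error code. */
static int
start_shutdown_watchdog(unsigned int secs)
{
    pthread_t tid;
    pthread_attr_t attr;
    int code;

    assert(secs > 0);
    code = pthread_attr_init(&attr);
    if (code != 0)
        return code;
    code = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (code == 0)
        code = pthread_create(&tid, &attr, shutdown_watchdog,
                              (void *)(uintptr_t)secs);
    pthread_attr_destroy(&attr);
    return code;
}
```

The shutdown path would call start_shutdown_watchdog(N) once before it
starts offlining volumes. If the shutdown finishes first, the process
exits normally and the thread goes away with it; if not, the abort
fires after N seconds.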

-- 
Andrew Deason
adeason@sinenomine.net