[OpenAFS-devel] Re: bos killed fileserver before it was shut down cleanly.

Jeffrey Hutzelman jhutz@cmu.edu
Tue, 12 Oct 2010 13:58:43 -0400


--On Tuesday, October 12, 2010 12:36:59 PM -0500 Andrew Deason 
<adeason@sinenomine.net> wrote:

> rxdebug isn't high-level enough to
> know about volumes

Nor should it be.  rxdebug's job is to debug rx, period.

> Of course, that can be fixed. It's just annoying that FSSYNC handlers
> are structured to always grab VOL_LOCK before doing anything, and FSSYNC
> is where I'd want to put this.

My first thought was the same -- that this sort of thing belongs in FSSYNC. 
And certainly, that would be a reasonable way to provide a progress 
indicator to the bosserver.

However, for the "human admins want to see what's going on" problem, 
perhaps an RPC interface is better.  It should be a separate Rx service 
(though probably on the same port), and have at least one dedicated thread. 
And for introspection, it may want to ignore locks completely and accept 
the risk of giving out bogus data rather than risk a deadlock.
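
To make the tradeoff concrete, here is a minimal sketch in Python (for 
brevity; this is not fileserver code). The class, its fields, and the 
peek() method are all hypothetical, and the plain lock only stands in for 
something like VOL_LOCK. The point is that the introspection path never 
touches the lock, so a hung lock holder cannot hang it too.

```python
import threading

class ShutdownState:
    """Hypothetical stand-in for state normally guarded by a global lock."""

    def __init__(self):
        self.lock = threading.Lock()   # plays the role of VOL_LOCK
        self.volumes_total = 0
        self.volumes_detached = 0

    def detach_one(self):
        # Normal code path: mutate state only while holding the lock.
        with self.lock:
            self.volumes_detached += 1

    def peek(self):
        # Introspection path: deliberately takes no lock, so it cannot
        # block behind a stuck holder. The two fields may be mutually
        # inconsistent if a writer is active at the same time.
        return {"total": self.volumes_total,
                "detached": self.volumes_detached}
```

A caller of peek() has to treat the numbers as a hint for a human, not as 
something to make decisions on.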

> And adding support for this in bosserver's timeout decisions means
> adding an FSSYNC client to bosserver. I don't find that ideal, but,
> well, we need some kind of communication between them for something like
> this to work.

You could have the fileserver send periodic signals to its parent while 
shutting down.  Or, provide for an environment variable containing the 
number of a file descriptor over which periodic heartbeats should be sent.
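
A rough sketch of the descriptor variant, in Python for brevity. The 
variable name SHUTDOWN_HEARTBEAT_FD and both function names are made up 
for illustration; nothing in OpenAFS defines them. The child writes a byte 
whenever it makes progress, and the parent waits for one with a timeout.

```python
import os
import select

# Hypothetical variable name, chosen only for this example.
HEARTBEAT_ENV = "SHUTDOWN_HEARTBEAT_FD"

def send_heartbeat(environ=os.environ):
    """Child side: write one byte to the inherited descriptor, if any."""
    raw = environ.get(HEARTBEAT_ENV)
    if raw is None:
        return False               # parent did not ask for heartbeats
    os.write(int(raw), b".")
    return True

def heartbeat_seen(read_fd, timeout):
    """Parent side: True if a heartbeat arrives within `timeout` seconds."""
    ready, _, _ = select.select([read_fd], [], [], timeout)
    if not ready:
        return False               # silence: the child may be hung
    # An empty read means the child closed its end of the pipe.
    return os.read(read_fd, 64) != b""
```

The parent would create the pipe, put the write end's number in the 
environment before starting the child, and restart its kill timer each 
time heartbeat_seen() returns True.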


> Or, as I've mentioned before, if the timeout code is just
> added to the fileserver itself, this isn't a problem.

No; the idea is to KILL KILL KILL the fileserver (or any other server) if 
it doesn't shut down in a reasonable time.  That has to be done outside; a 
process that is hung isn't going to kill itself.
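
For illustration only, the outside-the-process pattern looks roughly like 
this in Python. The function name and the grace period are assumptions, 
and SIGTERM is just a placeholder for whatever shutdown signal the 
supervisor really sends; this is not how bosserver is written.

```python
import subprocess

def stop_or_kill(proc, grace_seconds):
    """Ask `proc` to shut down; force-kill it if it overruns the grace period.

    `proc` is a subprocess.Popen object owned by the supervising process.
    """
    proc.terminate()                   # polite request (SIGTERM on POSIX)
    try:
        proc.wait(timeout=grace_seconds)
        return "clean"
    except subprocess.TimeoutExpired:
        proc.kill()                    # cannot be caught or ignored (SIGKILL)
        proc.wait()                    # reap the child
        return "killed"
```

The timer and the kill both live in the parent, so they still work when 
the child is wedged and can no longer run any of its own code.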

-- Jeff