[OpenAFS-devel] Re: bos killed fileserver before it was shut down cleanly.

Andrew Deason adeason@sinenomine.net
Tue, 12 Oct 2010 12:36:59 -0500


Moving to -devel from -info, as mentioned.

Context: fileservers with lots of volumes / lots of hard-to-reach
clients take a long time to shut down, and bosserver SIGKILLs them after
30 minutes. This is annoying.

On Tue, 12 Oct 2010 13:17:12 -0400
Steve Simmons <scs@umich.edu> wrote:

> As an interim step, I'd love to see an extension to the type of data
> one could get via, say, rxdebug and friends. It would be really nice
> to interrogate a file server remotely and ask the fs processes for
> their state. Useful responses would be (handwave, handwave)
> "timestamp, running, N fileops", "timestamp, shutting down, N of M
> volumes disconnected", and so forth. Then us humans could watch the
> shutdown process without having to log onto each file server and tail
> the logfile. If a fs is responding to the queries yet hung, we can see
> it and respond immediately.

Is fssync-debug close enough for you? rxdebug isn't high-level enough to
know about volumes, and the problem with the things that _do_ understand
volumes is that they typically want VOL_LOCK grabbed to introspect vol
package status. So if we are hanging on shutdown because something is
grabbing VOL_LOCK and won't let go, the debug command will hang too.
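
To illustrate the hang, here is a toy model in Python (for brevity; this
is not OpenAFS code). The class VolumeState, its lock, and both status
methods are hypothetical names made up for this sketch. The lock stands in
for VOL_LOCK. One status handler insists on taking it, the other reads a
snapshot that writers publish as a whole, so it can answer even while the
lock is stuck.

```python
import threading


class VolumeState:
    """Toy stand-in for the volume package; all names are hypothetical."""

    def __init__(self, total):
        self.lock = threading.Lock()  # plays the role of VOL_LOCK
        self.total = total
        self.detached = 0
        # Immutable tuple, replaced wholesale on every update.
        self._snapshot = ("running", 0, total)

    def detach_one(self):
        # Writers still take the big lock to change the real state.
        with self.lock:
            self.detached += 1
            self._snapshot = ("shutting down", self.detached, self.total)

    def status_locked(self, timeout):
        # Handler that needs the lock; returns None if it cannot get it
        # within `timeout` seconds instead of blocking forever.
        if not self.lock.acquire(timeout=timeout):
            return None
        try:
            state = "shutting down" if self.detached else "running"
            return (state, self.detached, self.total)
        finally:
            self.lock.release()

    def status_lockfree(self):
        # Reads the last published snapshot without touching the lock.
        return self._snapshot


vols = VolumeState(total=3)
vols.detach_one()
vols.lock.acquire()                      # simulate a holder that won't let go
print(vols.status_locked(timeout=0.1))   # None
print(vols.status_lockfree())            # ('shutting down', 1, 3)
vols.lock.release()
```

Without the timeout, status_locked would block for as long as the lock is
held, which is the failure mode described above. The snapshot read relies
on a single reference assignment being atomic in CPython; a C version
would need its own small lock or atomic for that.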

Of course, that can be fixed. It's just annoying that FSSYNC handlers
are structured to always grab VOL_LOCK before doing anything, and FSSYNC
is where I'd want to put this.

And adding support for this in bosserver's timeout decisions means
adding an FSSYNC client to bosserver. I don't find that ideal, but,
well, we need some kind of communication between them for something like
this to work. Or, as I've mentioned before, if the timeout code is just
added to the fileserver itself, this isn't a problem.
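
As a rough sketch of what a progress-aware timeout decision could look
like, wherever it ends up living: the function below is hypothetical, as
are the parameter names stall_limit and hard_limit. It assumes something
has already collected (timestamp, volumes detached) samples, and it says
nothing about how they are obtained.

```python
def should_kill(samples, now, stall_limit, hard_limit):
    """Decide whether to give up on a shutdown.

    samples: list of (timestamp, volumes_detached) pairs in time order,
             the first one taken when shutdown began.
    Returns True if the hard limit has passed, or if no volume has been
    detached for at least stall_limit seconds.
    """
    start = samples[0][0]
    if now - start >= hard_limit:
        return True
    # Time of the most recent sample where the count went up.
    last_progress = start
    for (_, prev), (t, cur) in zip(samples, samples[1:]):
        if cur > prev:
            last_progress = t
    return now - last_progress >= stall_limit


# Still making progress at the 30-minute mark: keep waiting.
moving = [(0, 0), (600, 40), (1200, 80)]
print(should_kill(moving, now=1800, stall_limit=900, hard_limit=7200))   # False

# No progress since the 10-minute mark: give up.
stuck = [(0, 0), (600, 40), (1200, 40)]
print(should_kill(stuck, now=1800, stall_limit=900, hard_limit=7200))    # True
```

The point of the two limits is that a slow but advancing shutdown is left
alone, while a stalled one is still bounded.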

But with that said, I may have something soonish to deal with the
problem of waiting for clients. If that problem is at least somewhat
dealt with, perhaps the rest of this is unnecessary?

-- 
Andrew Deason
adeason@sinenomine.net