[OpenAFS-devel] Re: bos killed fileserver before it was shut down cleanly.
Steve Simmons
scs@umich.edu
Wed, 13 Oct 2010 11:47:43 -0400
On Oct 12, 2010, at 1:36 PM, Andrew Deason wrote:
> Moving to -devel from -info, as mentioned.
>
> Context: fileservers with lots of volumes / lots of hard-to-reach
> clients take a long time to shut down, and bosserver SIGKILLs them
> after 30 minutes. This is annoying.
>
> On Tue, 12 Oct 2010 13:17:12 -0400
> Steve Simmons <scs@umich.edu> wrote:
>
>> As an interim step, I'd love to see an extension to the type of data
>> one could get via, say, rxdebug and friends. It would be really nice
>> to interrogate a file server remotely and ask the fs processes for
>> their state. Useful responses would be (handwave, handwave)
>> "timestamp, running, N fileops", "timestamp, shutting down, N of M
>> volumes disconnected", and so forth. Then us humans could watch the
>> shutdown process without having to log onto each file server and tail
>> the logfile. If a fs is responding to the queries yet hung, we can see
>> it and respond immediately.
>
> Is fssync-debug close enough for you? rxdebug isn't high-level enough
> to know about volumes, and the problem is that the things that _do_
> understand volumes typically want VOL_LOCK grabbed to introspect vol
> package
> status. So if we are hanging on shutdown because something is grabbing
> VOL_LOCK and won't let go, the debug command will hang.
Actually I'm completely neutral about the mechanism. You mention
VOL_LOCK issues; it's a good example of *why* I'm neutral. There are
good architectural reasons not to do x, y or z in AFS, and I don't see
the utility gained from remote monitoring of a shutdown in progress
being worth heavy mods to the internals of fs, etc. So let's take the
easiest course.