[OpenAFS-devel] Re: bos killed fileserver before it was shut down cleanly.

Steve Simmons scs@umich.edu
Wed, 13 Oct 2010 16:34:17 -0400


On Oct 13, 2010, at 11:52 AM, Andrew Deason wrote:

> On Wed, 13 Oct 2010 11:47:43 -0400
> Steve Simmons <scs@umich.edu> wrote:
>
>> On Oct 12, 2010, at 1:36 PM, Andrew Deason wrote:
>>
>>> Is fssync-debug close enough for you? rxdebug isn't high-level
>>> enough to know about volumes, and the problem with the things that
>>> _do_ understand volumes typically want VOL_LOCK grabbed to
>>> introspect vol package status. So if we are hanging on shutdown
>>> because something is grabbing VOL_LOCK and won't let go, the debug
>>> command will hang.
>>
>> Actually I'm completely neutral about the mechanism.
>
> I was asking that because fssync-debug is not directly accessible over
> the net. Does that matter to you?

Ouch! Sorry, didn't think it thru. No, I don't think it matters. Worst
case I can write a remctl-based invocation to get it from the servers.
And that's not a very bad worst case.

And for that matter, we could do a remctl-ish thing today to tail the
fileserver logs. But having timestamps and volume counts is nice and
gives a semi-good approximation of progress speed and completion time.
That kind of stuff is hard to grok from just looking at the fs log.
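
To make the "timestamps plus volume counts" idea concrete, here is a
rough sketch of the arithmetic. Everything in it is an assumption for
illustration: the sample format (a timestamp paired with a count of
volumes still attached) is hypothetical and is not the output of
fssync-debug or of any existing tool, and the numbers are made up.

```python
from datetime import datetime


def estimate_completion(samples):
    """Estimate shutdown progress from (timestamp, volumes_remaining) samples.

    `samples` is a hypothetical, time-ordered list of tuples; this is not
    a format produced by any existing OpenAFS tool.
    Returns (volumes_per_second, seconds_remaining), or None if the
    samples show no measurable progress.
    """
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    elapsed = (t1 - t0).total_seconds()
    done = n0 - n1
    if elapsed <= 0 or done <= 0:
        return None  # no elapsed time or no volumes processed: cannot estimate
    rate = done / elapsed
    return rate, n1 / rate


# Made-up sample data: 1200 volumes at the start, 600 left a minute later.
samples = [
    (datetime(2010, 10, 13, 16, 0, 0), 1200),
    (datetime(2010, 10, 13, 16, 0, 30), 900),
    (datetime(2010, 10, 13, 16, 1, 0), 600),
]
```

With these samples the average rate is 10 volumes per second, so the
remaining 600 volumes would take roughly another minute. It is only a
linear extrapolation, hence "semi-good".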

IMHO the more important piece is something that ultimately allows bos to
make smarter decisions about fs processes that hang during shutdown.
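
As one possible reading of "smarter decisions", a supervisor could kill
only when the volume count has stopped dropping, instead of after a
fixed wait. The sketch below is purely illustrative: it is not how bos
works today, the function name, the sample format, and both limits are
invented for the example, and it assumes progress can be observed as a
count of volumes remaining.

```python
def should_force_kill(samples, now, stall_limit=300.0, hard_limit=3600.0):
    """Decide whether a shutting-down fileserver looks hung.

    `samples` is a hypothetical, time-ordered list of
    (seconds_since_shutdown_began, volumes_remaining) tuples, and `now`
    is the current time on the same scale. Both limits are arbitrary
    illustrative defaults, not values taken from bos.
    """
    if now >= hard_limit:
        return True  # absolute ceiling, regardless of progress

    # Find the most recent moment at which the volume count decreased.
    last_progress = samples[0][0] if samples else 0.0
    for (_, prev), (t, cur) in zip(samples, samples[1:]):
        if cur < prev:
            last_progress = t

    # Kill only if nothing has moved for at least `stall_limit` seconds.
    return now - last_progress >= stall_limit
```

For example, with samples [(0, 100), (60, 80), (120, 80)] the last
progress was at 60 seconds, so a check at 200 seconds would keep
waiting and a check at 400 seconds would give up.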