[OpenAFS] Re: bos killed fileserver before it was shut down cleanly.

Tue, 12 Oct 2010 13:17:12 -0400

On Oct 10, 2010, at 3:36 PM, Russ Allbery wrote:

> Adam Megacz <adam@megacz.com> writes:
>> Russ Allbery <rra@stanford.edu> writes:
>
>>> The problem is that it's also not uncommon for the fileserver to
>>> completely or nearly completely stall when shutting down,
>
>> Just curious, is this "stall" a bug in the fileserver, or something
>> which happens for a good reason?  If so, what is the reason?
>
> It happens, in my experience, when there are hundreds of thousands of open
> callbacks, often to hosts behind NAT that are now unreachable and produce
> UDP timeouts.  The fileserver tries to break all those callbacks, which if
> left to run to completion can take many hours.

What Russ said. At umich we have lots of volumes on each vice partition,
and breaking all the callbacks was a slow process. It was very, very
visible in the fileserver logs. Since we had a high degree of trust in
the fileservers being OK, we increased the timeout before bos gives up
and does a hard kill. This let us get clean fileserver shutdowns and
drastically improved the speed on startup by removing the need to
salvage.

Initially we increased the timer to 120 minutes, as we were seeing
shutdown times as long as 75 minutes. Since then we've installed a new
generation of hardware and moved from afs 1.4.<small> to 1.4.12. Between
those two changes, the time required for a fileserver to break the
connections has dropped quite a bit. I'll have some hard numbers for our
most recent cell restart in a bit.

Conversely, when a fs is well and truly hung, waiting 30*60 seconds for
bos to 'get it' is waaay too long. If there were some simple way bos
could interrogate the fs and see progress (or non-progress), a much
shorter timer could be used. But I've never had my head in that code and
can't speak to the difficulty.
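
A minimal sketch of the "shorter timer if we can see progress" idea,
in Python rather than anything resembling the bos code. Everything
here is hypothetical: the function name, the 300-second stall limit,
and the notion of a progress counter (say, callbacks broken so far)
are assumptions for illustration, not anything bos or the fileserver
provides today.

```python
def shutdown_verdict(samples, stall_limit=300, hard_limit=30 * 60):
    """Decide whether to keep waiting on a shutting-down fileserver.

    samples: (seconds_since_shutdown_began, progress_counter) pairs in
    time order. The counter is a hypothetical value the fileserver
    would report, such as the number of callbacks broken so far.
    """
    last_value = None   # most recent counter value seen
    last_change = 0     # time at which the counter last changed
    t = 0
    for t, value in samples:
        if t >= hard_limit:
            # Absolute ceiling, like the fixed timer in use today.
            return ("kill: hard limit", t)
        if value != last_value:
            # The counter moved, so the shutdown is still making progress.
            last_value, last_change = value, t
        elif t - last_change >= stall_limit:
            # No movement for stall_limit seconds: treat it as hung.
            return ("kill: no progress", t)
    return ("keep waiting", t)
```

With something like this, a slow but healthy shutdown could run up to
the hard limit, while a hung one would be caught after stall_limit
seconds instead of the full 30*60.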

As an interim step, I'd love to see an extension to the type of data one
could get via, say, rxdebug and friends. It would be really nice to
interrogate a file server remotely and ask the fs processes for their
state. Useful responses would be (handwave, handwave) "timestamp,
running, N fileops", "timestamp, shutting down, N of M volumes
disconnected", and so forth. Then we humans could watch the shutdown
process without having to log onto each file server and tail the
logfile. If a fs is responding to the queries yet hung, we could see it
and respond immediately.
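
To make the handwave slightly more concrete, here is a sketch of how a
monitoring script might parse such responses. The line format is taken
literally from the two examples above and is purely hypothetical; no
such query exists in rxdebug today, and the names FsStatus and
parse_status are made up for this example. The timestamp is assumed to
be integer seconds.

```python
import re
from typing import NamedTuple, Optional

class FsStatus(NamedTuple):
    timestamp: int          # assumed to be integer seconds
    state: str              # "running" or "shutting down"
    count: int              # fileops, or volumes disconnected so far
    total: Optional[int]    # total volumes, only known while shutting down

# Patterns for the two hypothetical response lines.
_RUNNING = re.compile(r"^(\d+), running, (\d+) fileops$")
_SHUTTING_DOWN = re.compile(
    r"^(\d+), shutting down, (\d+) of (\d+) volumes disconnected$"
)

def parse_status(line: str) -> FsStatus:
    """Parse one hypothetical status line into an FsStatus."""
    line = line.strip()
    m = _RUNNING.match(line)
    if m:
        return FsStatus(int(m.group(1)), "running", int(m.group(2)), None)
    m = _SHUTTING_DOWN.match(line)
    if m:
        return FsStatus(int(m.group(1)), "shutting down",
                        int(m.group(2)), int(m.group(3)))
    raise ValueError(f"unrecognized status line: {line!r}")
```

Two successive "shutting down" samples with the same count would then
be the signal that a fs is answering queries but not making progress.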

In the longer term, bos could query the fs for that data and make the
same decision.

There's a lot more that could be said on this topic, but that's probably
meat for afs-dev rather than afs-info.

Steve
