[OpenAFS-devel] Re: bos killed fileserver before it was shut down cleanly.

Steve Simmons scs@umich.edu
Wed, 13 Oct 2010 17:32:30 -0400


On Oct 12, 2010, at 2:32 PM, Andrew Deason wrote:

> On Tue, 12 Oct 2010 13:26:54 -0500
> Andrew Deason <adeason@sinenomine.net> wrote:
>
>> I'm not saying to also remove an external timeout in bosserver,
>> though.  Just that the fileserver itself could have a much
> finer-grained timeout (adjusting for # of volumes, or the last
>> internal heartbeat, etc) with bosserver having a larger unconditional
>> one.
>
> Oh, and something else that was brought up in -info that could make the
> KILL behavior more tolerable: configurable timeouts. I assume that's not
> contentious? Just need to change the BosConfig format to allow for it.

I checked with Dan Hyde; our modification of the timeout period was done
by hacking the code. Since going to our newer hardware and newer versions
of OpenAFS we're no longer getting bitten by the problem, and our latest
install uses the original timeout period.

IMHO the basic parameter for this belongs in BosConfig. But it would also
be very useful to enable an on-the-fly change, i.e., you realize the
timer's going to go off in a few minutes and another 10 minutes would see
you through to a clean shutdown and no salvages. At first glance that'll
become less important with demand attach, but on second glance demand
attach might mean we want to make those times much shorter. Either way,
an on-the-fly change capability would be good. I don't see anything in
the bos manpages that allows for it to re-read the BosConfig file on the
fly, but that kind of feature probably wants more general discussion
anyway.
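
To make the on-the-fly idea concrete, here is a minimal sketch of a
shutdown deadline that can be pushed out while it is already counting
down. It is Python for brevity and is not bosserver code: the class name,
its methods, and the 1800-second figure in the usage note are all
hypothetical.

```python
import time


class ShutdownTimer:
    """Hypothetical stand-in for the wait-before-KILL timer."""

    def __init__(self, wait_seconds, clock=time.monotonic):
        # The clock is injectable so the timer can be exercised without sleeping.
        self._clock = clock
        self._deadline = clock() + wait_seconds

    def extend(self, extra_seconds):
        """Push the deadline out, e.g. in response to an admin request."""
        if extra_seconds < 0:
            raise ValueError("extra_seconds must be non-negative")
        self._deadline += extra_seconds

    def remaining(self):
        """Seconds left before the timer fires (never negative)."""
        return max(0.0, self._deadline - self._clock())

    def expired(self):
        """True once the deadline has been reached."""
        return self._clock() >= self._deadline
```

An admin who sees that a timer started with ShutdownTimer(1800) is about
to fire would call extend(600) to buy another ten minutes; the starting
value itself would come from BosConfig.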

Separately, I could see a bos command to do this, somewhat like 'bos
setrestart' and so forth. Such a command could be very specific to this
timer, e.g.,

    bos fsshutdownwait -server <machine name>
           -time <seconds_to_wait>
           [-cell <cell name>] [-noauth] [-localauth] [-help]

Opening this can of worms might bring in requests for other dynamically
resettable values in bos and elsewhere. A more general solution is

   bos setparam -server <machine name>
          -param <parametername> -value <newvalue>
          [-cell <cell name>] [-localauth] [-help]

As we come across more things that could be reset dynamically, we
wouldn't have to change the man pages for bos, just refer readers to man
BosConfig.

Ultimately things like 'bos setrestart' could be subsumed into it, i.e.,
an equivalent command would be

   bos setparam localhost -param restarttime -value "16 0 0 0 0"

Similar syntax would apply for checkbintime, etc.
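
One way the general form could stay open-ended is a table that maps each
parameter name to a parser for its value, so a new resettable value is one
more table entry rather than a new subcommand. A rough sketch, in Python
for brevity: set_param, the PARAMS table, and the parameter names in it
are assumptions for illustration, not an existing bos interface.

```python
def parse_seconds(text):
    """Parse a positive whole number of seconds."""
    value = int(text)
    if value <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return value


def parse_time_spec(text):
    """Parse a five-field time value such as "16 0 0 0 0"."""
    fields = text.split()
    if len(fields) != 5:
        raise ValueError("expected five integer fields")
    return tuple(int(field) for field in fields)


# Hypothetical parameter table: name -> parser for the -value argument.
PARAMS = {
    "fsshutdownwait": parse_seconds,
    "restarttime": parse_time_spec,
    "checkbintime": parse_time_spec,
}


def set_param(settings, name, raw_value):
    """Validate raw_value for the named parameter and store it in settings."""
    try:
        parser = PARAMS[name]
    except KeyError:
        raise KeyError(f"unknown parameter: {name}") from None
    settings[name] = parser(raw_value)
    return settings[name]
```

Unknown names are rejected before anything is stored, so a typo in -param
cannot silently create a setting that nothing reads.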

And if somebody gears up to touch that stuff, a useful switch pair would
be

   [ -temp | -perm ]

where -perm causes the BosConfig file to be rewritten immediately and
-temp means it is left alone. I'm in favor of anything that puts specific
capabilities back into the hands of the admin.
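
The -temp / -perm distinction might look like the following sketch. It is
Python for brevity; apply_param is a hypothetical helper, and the
one-"name value"-pair-per-line file it writes is a deliberately simplified
stand-in, not the real BosConfig format.

```python
import os
import tempfile


def apply_param(settings, name, value, config_path, permanent=False):
    """Update the running value; with permanent=True also rewrite the file."""
    settings[name] = value
    if not permanent:
        return  # -temp: the on-disk configuration is left alone

    # -perm: write a complete new copy, then swap it into place so a
    # crash mid-write cannot leave a truncated configuration file behind.
    directory = os.path.dirname(os.path.abspath(config_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as out:
            for key in sorted(settings):
                out.write(f"{key} {settings[key]}\n")
        os.replace(tmp_path, config_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

With -temp the change would be lost at the next restart, which is
probably what you want for a one-off extension during a slow shutdown.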

If we wanted to dynamically change parameters for file servers, etc.,
we'd probably want a -type switch as well.