[OpenAFS] Re: proper way to bring down a file server?

Andrew Deason adeason@sinenomine.net
Thu, 24 Feb 2011 10:43:26 -0600


On Thu, 24 Feb 2011 08:40:13 +0100
Derrick Brashear <shadow@dementia.org> wrote:

> > Regardless of whatever bugs on the fileserver may be in play,
> > clients should indeed issue a new query on VOFFLINE.
> 
> bugs on the fileserver? how about 'none'?

In this case, yeah, I assume so; but Jeff is correct that there have
been bugs where VOFFLINE was reported when it should not have been. I'm
just saying "even if that were not the case..."

> > A VOFFLINE error can be the result of incorrect/stale volume
> > location information (if a volume is offline on one server but
> > online on another), and so the current information should be looked up
> > when it occurs.
> 
> huge waste of RPCs for a legitimate operating condition albeit an
> undesirable one. you'll create a vldb storm if a 'popular' volume goes
> offline.

"Huge"? Unix clients have been doing it ~forever, and I can probably
count on one hand the number of places I have ever heard of that even
noticed the vlserver load.

> >> In a similar vein, if the file server is inaccessible, the client
> >> does not issue a new VLDB query.
> > 
> > ...this is intentional? Why doesn't it? We could be contacting the
> > wrong server because we have stale location information.
> 
> We could. But that's basically true of any error, and if we run to
> mommy on every error, eventually mommy can't handle us being so pesky
> and melts down

Some errors are still a lot more likely than others to have "stale
location" as the cause. The probability of that for RX_CALL_DEAD is
rather small, I suppose, as it only happens in this "move and shutdown"
scenario, and leaving the server on isn't too hard. Of course, that is
not always under the control of the administrator, but eh.
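
To make the distinction concrete, here is a rough sketch of the kind of
per-error policy being argued about. It is not OpenAFS code: Err,
REQUERY_ON, should_requery_vldb, fetch, try_server and lookup_vldb are
all made-up names, and the enum values are placeholders rather than the
real numeric error codes. It only assumes that a client keeps a cached
server for a volume and can ask the VLDB for a fresh one.

```python
from enum import Enum, auto


class Err(Enum):
    # Symbolic stand-ins only; these are NOT the real numeric error codes.
    VOFFLINE = auto()
    RX_CALL_DEAD = auto()


# Hypothetical policy: errors considered likely enough to mean
# "stale location" that a fresh VLDB lookup is worth one RPC.
REQUERY_ON = {Err.VOFFLINE}


def should_requery_vldb(err, requery_on=REQUERY_ON):
    """Return True if this error should trigger a new location lookup."""
    return err in requery_on


def fetch(volume, cached_server, try_server, lookup_vldb,
          requery_on=REQUERY_ON):
    """Try the cached server; on a 'likely stale' error, look up once.

    try_server(server, volume) returns None on success or an Err.
    lookup_vldb(volume) returns a server name.
    Both are caller-supplied stand-ins for the real RPCs.
    Returns (server_used_or_None, number_of_vldb_lookups).
    """
    err = try_server(cached_server, volume)
    if err is None:
        return cached_server, 0
    if not should_requery_vldb(err, requery_on):
        # Policy says this error is unlikely to mean stale location.
        return None, 0
    fresh = lookup_vldb(volume)
    if fresh == cached_server:
        # Location was not stale; the volume really is unavailable there.
        return None, 1
    if try_server(fresh, volume) is None:
        return fresh, 1
    return None, 1
```

In this sketch the whole disagreement reduces to what goes in
REQUERY_ON: adding RX_CALL_DEAD would cover the "move and shutdown"
case at the cost of one extra lookup per failure, and leaving it out
means the client keeps its old location until something else refreshes
it.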

-- 
Andrew Deason
adeason@sinenomine.net