[OpenAFS-devel] Re: sanity check: client vldb cache time

Andrew Deason adeason@sinenomine.net
Fri, 24 Jan 2014 16:02:50 -0600


On Fri, 24 Jan 2014 16:34:55 -0500
Jeffrey Hutzelman <jhutz@cmu.edu> wrote:

> The only sane solution to that problem is for volcache entries to
> expire.

Yes, I think that's best, at least until some site has a specific
requirement for something more. Sites haven't really been complaining
much even about the current behavior (which _never_ recovers under
certain scenarios), so going beyond simple time expiry is probably
not worth it.
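Here is a minimal sketch of what simple time expiry means. It is written in Python for brevity and is not the OpenAFS cache manager's actual C code. Each cached VLDB lookup records when it was fetched, and a lookup older than a TTL is fetched again instead of being trusted indefinitely. The names `VolCache` and `VolCacheEntry`, the `lookup` callable standing in for a VLDB query, and the TTL value are all hypothetical.

```python
import time
from dataclasses import dataclass

# Placeholder TTL (hypothetical); the thread does not fix a value here.
VOLCACHE_TTL_SECS = 10 * 60


@dataclass
class VolCacheEntry:
    """One cached VLDB lookup: volume name -> list of fileserver addresses."""
    volume: str
    servers: list
    fetched_at: float  # clock time when the entry was fetched from the VLDB

    def expired(self, now, ttl):
        # An entry at least `ttl` seconds old is no longer trusted.
        return now - self.fetched_at >= ttl


class VolCache:
    def __init__(self, lookup, ttl=VOLCACHE_TTL_SECS, clock=time.monotonic):
        self._lookup = lookup  # callable(volume) -> servers; stands in for a VLDB query
        self._ttl = ttl
        self._clock = clock
        self._entries = {}

    def get(self, volume):
        now = self._clock()
        entry = self._entries.get(volume)
        if entry is None or entry.expired(now, self._ttl):
            # Missing or stale: ask the VLDB again instead of keeping old data forever.
            entry = VolCacheEntry(volume, self._lookup(volume), now)
            self._entries[volume] = entry
        return entry.servers
```

The property that matters is that a client can only use a wrong server list for a bounded time, even if no notification ever reaches it.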

> > Either an ICBS3 (very heavyweight for DAFS), or potentially a new
> > RXAFSCB RPC. I don't mean doing this for every shutdown, just ones
> > where the administrator explicitly somehow says "notify all the
> > clients".
> 
> The only problem with notifying on shutdown is you can wait a long
> time for clients that are unreachable, and the notification only goes
> to clients the fileserver hasn't forgotten about.

While I'm not going to be doing anything about this right now, I do expect
to work on it within the next year or so. So, just to talk about it a little:

Waiting for unreachable clients is not so bad if we can multi-call them
in large batches. There was an improvement to rx_multi on master that
(at least allegedly) makes it feasible to do this in much larger
batches, possibly on the order of every single client host at once (or
at least ~10k, something like that). There's still some delay there, but
whether that delay is acceptable is up to the administrator; for some
sites it may be fine.
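The batching trade-off can be put in rough numbers. Assume each batch waits up to a fixed RPC timeout for its slowest (possibly unreachable) host before the next batch starts. Then the worst-case total delay is the number of batches times that timeout. The sketch below is plain arithmetic, not the rx_multi API. The function names and the 60-second timeout are hypothetical.

```python
import math


def batches(hosts, batch_size):
    """Split the host list into consecutive batches of at most batch_size hosts."""
    return [hosts[i:i + batch_size] for i in range(0, len(hosts), batch_size)]


def worst_case_delay(n_hosts, batch_size, rpc_timeout_secs):
    """Upper bound on total notification time, assuming every batch waits the full
    RPC timeout for at least one unreachable host before the next batch starts."""
    return math.ceil(n_hosts / batch_size) * rpc_timeout_secs


if __name__ == "__main__":
    print(worst_case_delay(25_000, 10_000, 60))  # 180
    print(worst_case_delay(25_000, 100, 60))     # 15000
```

Take 25,000 clients and a 60-second timeout. Batches of 10,000 bound the wait at 3 x 60 = 180 seconds. Batches of 100 could take up to 250 x 60 = 15,000 seconds, over four hours. This is why much larger batches make the notification practical.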

As for only reaching clients the fileserver hasn't forgotten about: it
takes about 2 hours for the fileserver to forget a client host (except in
rarer cases, such as when the fileserver runs out of callback space). If
the client caches a VLDB entry for at most ~2 hours after last contacting
the fileserver, it seems we'd catch almost every client.
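The coverage argument can be checked with a toy model. Assume the fileserver forgets a host about 2 hours after its last contact. Assume also that the client stops trusting a cached VLDB entry `cap` seconds after its own last contact with that fileserver. A notification then misses a client only if the client still trusts its entry after the fileserver has forgotten it. That cannot happen when `cap` is at most the forget interval. The function names and exact constants are hypothetical. The model ignores the rarer cases (such as callback space running out) and clock skew.

```python
# Roughly 2 hours, per the discussion; the exact value is an assumption here.
HOST_FORGET_SECS = 2 * 60 * 60


def fileserver_knows(now, last_contact, forget=HOST_FORGET_SECS):
    """True if the fileserver still remembers a host that last contacted it at last_contact."""
    return now - last_contact < forget


def client_trusts_entry(now, last_contact, cap):
    """True if the client still uses its cached VLDB entry (capped relative to last contact)."""
    return now - last_contact < cap


def missed(now, last_contact, cap, forget=HOST_FORGET_SECS):
    """True if a notification sent at `now` misses a client still relying on its cached entry."""
    return client_trusts_entry(now, last_contact, cap) and not fileserver_knows(
        now, last_contact, forget
    )


if __name__ == "__main__":
    # With cap equal to the forget interval, no moment produces a missed client.
    print(any(missed(t, 0, cap=HOST_FORGET_SECS) for t in range(0, 3 * HOST_FORGET_SECS, 60)))  # False
    # A cap slightly longer than the forget interval opens a small gap.
    print(missed(HOST_FORGET_SECS + 60, 0, cap=HOST_FORGET_SECS + 300))  # True
```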

It's not guaranteed to catch everyone, but I think the intention here is
just to make it likely, as opposed to now, where every client would
definitely be broken for a significant amount of time (unless you take
the manual steps to work around it).

-- 
Andrew Deason
adeason@sinenomine.net