[OpenAFS-devel] Re: sanity check: client vldb cache time

Andrew Deason adeason@sinenomine.net
Thu, 23 Jan 2014 14:54:36 -0600


On Thu, 23 Jan 2014 14:10:05 -0500
Jeffrey Hutzelman <jhutz@cmu.edu> wrote:

> On Wed, 2014-01-22 at 16:02 -0600, Andrew Deason wrote:
> 
> > Okay, that makes sense as to where the 2 hours is coming from. But
> > just to be clear, I believe in the past it's been a problematic
> > oversimplification, since that figure has been used to describe
> > evicting VLDB entries in general. There are other things besides
> > just the name->id mapping, and many relevant questions have indeed
> > involved things like moving volumes around. I think that's even the
> > most common form of the VLDB cache time being relevant; sometimes
> > renamings/renumberings do happen, but that's rarer.
> 
> It's relevant for renaming because as an admin, renaming has to be
> done in multiple steps with 2-hour windows between.

I just meant, renaming events are rarer than moving volumes around, at
least in terms of reasons why I get asked about VLDB caching. That's my
impression, anyway.

> > Also, just from looking at the code, I can't actually see where the
> > volcache entry is invalidated for RW volumes. Both of the cases I
> > mentioned (afs_CheckCallbacks -> afs_ResetVolumeInfo, and
> > afs_CheckVolumeNames -> afs_ResetVolumeInfo), are only for RO volumes.
> 
> No, you're right; mere expiry doesn't cause non-RW entries to be
> rechecked, ever.

I feel like there's maybe a typo or something here, but I'm not sure
where. It seems like you're agreeing with me, but that last part isn't
what I said :)

The expiry of the volcache entry should cause the entry to be looked up
again if we ever need to hit the fileserver for something (since we'll
look up the vol struct to get the list of servers, fail, and call the
VLDB to get the info). I'm not sure if you're saying something about how
expiring the RW doesn't affect the non-RW, or if you're just saying
"yeah, this doesn't work for RWs", or ...?

> > From that, it seems possible that the RW information can be cached
> > forever, even if we can't contact the server; I'll have to
> > experiment with it, but that might explain some confusing behavior I
> > had seen reported before.
> 
> Indeed, there does not appear to be any mechanism by which a cached
> VLDB entry is invalidated if the volume is moved and the server goes
> down before the cached entry expires.

That, or the fileserver is renumbered. Moving volumes has the easy fix
of leaving the old server up for at least 2 hours (and according to this
thread, maybe that recommendation should be 4 hours now). But
renumbering a fileserver seems harder, since you'd need to set up an
'empty' fileserver or something on the old address to force clients to
re-lookup volume info.

> I'm inclined to think that when a server goes down, something (either
> afs_MarkServerUpOrDown or CkSrv_MarkUpDown) should call afs_ResetVolumes
> for that server, so any VLDB entries get invalidated.

The last time I suggested something like this, there was concern that
doing so could cause too much VLDB load when the server is just
legitimately down:
<http://thread.gmane.org/gmane.comp.file-systems.openafs.general/29735/focus=29737>
That does seem like a legitimate concern, since we'd potentially cause a
request for every single volume on the server. But it might be possible
to mitigate that by just checking the VLDB once per down event, and if
the volume is still where we expect it to be, we assume the server is
just down. I'm not even beginning to think of how to implement it,
though, ugh.

Another way of helping with this would be to allow the fileserver to
signal clients when it's shutting down. Either an ICBS3 (very
heavyweight for DAFS), or potentially a new RXAFSCB RPC. I don't mean
doing this for every shutdown, just ones where the administrator
explicitly somehow says "notify all the clients".

I've had an administrator request such functionality for other reasons,
and it seems like it might help here.

> I'm also inclined to think that we ought to expire even volcache
> entries for RW volumes, so they get looked up again once in a while.

Yes, I really thought we did already do that, and I've probably told
some people some incorrect information because of it. Sorry, all of
those people :)

-- 
Andrew Deason
adeason@sinenomine.net