[OpenAFS-devel] Re: sanity check: client vldb cache time

Thu, 23 Jan 2014 14:10:05 -0500

On Wed, 2014-01-22 at 16:02 -0600, Andrew Deason wrote:

> Okay, that makes sense as to where the 2 hours is coming from. But just
> to be clear, I believe in the past it's been a problematic
> oversimplification, since that figure has been used to describe evicting
> VLDB entries in general. There are other things besides just the
> name->id mapping, and many relevant questions have indeed involved
> things like moving volumes around. I think that's even the most common
> form of the VLDB cache time being relevant; sometimes
> renamings/renumberings do happen, but that's rarer.

It's relevant for renaming because as an admin, renaming has to be done
in multiple steps with 2-hour windows between.

> Also, just from looking at the code, I can't actually see where the
> volcache entry is invalidated for RW volumes. Both of the cases I
> mentioned (afs_CheckCallbacks -> afs_ResetVolumeInfo, and
> afs_CheckVolumeNames -> afs_ResetVolumeInfo), are only for RO volumes.

No, you're right; mere expiry doesn't cause non-RO entries to be
rechecked, ever.  However, if the volume is actually moved, the old
server will start returning VMOVED or VNOVOL.  These codes (or VOFFLINE
or VSALVAGE) cause afs_Analyze to fetch a new VLDB entry immediately,
update the cache if it is out of date, and try some server other than
the one that just returned the error.
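
To make that error-driven path concrete, here is a rough sketch in C.
It is not the actual afs_Analyze code: every sk_ name is hypothetical,
the enum values are arbitrary rather than the real error numbers, and
it assumes a volume with a single site, so "some other server" can
only be whatever the refreshed VLDB entry points at.

```c
/* Hypothetical stand-ins for the volume-level error codes named above. */
enum sk_code { SK_OK = 0, SK_VSALVAGE, SK_VNOVOL, SK_VOFFLINE, SK_VMOVED };

struct sk_volume {
    int cached_server;  /* server the client's volcache entry points at */
    int vldb_server;    /* server the VLDB would report right now */
    int vldb_lookups;   /* how many times we re-fetched the VLDB entry */
};

/* Pretend fileserver RPC: only the volume's real home answers OK. */
static enum sk_code sk_rpc(const struct sk_volume *v, int server)
{
    return server == v->vldb_server ? SK_OK : SK_VMOVED;
}

/* The four codes that trigger an immediate VLDB re-fetch. */
static int sk_is_volume_error(enum sk_code c)
{
    return c == SK_VSALVAGE || c == SK_VNOVOL ||
           c == SK_VOFFLINE || c == SK_VMOVED;
}

/* Returns the server that finally served the request, or -1. */
static int sk_access(struct sk_volume *v)
{
    int server = v->cached_server;

    for (;;) {
        enum sk_code c = sk_rpc(v, server);
        if (c == SK_OK)
            return server;
        if (!sk_is_volume_error(c))
            return -1;

        /* Fetch a fresh VLDB entry and update the cache if it is stale. */
        v->vldb_lookups++;
        if (v->cached_server != v->vldb_server)
            v->cached_server = v->vldb_server;

        /* Only retry on a server other than the one that just failed. */
        if (v->cached_server == server)
            return -1;
        server = v->cached_server;
    }
}
```

Note that the refresh happens only because the old server answered
with one of these codes; nothing in this path runs if the old server
never answers at all.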

> From that, it seems possible that the RW information can be cached
> forever, even if we can't contact the server; I'll have to experiment
> with it, but that might explain some confusing behavior I had seen
> reported before.

Indeed, there does not appear to be any mechanism by which a cached VLDB
entry is invalidated if the volume is moved and the server goes down
before some attempt to access that volume returns VOFFLINE or VMOVED or
VNOVOL.  Well, other than manually invalidating all volcache entries,
which _does_ happen if someone invokes VIOCCKBACK (aka fs checkv).

I'm inclined to think that when a server goes down, something (either
afs_MarkServerUpOrDown or CkSrv_MarkUpDown) should call afs_ResetVolumes
for that server, so any VLDB entries get invalidated.
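
Roughly what I have in mind, as a sketch only: the types and the sk_
functions below are invented for illustration and do not match the
real struct server, struct volume, or the afs_ResetVolumes signature.

```c
#include <stddef.h>

struct sk_server {
    int id;
    int up;             /* nonzero while the server is believed up */
};

struct sk_vol {
    int id;
    int server;         /* id of the server this entry points at */
    int valid;          /* nonzero while the cached VLDB info is trusted */
};

/* Hypothetical analogue of afs_ResetVolumes: invalidate every cached
 * entry that points at the given server. Returns how many were reset. */
static int sk_reset_volumes(struct sk_vol *vols, size_t n, int server)
{
    int count = 0;

    for (size_t i = 0; i < n; i++) {
        if (vols[i].valid && vols[i].server == server) {
            vols[i].valid = 0;  /* next access must ask the VLDB again */
            count++;
        }
    }
    return count;
}

/* Proposed hook: marking a server down also drops its volume entries. */
static int sk_mark_server_down(struct sk_server *srv,
                               struct sk_vol *vols, size_t n)
{
    srv->up = 0;
    return sk_reset_volumes(vols, n, srv->id);
}
```

Entries for volumes on other servers are left alone, so the cost is
limited to re-fetching VLDB entries for the server that went down.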

I'm also inclined to think that we ought to expire even volcache entries
for RW volumes, so they get looked up again once in a while.
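
As a sketch of the difference (hypothetical names throughout; the
"current" check is only my reading of the behavior discussed in this
thread, and reusing the 2-hour figure as the lifetime for RW entries
is an assumption, not a settled choice):

```c
enum sk_voltype { SK_RW, SK_RO };

struct sk_entry {
    enum sk_voltype type;
    long fetched_at;    /* seconds, when the VLDB entry was last fetched */
};

#define SK_TTL_SECONDS (2L * 60 * 60)   /* the 2-hour figure from above */

/* Behavior as described in this thread: only RO entries ever age out. */
static int sk_expired_current(const struct sk_entry *e, long now)
{
    return e->type == SK_RO && now - e->fetched_at >= SK_TTL_SECONDS;
}

/* Proposed: every entry ages out, whatever the volume type. */
static int sk_expired_proposed(const struct sk_entry *e, long now)
{
    (void)e->type;      /* type no longer matters for expiry */
    return now - e->fetched_at >= SK_TTL_SECONDS;
}
```

With the second check, an RW entry fetched at time 0 would be looked
up again on the first access at or after 7200 seconds, even if no
server ever returned a volume error for it.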

-- Jeff