[OpenAFS-devel] sanity check: client vldb cache time

Wed, 22 Jan 2014 11:59:55 -0600

For as long as I can remember, whenever someone asks how long the client
caches VLDB information, the answer given is that we just cache it for a
fixed 2-hour interval. Recently I've realized I don't actually know
where in the code this happens, so I wasn't sure if I just forgot this
information, or if I never actually knew and was just trusting/parroting
what everyone said.

Recently, I've had reason to go looking around this area of code (that
Nicolas Prochazka thread on -info had me curious at first, but some
unrelated questions came up later that I needed to look into). As far as
I can tell, that answer of "2 hours" is not correct (or at best, a great
and possibly misleading oversimplification). I would appreciate some
sanity checks on this to make sure we're not telling people incorrect
information about the behavior here.

In the Windows client, it looks like we effectively flush an entry by
setting CM_VOLUMEFLAG_RESET. This is done on an expiration-time basis via
cm_RefreshVolumes() if the volume is 'lifetime' seconds old (for the
"normal" case). This is called via cm_Daemon every five minutes with a
lifetime of cm_daemonCheckVolInterval. cm_daemonCheckVolInterval
defaults to 3600 (1 hour), although it is changeable via the
daemonCheckVolInterval registry setting. There are of course other
conditions under which the VLDB entry is reset, but that seems to be the
one where the decision is based purely on time, and not on errors or
anything else.
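
As a rough sketch of the time-based check described above (not the
actual Windows client code): the struct, the lastUpdate field, the flag
value, and the ">=" comparison are all hypothetical stand-ins; only the
names CM_VOLUMEFLAG_RESET, cm_RefreshVolumes, and the 3600-second
default come from the description above.

```c
#include <time.h>

/* Hypothetical bit value; stands in for CM_VOLUMEFLAG_RESET. */
#define SKETCH_VOLUMEFLAG_RESET 0x1u

/* Hypothetical stand-in for the Windows client's volume entry. */
struct sketch_win_volume {
    time_t lastUpdate;   /* assumed field: when the entry was last refreshed */
    unsigned flags;
};

/*
 * Models the "normal" time-based case attributed to cm_RefreshVolumes():
 * flag the entry for reset once it is at least 'lifetime' seconds old.
 * The exact comparison used by the real code is an assumption here.
 */
static void sketch_refresh_volume(struct sketch_win_volume *vol,
                                  time_t now, long lifetime)
{
    if ((long)(now - vol->lastUpdate) >= lifetime)
        vol->flags |= SKETCH_VOLUMEFLAG_RESET;
}
```

Under this model, with a check every 300 seconds and a lifetime of 3600,
an entry that narrowly misses one check is not flagged until the next
one, so it can be just under 3900 seconds old before it is reset.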

So, it would appear as though VLDB entries are cached for only about 1
hour in the Windows world. Obviously I'm not very familiar with the
Windows code base, and I'm basing the above opinion on a brief look
through the code, but if that's wrong, I would appreciate being told
specifically how.

For the Unix client, this is a little more complex. We flush a VLDB
entry via afs_ResetVolumeInfo, which is called from afs_CheckVolumeNames
and afs_CheckCallbacks (for RO volume vcaches only). In both of these
situations, we reset the vldb entry based on the vp->expireTime
expiration time. Both are called via afs_Daemon; afs_CheckVolumeNames
every 10 minutes, and afs_CheckCallbacks every iteration of the loop (20
seconds or so).
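
To make the two cadences concrete, here is a small simulation of the
scheduling described above. It is only a sketch: the function and struct
names are hypothetical, the loop period is taken as exactly 20 seconds,
and the real afs_Daemon loop does much more than this.

```c
/* Hypothetical tally of how often each check would run. */
struct sketch_counts {
    int callback_checks;      /* models calls to afs_CheckCallbacks */
    int volume_name_checks;   /* models calls to afs_CheckVolumeNames */
};

/*
 * Simulates 'duration' seconds of a daemon loop that wakes every
 * 'loop_interval' seconds, runs the callback check on every iteration,
 * and runs the volume-name check once 'volcheck_interval' seconds have
 * passed since the previous one.
 */
static struct sketch_counts sketch_run_daemon(long duration,
                                              long loop_interval,
                                              long volcheck_interval)
{
    struct sketch_counts c = {0, 0};
    long last_volcheck = 0;
    long now;

    for (now = loop_interval; now <= duration; now += loop_interval) {
        c.callback_checks++;
        if (now - last_volcheck >= volcheck_interval) {
            c.volume_name_checks++;
            last_volcheck = now;
        }
    }
    return c;
}
```

For example, sketch_run_daemon(1200, 20, 600) models 20 minutes of
running: the callback check runs 60 times and the volume-name check
runs twice.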

vp->expireTime starts off at 0 when we allocate the volume struct (we
memset the whole thing to 0), and is extended to whatever our vcache
expiration time is in afs_QueueCallback (if the vcache expiration time
is later). So the max vldb cache time seems to be whatever the max
expiration time is for any file we access in the volume (for RO this is
2 hours, for RW it ranges between 7 minutes and 4 hours). It also seems
like this means we can keep a vldb entry cached indefinitely as long as
we keep successfully accessing files in the volume before the callback
promises expire.
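
A minimal model of that behavior, as I understand it, might look like
the following. The struct and function names are hypothetical, and the
exact comparison used when deciding to reset is an assumption; the only
things taken from the description above are that expireTime starts at 0
and is only ever pushed later by a vcache expiration time.

```c
#include <time.h>

/* Hypothetical stand-in for the volume struct; only the field discussed. */
struct sketch_volume {
    time_t expireTime;   /* 0 after allocation, since the struct is zeroed */
};

/*
 * Models the extension step described for afs_QueueCallback: the volume's
 * expiration time only moves later, never earlier.
 */
static void sketch_extend_expiry(struct sketch_volume *vp,
                                 time_t vcache_expiry)
{
    if (vcache_expiry > vp->expireTime)
        vp->expireTime = vcache_expiry;
}

/*
 * Models the time-based decision to reset the VLDB entry: due once
 * expireTime has been reached. The precise comparison is an assumption.
 */
static int sketch_should_reset(const struct sketch_volume *vp, time_t now)
{
    return vp->expireTime <= now;
}
```

In this model, each access that yields a later vcache expiration pushes
expireTime out again, so sketch_should_reset() never becomes true as
long as such accesses keep arriving before the current expireTime.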

This differs quite dramatically from what I expected from the "2 hours"
answer; I thought we would just set 'vp->expireTime = now + 7200;' or
something, but that doesn't seem to happen anywhere. For RO volumes this
does seem to have an effective "maximum time the vldb cache can be
wrong" of about 2 hours, but this isn't true of every situation, and of
course we can probe the vldb much less often than once every 2 hours.

-- 
Andrew Deason
adeason@sinenomine.net