[OpenAFS-devel] AFS dcache problem

Craig_Everhart@transarc.com Craig_Everhart@transarc.com
Tue, 22 Jan 2002 13:06:48 -0500 (EST)


Excerpts from transarc.external.openafs: 22-Jan-02 [OpenAFS-devel] AFS
dcache .. fabv@freegates.be (1691*)

> It works well except the dcache is not fully persistent across reboots. 
> The files are written to the disk cache when they are accessed (I can 
> see the cache size increase with df or du (the cache has its own 
> partition)).  When I stop AFS, they are still there.  But when I restart 
> AFS, it does its cache scan and 'discards' most of the files.  (The space 
> used on the cache partition decreases, and when I try to read the files, 
> they are loaded through the network.)  In fact, it discards about the 
> _last_ 30-50 MB cached.

The design assumption here is that if a file was written to the cache
within the last N seconds, some of the data for the correct copy of that
file might not yet have been written from memory to the disk cache.
After all, this post-reboot code is correlating data in
files in the cache with the CacheItems file data.  AFS doesn't take the
time to fsync() the file data in the cache, so this is its way of
protecting itself against incomplete cache updates that are interrupted
by a machine crash.
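
A minimal sketch of that startup check, in Python rather than the
kernel C the real client uses.  All names here (CacheEntry, scan_cache,
reference_time, window_seconds) are hypothetical, and the choice of
reference time is an assumption: it could be the time of the scan or
the last recorded cache activity.  The sketch only illustrates the idea
of distrusting anything written inside the N-second window.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    name: str     # hypothetical cache file name
    mtime: float  # last modification time, seconds since the epoch


def scan_cache(entries, reference_time, window_seconds):
    """Split entries into (kept, discarded).

    An entry modified less than window_seconds before reference_time is
    treated as possibly incomplete on disk and is discarded.
    """
    kept, discarded = [], []
    for entry in entries:
        if reference_time - entry.mtime < window_seconds:
            discarded.append(entry)
        else:
            kept.append(entry)
    return kept, discarded
```

With reference_time=1000 and window_seconds=60, an entry with
mtime=100 is kept and one with mtime=990 is discarded, which matches
the reported symptom that only the most recently cached data is lost.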

Thus, by removing this check, you're risking AFS trusting file data that
might have been lost in a machine crash.  By doing a clean AFS shutdown,
you're avoiding the problem, in that the VM data for the cache files is
eventually written, or perhaps is still there when you start AFS again. 
But if it's a machine crash that has stopped AFS, you really want the N
second window so that you don't think that the freshly-created cache
files are valid.

Basically, this unconditional design was based on the idea that AFS is
not generally stopped by anything but a machine crash.  If you wanted to
have the CacheItems file record a clean shutdown and then alter the
startup scan accordingly, that code could be added to the AFS shutdown
and startup procedures, I'm sure.  But please don't leave the code with
the check removed in your AFS client, because it will allow data
corruption across a crash.
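
A rough sketch of the clean-shutdown idea, again in Python and with
entirely hypothetical names (record_clean_shutdown, startup_window, and
a dict standing in for whatever persistent state CacheItems would
hold).  It is an illustration of the approach, not existing AFS code.

```python
def record_clean_shutdown(state):
    """Called at the end of an orderly AFS shutdown (hypothetical hook)."""
    state["clean_shutdown"] = True


def startup_window(state, default_window):
    """Return the N-second window the startup scan should apply.

    The marker is consumed here, so a crash after this startup is not
    mistaken for a clean shutdown on the next boot.
    """
    was_clean = state.pop("clean_shutdown", False)
    return 0 if was_clean else default_window
```

Clearing the marker at startup is the important design point: without
it, one clean shutdown would disable the check for every later crash.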

I really can't speak about the Volume==0 case.

		Craig