[OpenAFS] Re: Possible cache corruption with linux client and 1.6.1 fileserver

Andrew Deason adeason@sinenomine.net
Wed, 14 Nov 2012 18:38:28 -0600

On Wed, 14 Nov 2012 14:04:38 -0500 (EST)
Richard Brittain <Richard.Brittain@dartmouth.edu> wrote:

> Someone just told me how to clear the Linux buffer cache using
> /proc/sys/vm/drop_caches, and doing that will 'fix' the file too - no
> need to re-read the file from the fileserver.  It looks like the
> problem is between the cache manager and the Linux buffer cache.  That
> is consistent with the V-files in the cache apparently containing the
> correct data.
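For reference, the drop_caches trick works by writing a value to /proc/sys/vm/drop_caches as root. Writing 1 frees the page cache, 2 frees reclaimable slab objects such as dentries and inodes, and 3 frees both. Only clean pages are dropped, so it is customary to sync first. Below is a minimal Python sketch that assumes root privileges on Linux. The function name drop_page_cache is hypothetical, and the path is a parameter only so the function can be exercised against an ordinary file:

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_page_cache(path: str = DROP_CACHES, mode: int = 1) -> None:
    """Ask the kernel to drop clean cached data (needs root on Linux).

    mode 1 drops the page cache, 2 drops reclaimable slab objects
    (dentries and inodes), 3 drops both.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    # Write back dirty pages first; the kernel only drops clean ones.
    os.sync()
    with open(path, "w") as f:
        f.write(f"{mode}\n")
```

Calling drop_page_cache() as root has the same effect as `sync; echo 1 > /proc/sys/vm/drop_caches`. It discards cached pages system-wide, which is why it also discards the stale page described here.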

Yes, that makes sense. Instead of going back and forth on this any
further, I just fiddled with it myself.

What appears to be happening is this: when the file is empty, Linux asks
us to fill a page, which we do (with zeroes). After the file is written
to, our callback is broken, and we get another read, we would normally
flush the pages for the file. However, there appears to be a speed
optimization in place such that we don't flush the page data for a file
with a dataversion (dv) of 0. Presumably this rests on the assumption
that a dv-0 file cannot have any data in it, so there are no pages to
flush.

Linux seems to keep the zero-filled page around, though. On subsequent
reads, I don't see a read request from Linux for the first page, so it
seems to be reusing the same page from when the file was empty. I assume
this is either some paging behavior that differs between Linux and other
platforms, or we're doing something incorrectly in our Linux page
management.
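To make the sequence concrete, here is a toy model in Python. It is not OpenAFS or kernel code. The classes and the skip_dv0_flush flag are hypothetical, and the page cache is reduced to a dict keyed by page index. The model assumes what is described above: the callback-break flush is skipped when the cached dataversion is 0, and the kernel serves a cached page without asking the filesystem to fill it again. It also assumes the client learns the new file length from the server, which is why the stale result is five zero bytes rather than an empty read:

```python
PAGE_SIZE = 4096


class ToyFileServer:
    """Authoritative file contents plus a dataversion bumped on each write."""

    def __init__(self):
        self.data = b""
        self.dv = 0

    def write(self, data: bytes) -> None:
        self.data = data
        self.dv += 1


class ToyClient:
    """Hypothetical cache manager with a minimal model of the page cache."""

    def __init__(self, server, skip_dv0_flush=True):
        self.server = server
        self.skip_dv0_flush = skip_dv0_flush
        self.cached_dv = server.dv  # dataversion of our cached copy
        self.pages = {}             # page index -> PAGE_SIZE bytes

    def fill_page(self, index):
        # Only called on a page cache miss. An empty file yields a
        # page of all zeroes.
        self.cached_dv = self.server.dv
        start = index * PAGE_SIZE
        return self.server.data[start:start + PAGE_SIZE].ljust(PAGE_SIZE, b"\0")

    def read(self, index=0):
        # A cached page is reused as-is; no new fill request is made.
        if index not in self.pages:
            self.pages[index] = self.fill_page(index)
        # Trim to the current file length (assumed refetched after the break).
        return self.pages[index][:len(self.server.data)]

    def callback_break(self):
        # The dv-0 shortcut: assume nothing can be cached, so skip the flush.
        if self.skip_dv0_flush and self.cached_dv == 0:
            return
        self.pages.clear()


def demo(skip_dv0_flush):
    server = ToyFileServer()
    client = ToyClient(server, skip_dv0_flush)
    client.read()            # empty file: an all-zero page gets cached
    server.write(b"hello")   # someone else writes; dv becomes 1
    client.callback_break()  # flush skipped if the shortcut is active
    return client.read()


print(demo(True))   # b'\x00\x00\x00\x00\x00' (stale zero page reused)
print(demo(False))  # b'hello'
```

Flushing unconditionally (skip_dv0_flush=False) is shown only to illustrate why dropping the shortcut makes the symptom go away. It is not necessarily what the actual fix does.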

Gerrit 8465 contains a change that fixed this for me, but I'm not sure
my reasoning here is correct. I hope that others who know more about
Linux internals (or about how this situation is normally handled on
other Unixes) can share some information about this.

Andrew Deason