[OpenAFS-devel] Cache inconsistency in client 1.4.8 and above

Felix Frank Felix.Frank@Desy.de
Wed, 27 May 2009 09:16:04 +0200


Felix Frank wrote (Thu May 07 2009 08:52:41 GMT+0200 (CEST))
>>> Secondly, we have another issue that occurs with mmap when the size 
>>> of the file being mmap'd is larger than the cache size. This has also 
>>> only been observed where an application does mmap, close, write. This 
>>> problem is currently unfixed, but has only been observed with Linux 
>>> kernels that don't have the BDI starvation fixes. Is that a valid 
>>> summary?
>>
>> Exactly (almost), but for this to happen, the file need not even be closed
>> prior to writing.
>
> Attached is yet another version of the test program. It reproduces errors
> in Linux 2.6.18-128.1.6.el5xen with a 50 MB disk cache (the test file is
> 120 MB).
>
> It writes using mmap, then unmaps, then mmaps again to read the data back.
> When invoked with -a, the call
> posix_fadvise(fd, 0, SIZE, POSIX_FADV_DONTNEED) prior to remapping is
> suppressed. I guess that call tells the kernel to discard any VM pages
> that hold data from the file of fd. So with -a, the program runs fine even
> in AFS, but that's cheating: running with -r afterwards reveals that some
> data was never written to the cache after all.
>
> So yes, with the mentioned kernel there is possible data loss with files
> larger than the cache (although a 100 MB file vs. a 50 MB cache seems to
> work). I'd be interested to know whether that's reproducible with newer
> kernels.
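
For readers without the attachment, the sequence described above (write
through a shared mapping, unmap, optionally call posix_fadvise with
POSIX_FADV_DONTNEED, then remap and compare) can be sketched roughly as
follows. This is not the attached test program. It is a minimal Python
sketch of the same idea, and the names write_then_reread, pattern, CHUNK
and drop_pages are made up for illustration. To exercise the problem
discussed here, the path would presumably have to live in AFS and the size
would have to exceed the disk cache; on a local filesystem the check is
expected to find no mismatches.

```python
import mmap
import os
import tempfile

CHUNK = 1 << 20  # fill and compare the file in 1 MiB pieces


def pattern(offset, length):
    """Deterministic fill bytes derived from the file offset (illustrative)."""
    block = (offset // CHUNK).to_bytes(8, "little")
    return (block * (length // 8 + 1))[:length]


def write_then_reread(path, size, drop_pages=True):
    """Write via mmap, unmap, optionally fadvise, remap, and verify.

    Returns the number of chunks whose contents differ from what was written.
    drop_pages=False corresponds loosely to suppressing the fadvise call.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.ftruncate(fd, size)

        # Step 1: write the pattern through a shared, writable mapping.
        # Leaving the "with" block unmaps it; no explicit flush is issued.
        with mmap.mmap(fd, size, mmap.MAP_SHARED,
                       mmap.PROT_READ | mmap.PROT_WRITE) as m:
            for off in range(0, size, CHUNK):
                n = min(CHUNK, size - off)
                m[off:off + n] = pattern(off, n)

        # Step 2: advise the kernel that the cached pages are not needed,
        # so the re-read has to come from the filesystem's backing store.
        if drop_pages:
            os.posix_fadvise(fd, 0, size, os.POSIX_FADV_DONTNEED)

        # Step 3: map the file again, read-only, and compare chunk by chunk.
        bad = 0
        with mmap.mmap(fd, size, mmap.MAP_SHARED, mmap.PROT_READ) as m:
            for off in range(0, size, CHUNK):
                n = min(CHUNK, size - off)
                if m[off:off + n] != pattern(off, n):
                    bad += 1
        return bad
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Small local run; an AFS path and a larger size are needed for a real test.
    with tempfile.TemporaryDirectory() as d:
        print("mismatching chunks:",
              write_then_reread(os.path.join(d, "t.bin"), 8 * CHUNK))
```

Running it once with drop_pages=False and then verifying the file in a
separate step would mirror the -a followed by -r experiment described in
the quoted message.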

I finally got hold of a box that I could set up with a different 
distro. I'm sad to report that I managed to reproduce the faulty 
behaviour on Linux 2.6.29-4.slh.1-sidux-amd64.

Regards
  - Felix