[OpenAFS-devel] Cache inconsistency in client 1.4.8 and above
Felix Frank
Felix.Frank@Desy.de
Thu, 7 May 2009 07:51:29 +0200 (CEST)
On Wed, 6 May 2009, Simon Wilkinson wrote:
>
> On 5 May 2009, at 13:43, Felix Frank wrote:
>>
>> The patches in RT are just variations on the theme of
>> linux-mmap-antirecursion-20081020. They prevent deadlock at the risk of
>> data loss. The fixes in RT solve a cache inconsistency, but data corruption
>> is still possible.
>
> Just trying to clarify where we're at with this problem, as I know that there
> are people who get worried whenever they hear the words "data loss" (and I'm
> one of them!)
>
> My understanding is that one class of problems is solved by fixing
> linux-mmap-antirecursion-20081020 with the latest patch in RT. This solves
> the deadlock, and removes one set of write corruption issues. So far this
> corruption has only been observed with applications that mmap a file, close
> it, and then write to the mmap'd chunk. Does this match with your testing?
Yes, that's what the fixes in RT solve.
> Secondly, we have another issue that occurs with mmap when the size of the
> file being mmap'd is larger than the cache size. This has also only been
> observed where an application does mmap, close, write. This problem is
> currently unfixed, but has only been observed with Linux kernels that don't
> have the BDI starvation fixes. Is that a valid summary?
Almost exactly, but for this to occur, the file need not even be closed
prior to writing.
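For readers unfamiliar with the access pattern under discussion, here is a
minimal sketch of it. This is not the actual test program from this thread;
the helper names mmap_write and count_mismatches and the close_first flag
are made up for illustration. It maps a file, optionally closes the
descriptor before touching the mapping, writes through the mapping, and then
re-reads the file with read(2) to check what actually landed.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not the test program referred to in this thread.
 * Creates `path` with `len` bytes, maps it shared, optionally closes the
 * descriptor first, then fills the mapping with `fill`.
 * Returns 0 on success, -1 on any error. */
int mmap_write(const char *path, size_t len, unsigned char fill, int close_first)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    unsigned char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* The "mmap, close, write" case: the mapping outlives the descriptor. */
    if (close_first) {
        close(fd);
        fd = -1;
    }

    memset(map, fill, len);              /* write through the mapping */
    int rc = msync(map, len, MS_SYNC);   /* push dirty pages to the file */
    if (munmap(map, len) != 0)
        rc = -1;
    if (fd >= 0 && close(fd) != 0)
        rc = -1;
    return rc;
}

/* Re-reads `path` with read(2) and returns the number of bytes that differ
 * from `fill`, or -1 on error or if the file is not exactly `len` bytes. */
long count_mismatches(const char *path, size_t len, unsigned char fill)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    unsigned char buf[4096];
    size_t seen = 0;
    long bad = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] != fill)
                bad++;
        seen += (size_t)n;
    }
    close(fd);
    if (n < 0 || seen != len)
        return -1;
    return bad;
}
```

Calling mmap_write() with close_first set to 1 and then count_mismatches()
on the same path exercises the first case; close_first set to 0 exercises
the variant where the file stays open. On a local filesystem both should
report zero mismatches. To approach the second issue, the path would have to
be in AFS and len larger than the client cache, which this sketch does not
arrange by itself.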
This is somewhat embarrassing, but I just had a sudden idea to try another
variation on the test program. I will speak up again after doing more tests,
but this problem seems to be even more of an edge case than was originally assumed.
Regards
- Felix