[OpenAFS-devel] Cache inconsistency in client 1.4.8 and above

Felix Frank Felix.Frank@Desy.de
Thu, 7 May 2009 07:51:29 +0200 (CEST)


On Wed, 6 May 2009, Simon Wilkinson wrote:

>
> On 5 May 2009, at 13:43, Felix Frank wrote:
>> 
>> The patches in RT are just variations on the theme of 
>> linux-mmap-antirecursion-20081020. They prevent deadlock at the risk of 
>> data loss. The fixes in RT solve a cache inconsistency, but data corruption 
>> is still possible.
>
> Just trying to clarify where we're at with this problem, as I know that there 
> are people who get worried whenever they hear the words "data loss" (and I'm 
> one of them!)
>
> My understanding is that one class of problems is solved by fixing 
> linux-mmap-antirecursion-20081020 with the latest patch in RT. This solves 
> the deadlock, and removes one set of write corruption issues. So far this 
> corruption has only been observed with applications that mmap a file, close 
> it, and then write to the mmap'd chunk. Does this match with your testing?

Yes, that's what the fixes in RT solve.

> Secondly, we have another issue that occurs with mmap when the size of the
> file being mmap'd is larger than the cache size. This has also only been
> observed where an application does mmap, close, write. This problem is 
> currently unfixed, but has only been observed with Linux kernels that don't 
> have the BDI starvation fixes. Is that a valid summary?

Almost exactly, but for this to happen the file need not even be closed
prior to writing.
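For readers unfamiliar with the access patterns being discussed, here is a
minimal C sketch of a check along those lines. It is NOT the test program
mentioned in this thread. The function name `mmap_write_check`, its
parameters and the file paths are hypothetical. With `close_before_write`
set, it performs "mmap, close, write". With it cleared, the descriptor stays
open while the mapping is written, as in the variation described above. To
exercise the second issue, `path` would have to lie in AFS and `len` would
presumably have to exceed the client's cache size (an assumption about how
such a test would be set up).

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Expected byte at file offset i (simple repeating pattern). */
static char pattern_byte(size_t i)
{
    return (char)('A' + i % 26);
}

/* Hypothetical helper, not from the thread. Maps `len` bytes of `path`,
 * writes a pattern through the mapping (after close() if
 * close_before_write is non-zero, otherwise with the fd still open),
 * then re-reads the file with read(2).
 * Returns 0 if the data came back intact, -1 on error or mismatch. */
int mmap_write_check(const char *path, size_t len, int close_before_write)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    /* Extend the file so every mapped byte is backed by the file. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    /* The mapping stays valid after close(): "mmap, close, write". */
    if (close_before_write) {
        close(fd);
        fd = -1;
    }

    /* Dirty every mapped page. */
    for (size_t i = 0; i < len; i++)
        map[i] = pattern_byte(i);

    /* Flush dirty pages to the file system, then drop the mapping. */
    int rc = msync(map, len, MS_SYNC);
    munmap(map, len);
    if (fd >= 0)
        close(fd);
    if (rc != 0)
        return -1;

    /* Read back through a fresh descriptor and compare. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    char buf[4096];
    size_t off = 0;
    while (off < len) {
        size_t want = (len - off < sizeof buf) ? len - off : sizeof buf;
        ssize_t n = read(fd, buf, want);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        for (ssize_t j = 0; j < n; j++) {
            if (buf[j] != pattern_byte(off + (size_t)j)) {
                close(fd);
                return -1;
            }
        }
        off += (size_t)n;
    }
    close(fd);
    return 0;
}
```

On a local file system both variants should return 0, since POSIX keeps a
mapping valid after its descriptor is closed. The reports in this thread
concern cases where data written this way does not come back intact.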

This is somewhat embarrassing, but I just had a sudden idea to try another
variation on the test program. I will speak up again after doing more tests,
but this problem seems to be even more of an edge case than originally
assumed.

Regards
  - Felix