[OpenAFS-devel] Cache inconsistency in client 1.4.8 and above

Marc Dionne marc.c.dionne@gmail.com
Fri, 24 Apr 2009 11:24:29 -0400


On Tue, Apr 21, 2009 at 7:04 AM, Felix Frank <Felix.Frank@desy.de> wrote:
> There is yet another fix proposal in RT #124627. It works for me at least.
> Evidently, after applying fixes, deadlocks could still occur during the
> first invocation of osi_VM_StoreAllSegments, so
> linux-mmap-antirecursion-20081020 never really worked, I'm afraid.
> This current fix hopefully will.

So this might be a good time to summarize for everyone:
- the original anti-recursion patch has a bug that effectively
disables osi_VM_StoreAllSegments completely - not what was intended,
and it should be fixed.
- it turns out that the approach of that patch (setting the flag in
osi_VM_StoreAllSegments) can't be used as is, since we still recurse
into the writeback code a second time and deadlock (on 2.6.18 for
Felix and on 2.6.30 here).
- Felix's approach is to set the flag in writepage and, while it is
set, to prevent both re-entry into writepage and entry into
osi_VM_StoreAllSegments for the same file.  This looks sound.  The net
effect differs from Chaskiel's suggestion in that it 1) disables
osi_VM_StoreAllSegments on the same file for callers other than
doPartialWrite (probably a good idea), and 2) prevents concurrent
writepage calls within the same file (which might already be the
case).
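To make the re-entry guard in Felix's approach concrete, here is a minimal single-threaded sketch of a per-file flag that is set in writepage and checked by both writepage and the store-all-segments path. Everything in it is hypothetical: `struct sketch_file`, `sketch_writepage` and `sketch_store_all_segments` are illustrative stand-ins, not the real OpenAFS vcache, writepage or osi_VM_StoreAllSegments code, and the sketch ignores the locking that the real patch needs to handle concurrent writepage calls.

```c
#include <assert.h>

/* Hypothetical per-file state; NOT the real OpenAFS struct vcache. */
struct sketch_file {
    int in_writepage;     /* set while writepage is running for this file */
    int segments_stored;  /* how many times the store path actually did work */
    int writepage_calls;  /* how many writepage invocations completed */
};

static int sketch_writepage(struct sketch_file *f);

/* Hypothetical stand-in for osi_VM_StoreAllSegments. */
static int sketch_store_all_segments(struct sketch_file *f)
{
    if (f->in_writepage)
        return 0;         /* called from inside writepage: skip, don't recurse */
    f->segments_stored++;
    /* Flushing dirty pages can call back into writepage for this file. */
    sketch_writepage(f);
    return 1;
}

/* Hypothetical stand-in for the Linux writepage handler. */
static int sketch_writepage(struct sketch_file *f)
{
    if (f->in_writepage)
        return 0;         /* already writing this file: refuse re-entry */
    f->in_writepage = 1;
    /* Writing a page may trigger a store of all segments (e.g. via a
     * partial-write path); the flag turns that nested call into a no-op. */
    sketch_store_all_segments(f);
    assert(f->in_writepage == 1);  /* nested calls must not clear our flag */
    f->writepage_calls++;
    f->in_writepage = 0;
    return 1;
}
```

Starting from either entry point, each function does its work at most once per outer call and the nested call returns immediately, which is the recursion the flag is meant to cut off.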

I've done quite a bit of mmap testing (2.6.29 and 2.6.30 only) with
the latest version of Felix's patch in 124627, and from my point of
view the behaviour is equivalent to what I see with 1.4.10, with no
deadlocks.

Issues that remain:
- I think Felix still sees some deadlocks and data inconsistencies
with 2.6.18, but I can't reproduce them with 2.6.29 or 2.6.30.
- I see extreme slowness with random mmap writes - nothing really new
here.  But Felix reports that he doesn't see this with his older
2.6.18 kernels, which is interesting.  We're probably doing something
that's not quite right for newer kernels.  It would be interesting to
bisect this if I had a machine that could boot that range of kernels.

Marc