[OpenAFS-devel] Cache inconsistency in client 1.4.8 and above
Marc Dionne
marc.c.dionne@gmail.com
Mon, 4 May 2009 13:52:10 -0400
> Traces of the usual deadlocked suspects are attached. At that point, just
> about any process can deadlock, I suppose. Apparently, the system ceases to
> balance dirty pages (which appears plausible to me, but I have no experience
> with virtual memory implementations whatsoever).
OK, this brought back some memories... I think you're seeing a problem
with older kernels that was addressed by Peter Zijlstra's "per BDI
dirty threshold" patch set in kernel 2.6.24:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=04fbfdc14e5f4
Note the mention of "deadlocks with stacked BDIs", which is exactly
the case for AFS when using a disk cache. The congestion on the AFS
backing device keeps processes from writing to other devices,
including the ext2/3 device holding the disk cache. So the cache
manager can't make progress in writing back its dirty data.
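
To make the stacked-BDI situation concrete, here is a toy model in Python. It is only a sketch and not the kernel's throttling code: the limit, the device names, and the fixed shares are all made up for illustration. As far as I understand the real patch set, it scales each device's share by its recent writeout activity rather than using a fixed table.

```python
# Toy model only: not the kernel's actual throttling algorithm.
GLOBAL_LIMIT = 100  # hypothetical global dirty-page limit


def may_dirty_global(dirty, bdi):
    """Single global threshold: every writer is judged against the
    total number of dirty pages, whichever device they belong to."""
    return sum(dirty.values()) < GLOBAL_LIMIT


def may_dirty_per_bdi(dirty, bdi, share):
    """Per-BDI threshold: a writer is judged only against the share
    of the limit assigned to the device it is writing to."""
    return dirty[bdi] < GLOBAL_LIMIT * share[bdi]


# AFS has filled the limit with dirty pages; the cache disk has none.
dirty = {"afs": 100, "cache_disk": 0}
share = {"afs": 0.5, "cache_disk": 0.5}  # hypothetical fixed shares

print(may_dirty_global(dirty, "cache_disk"))          # False
print(may_dirty_per_bdi(dirty, "cache_disk", share))  # True
```

With the single global threshold, the write to the cache disk is refused even though that disk has no dirty pages of its own, so the AFS dirty pages can never be flushed through it. With a per-device threshold the cache write goes ahead.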
See for instance: https://bugzilla.redhat.com/show_bug.cgi?id=453811 -
a request to backport the patch set to 2.6.18 for RHEL 5.
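
A quick way to guess whether a given kernel has the patch set is to compare its release string against 2.6.24. The sketch below does that in Python; the function names are hypothetical, and the result is only a hint, since a distribution kernel with an older version number could carry a backport.

```python
import re


def parse_kernel_release(release):
    """Extract (major, minor, patch) from a release string such as
    '2.6.18-92.el5'. A missing patch level is treated as 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(part or 0) for part in match.groups())


def has_per_bdi_by_version(release):
    """Heuristic only: True if the version is at least 2.6.24, the
    mainline release that included the per-BDI patch set."""
    return parse_kernel_release(release) >= (2, 6, 24)


print(has_per_bdi_by_version("2.6.18-92.el5"))  # False
print(has_per_bdi_by_version("2.6.24"))         # True
```

On a running Linux system the release string can be taken from `platform.release()` or `uname -r`.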
It may well be that there's no way to work around this kernel problem
in the AFS code.
Marc