[OpenAFS-devel] Cache inconsistency in client 1.4.8 and above
Marc Dionne
marc.c.dionne@gmail.com
Thu, 16 Apr 2009 20:40:14 -0400
On 04/16/2009 08:25 AM, Felix Frank wrote:
>>> - if (!avc->states & CPageWrite)
I see a bug there - this line probably wants to be:
if (!(avc->states & CPageWrite))
So the recursion was avoided by never actually doing anything in
StoreAllSegments, since CPageWrite never got set and the condition was
always false.
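
To illustrate the precedence problem, here is a minimal standalone sketch. The flag value is made up for the example (the real CPageWrite definition lives in the OpenAFS headers and may differ), and the helper names are hypothetical. Since ! binds tighter than &, the original test is parsed as (!states) & CPageWrite, and !states can only be 0 or 1:

```c
/* Hypothetical flag value, for illustration only; the real CPageWrite
 * is defined in the OpenAFS sources and may use a different bit. */
#define CPageWrite 0x00000200u

/* How the compiler parses the original "!avc->states & CPageWrite":
 * the negation is applied first and yields 0 or 1, so the AND with any
 * flag other than bit 0 is always 0. */
static int buggy_flag_clear(unsigned int states)
{
    return ((!states) & CPageWrite) != 0;
}

/* Intended test: true exactly when the CPageWrite bit is not set. */
static int fixed_flag_clear(unsigned int states)
{
    return !(states & CPageWrite);
}
```

With this flag value, buggy_flag_clear() returns 0 whether or not the bit is set, while fixed_flag_clear() returns 1 only when the bit is clear.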
With the fix above, my larger mmap test quickly runs into a deadlock
again. Looks like write_cache_pages is trying to lock the page that is
currently being written:
(this is pdflush):
[<ffffffffa0b91d14>] ? crfree+0x38/0x3c [libafs]
[<ffffffff81077f85>] ? getnstimeofday+0x5a/0xae
[<ffffffff810b2b0a>] ? sync_page+0x0/0x45
[<ffffffff8144c905>] schedule+0x9/0x1d
[<ffffffff8144c94c>] io_schedule+0x33/0x44
[<ffffffff810b2b4b>] sync_page+0x41/0x45
[<ffffffff8144cd0e>] __wait_on_bit_lock+0x41/0x8a
[<ffffffff810b2acf>] __lock_page+0x61/0x68
[<ffffffff8107144d>] ? wake_bit_function+0x0/0x2e
[<ffffffff810b863c>] write_cache_pages+0x1dc/0x3b3
[<ffffffff810b804a>] ? __writepage+0x0/0x2f
[<ffffffff810b8832>] generic_writepages+0x1f/0x21
[<ffffffff810b8863>] do_writepages+0x2f/0x37
[<ffffffff810b35e3>] __filemap_fdatawrite_range+0x4b/0x4d
[<ffffffff810b3d90>] filemap_fdatawrite+0x1a/0x1c
[<ffffffffa0b9485c>] osi_VM_StoreAllSegments+0xd7/0x17c [libafs]
[<ffffffffa0b5e000>] afs_StoreAllSegments+0xcb/0x17c7 [libafs]
[<ffffffff810dbc69>] ? __fput+0x17b/0x18a
[<ffffffff81077f85>] ? getnstimeofday+0x5a/0xae
[<ffffffff81077fee>] ? do_gettimeofday+0x15/0x38
[<ffffffffa0b99fdf>] ? afs_icl_Event4+0xfe/0x162 [libafs]
[<ffffffffa0b751ba>] afs_DoPartialWrite+0x55/0x5a [libafs]
[<ffffffffa0b97655>] afs_linux_writepage_sync+0x30f/0x3fc [libafs]
[<ffffffff8122156b>] ? prio_tree_next+0x1c3/0x224
[<ffffffffa0b97838>] afs_linux_writepage+0x8c/0xba [libafs]
[<ffffffff810b805c>] __writepage+0x12/0x2f
[<ffffffff810b8696>] write_cache_pages+0x236/0x3b3
[<ffffffff810b804a>] ? __writepage+0x0/0x2f
[<ffffffff810b8832>] generic_writepages+0x1f/0x21
[<ffffffff810b8863>] do_writepages+0x2f/0x37
[<ffffffff810f403a>] __writeback_single_inode+0x1a1/0x3b9
[<ffffffff81052516>] ? __dequeue_entity+0x2e/0x33
[<ffffffff810f468a>] generic_sync_sb_inodes+0x2a7/0x438
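
Read bottom-up, the trace shows the outer write_cache_pages calling afs_linux_writepage for one page, which goes through afs_DoPartialWrite and afs_StoreAllSegments into a second write_cache_pages, which then blocks in __lock_page. The following is only a toy userspace model of that pattern, not kernel code: every name in it is hypothetical, and a failed "trylock" stands in for the place where the real lock_page would sleep forever.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a page with a non-recursive lock bit. */
struct model_page { bool locked; };

/* Returns false where a real blocking lock would have to wait. */
static bool model_trylock(struct model_page *p)
{
    if (p->locked)
        return false;
    p->locked = true;
    return true;
}

static void model_unlock(struct model_page *p)
{
    p->locked = false;
}

/* Models the inner flush: lock and release every page in turn.
 * Returns how many pages it would have blocked on. */
static int model_flush_all(struct model_page *pages, size_t n)
{
    int blocked = 0;
    for (size_t i = 0; i < n; i++) {
        if (!model_trylock(&pages[i])) {
            blocked++;          /* lock already held: would sleep here */
            continue;
        }
        model_unlock(&pages[i]);
    }
    return blocked;
}

/* Models the outer writeback of one page: the page lock is held while
 * the write path calls back into a flush of all pages. */
static int model_writepage(struct model_page *pages, size_t n, size_t idx)
{
    int blocked;
    if (!model_trylock(&pages[idx]))
        return -1;
    blocked = model_flush_all(pages, n);
    model_unlock(&pages[idx]);
    return blocked;
}
```

In this model, model_writepage() reports exactly one page the nested flush would block on, namely the page the caller already holds, whereas a flush started with no page lock held blocks on none.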
> What I don't get is why setting CPageWrite prevents
> afs_linux_writepage_sync from being called (?), as CPageWrite is checked
> inside it, and only after the afs_Trace4(). Iupdatepage with code 99999
> should therefore even show up with working antirecursion, as far as I
> can understand it.
You probably didn't wait long enough for the other Iupdatepage to show
up. The munmap() doesn't cause a flush to happen immediately - the dirty
pages eventually get written by pdflush, but that can be several seconds
later. Without the anti-recursion code, close() causes
osi_VM_StoreAllSegments to write out the modified mmapped pages right away.
Marc