[OpenAFS-devel] Cache inconsistency in client 1.4.8 and above

Marc Dionne marc.c.dionne@gmail.com
Thu, 16 Apr 2009 20:40:14 -0400


On 04/16/2009 08:25 AM, Felix Frank wrote:
>>> -    if (!avc->states & CPageWrite)

I see a bug there - this line probably wants to be:
     if (!(avc->states & CPageWrite))

So the recursion was avoided by never actually doing anything in 
StoreAllSegments, since CPageWrite never got set and the condition was 
always false: the ! operator binds tighter than &, so the old test was 
really (!avc->states) & CPageWrite.
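
The difference between the two forms can be checked in isolation.  This 
is only a standalone sketch, not OpenAFS code: the value 0x200 for 
CPageWrite and the helper names are made up for illustration, and the 
real flag definition lives in the OpenAFS headers.

```c
#include <assert.h>

/* Hypothetical flag value, for illustration only. */
#define CPageWrite 0x200u

/* Old test, "!states & CPageWrite", written with the grouping the
 * compiler actually applies.  (!states) is 0 or 1, so ANDing it with
 * a flag bit other than 0x1 always yields 0. */
static int flag_clear_buggy(unsigned int states)
{
    return ((!states) & CPageWrite) != 0;
}

/* Intended test: mask out the flag first, then negate. */
static int flag_clear_fixed(unsigned int states)
{
    return !(states & CPageWrite);
}

/* Compare the two forms for a few state words. */
static void precedence_self_check(void)
{
    /* Flag clear: the fixed form reports it, the buggy form does not. */
    assert(flag_clear_fixed(0x0u) == 1);
    assert(flag_clear_buggy(0x0u) == 0);
    assert(flag_clear_fixed(0x4u) == 1);
    assert(flag_clear_buggy(0x4u) == 0);

    /* Flag set: both report "not clear". */
    assert(flag_clear_fixed(CPageWrite | 0x4u) == 0);
    assert(flag_clear_buggy(CPageWrite | 0x4u) == 0);
}
```

With this assumed flag value the buggy form is false for every input, 
which is why the guarded block could never run.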

With the fix above, my larger mmap test quickly runs into a deadlock 
again.  Looks like write_cache_pages is trying to lock the page that is 
currently being written:

(this is pdflush):
[<ffffffffa0b91d14>] ? crfree+0x38/0x3c [libafs]
[<ffffffff81077f85>] ? getnstimeofday+0x5a/0xae
[<ffffffff810b2b0a>] ? sync_page+0x0/0x45
[<ffffffff8144c905>] schedule+0x9/0x1d
[<ffffffff8144c94c>] io_schedule+0x33/0x44
[<ffffffff810b2b4b>] sync_page+0x41/0x45
[<ffffffff8144cd0e>] __wait_on_bit_lock+0x41/0x8a
[<ffffffff810b2acf>] __lock_page+0x61/0x68
[<ffffffff8107144d>] ? wake_bit_function+0x0/0x2e
[<ffffffff810b863c>] write_cache_pages+0x1dc/0x3b3
[<ffffffff810b804a>] ? __writepage+0x0/0x2f
[<ffffffff810b8832>] generic_writepages+0x1f/0x21
[<ffffffff810b8863>] do_writepages+0x2f/0x37
[<ffffffff810b35e3>] __filemap_fdatawrite_range+0x4b/0x4d
[<ffffffff810b3d90>] filemap_fdatawrite+0x1a/0x1c
[<ffffffffa0b9485c>] osi_VM_StoreAllSegments+0xd7/0x17c [libafs]
[<ffffffffa0b5e000>] afs_StoreAllSegments+0xcb/0x17c7 [libafs]
[<ffffffff810dbc69>] ? __fput+0x17b/0x18a
[<ffffffff81077f85>] ? getnstimeofday+0x5a/0xae
[<ffffffff81077fee>] ? do_gettimeofday+0x15/0x38
[<ffffffffa0b99fdf>] ? afs_icl_Event4+0xfe/0x162 [libafs]
[<ffffffffa0b751ba>] afs_DoPartialWrite+0x55/0x5a [libafs]
[<ffffffffa0b97655>] afs_linux_writepage_sync+0x30f/0x3fc [libafs]
[<ffffffff8122156b>] ? prio_tree_next+0x1c3/0x224
[<ffffffffa0b97838>] afs_linux_writepage+0x8c/0xba [libafs]
[<ffffffff810b805c>] __writepage+0x12/0x2f
[<ffffffff810b8696>] write_cache_pages+0x236/0x3b3
[<ffffffff810b804a>] ? __writepage+0x0/0x2f
[<ffffffff810b8832>] generic_writepages+0x1f/0x21
[<ffffffff810b8863>] do_writepages+0x2f/0x37
[<ffffffff810f403a>] __writeback_single_inode+0x1a1/0x3b9
[<ffffffff81052516>] ? __dequeue_entity+0x2e/0x33
[<ffffffff810f468a>] generic_sync_sb_inodes+0x2a7/0x438
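
The shape of the problem in the trace above (a page write that, while 
holding the page lock, ends up in a flush that wants the same lock) can 
be modeled with a toy.  Everything here is hypothetical: the struct, the 
flag TOY_IN_PAGE_WRITE and the function names are invented, the lock is 
a plain bool that reports a would-be deadlock instead of blocking, and 
where the real code sets and tests CPageWrite is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_IN_PAGE_WRITE 0x1u   /* hypothetical anti-recursion flag */

struct toy_vcache {
    unsigned int states;
    bool page_locked;            /* stand-in for a non-recursive page lock */
};

enum flush_result { FLUSH_DONE, FLUSH_SKIPPED, FLUSH_WOULD_DEADLOCK };

/* Stand-in for "flush all dirty pages": needs the page lock unless the
 * anti-recursion flag says a page write is already in progress. */
static enum flush_result toy_flush_all(struct toy_vcache *avc)
{
    assert(avc != NULL);
    if (avc->states & TOY_IN_PAGE_WRITE)
        return FLUSH_SKIPPED;
    if (avc->page_locked)
        return FLUSH_WOULD_DEADLOCK;   /* a real lock would block here */
    avc->page_locked = true;           /* lock, "write", unlock */
    avc->page_locked = false;
    return FLUSH_DONE;
}

/* Stand-in for a page write that is entered with the page locked and
 * then triggers a full flush.  set_guard selects whether the flag is
 * raised before the nested flush. */
static enum flush_result toy_writepage(struct toy_vcache *avc, bool set_guard)
{
    enum flush_result r;

    avc->page_locked = true;
    if (set_guard)
        avc->states |= TOY_IN_PAGE_WRITE;
    r = toy_flush_all(avc);
    avc->states &= ~TOY_IN_PAGE_WRITE;
    avc->page_locked = false;
    return r;
}
```

In this model, toy_writepage(&v, false) reports FLUSH_WOULD_DEADLOCK and 
toy_writepage(&v, true) reports FLUSH_SKIPPED, so the guard only helps 
if the flag is already set when the nested flush is reached.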

 > What I don't get is why setting CPageWrite prevents
 > afs_linux_writepage_sync from being called (?), as CPageWrite is checked
 > inside it, and only after the afs_Trace4(). Iupdatepage with code 99999
 > should therefore even show up with working antirecursion, as far as I
 > can understand it.

You probably didn't wait long enough for the other Iupdatepage to show 
up.  The munmap() doesn't cause a flush to happen immediately - the dirty 
pages eventually get written by pdflush, but that can be several seconds 
later.  Without the anti-recursion code, close() causes 
osi_VM_StoreAllSegments to write out the modified mmapped pages right away.
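
For a test program that should not depend on when background writeback 
runs, one option is an explicit msync() before munmap().  A minimal 
user-space sketch, assuming a POSIX system; the temp-file path and the 
function name are made up, and nothing here is specific to AFS:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define MAP_LEN 4096

/* Dirty one shared-mapped page, write it back explicitly, unmap it.
 * Returns 0 on success, -1 on any failure. */
static int mmap_write_and_sync(void)
{
    char path[] = "/tmp/mmap_demo_XXXXXX";   /* hypothetical location */
    static const char msg[] = "dirty page";
    char readback[sizeof msg];
    int rc = -1;

    assert(sizeof msg <= MAP_LEN);

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                /* file goes away once fd is closed */

    if (ftruncate(fd, MAP_LEN) == 0) {
        char *p = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            memcpy(p, msg, sizeof msg);               /* dirties the page */
            int synced = msync(p, MAP_LEN, MS_SYNC);  /* synchronous writeback */
            int unmapped = munmap(p, MAP_LEN);        /* no flush guarantee by itself */

            /* The read-back only checks that the data is visible through
             * the file descriptor, not that it reached stable storage. */
            if (synced == 0 && unmapped == 0 &&
                pread(fd, readback, sizeof msg, 0) == (ssize_t)sizeof msg &&
                memcmp(readback, msg, sizeof msg) == 0)
                rc = 0;
        }
    }
    close(fd);
    return rc;
}
```

With MS_SYNC the call returns only after the writeback has been 
requested and completed, so the page write shows up at a known point 
rather than several seconds later.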

Marc