[OpenAFS-devel] Cache inconsistency in client 1.4.8 and above

Felix Frank Felix.Frank@Desy.de
Fri, 17 Apr 2009 08:13:36 +0200 (CEST)


On Thu, 16 Apr 2009, Marc Dionne wrote:

> On 04/16/2009 08:25 AM, Felix Frank wrote:
>>>> -    if (!avc->states & CPageWrite)
>
> I see a bug there - this line probably wants to be:
>    if (!(avc->states & CPageWrite))
>
> So the recursion was avoided by never actually doing anything in 
> StoreAllSegments, since CPageWrite never got set and the condition was always 
> false.

I guess this explains why mmap has been severely broken since 1.4.8.
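
To make the precedence problem concrete, here is a minimal standalone
sketch. DEMO_CPAGEWRITE and both helper names are made up for
illustration, and the flag value is an assumption rather than the one
from the AFS headers. Since unary '!' binds tighter than '&', the
original test is parsed as (!states) & FLAG. The value of !states is
only ever 0 or 1, so the result is 0 for any flag that is not bit 0:

```c
#include <stdbool.h>

/* Hypothetical flag value, for illustration only. */
#define DEMO_CPAGEWRITE 0x200u

/* How the compiler parses "!states & FLAG": the negation happens first,
 * leaving 0 or 1, which never overlaps with bit 0x200. */
static bool buggy_flag_is_clear(unsigned int states)
{
    return ((!states) & DEMO_CPAGEWRITE) != 0;
}

/* Intended test: true when the flag bit is not set in states. */
static bool fixed_flag_is_clear(unsigned int states)
{
    return !(states & DEMO_CPAGEWRITE);
}
```

The buggy form returns false whether or not the bit is set, which
matches Marc's observation that the guarded code never ran.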

> With the fix above, my larger mmap test quickly runs into a deadlock again. 
> Looks like write_cache_pages is trying to lock the page that is currently 
> being written:

I think I just reproduced it :/

> (this is pdflush):
> [<ffffffffa0b91d14>] ? crfree+0x38/0x3c [libafs]
> [<ffffffff81077f85>] ? getnstimeofday+0x5a/0xae
> [<ffffffff810b2b0a>] ? sync_page+0x0/0x45
> [<ffffffff8144c905>] schedule+0x9/0x1d
> [<ffffffff8144c94c>] io_schedule+0x33/0x44
> [<ffffffff810b2b4b>] sync_page+0x41/0x45
> [<ffffffff8144cd0e>] __wait_on_bit_lock+0x41/0x8a
> [<ffffffff810b2acf>] __lock_page+0x61/0x68
> [<ffffffff8107144d>] ? wake_bit_function+0x0/0x2e
> [<ffffffff810b863c>] write_cache_pages+0x1dc/0x3b3
> [<ffffffff810b804a>] ? __writepage+0x0/0x2f
> [<ffffffff810b8832>] generic_writepages+0x1f/0x21
> [<ffffffff810b8863>] do_writepages+0x2f/0x37
> [<ffffffff810b35e3>] __filemap_fdatawrite_range+0x4b/0x4d
> [<ffffffff810b3d90>] filemap_fdatawrite+0x1a/0x1c
> [<ffffffffa0b9485c>] osi_VM_StoreAllSegments+0xd7/0x17c [libafs]
> [<ffffffffa0b5e000>] afs_StoreAllSegments+0xcb/0x17c7 [libafs]
> [<ffffffff810dbc69>] ? __fput+0x17b/0x18a
> [<ffffffff81077f85>] ? getnstimeofday+0x5a/0xae
> [<ffffffff81077fee>] ? do_gettimeofday+0x15/0x38
> [<ffffffffa0b99fdf>] ? afs_icl_Event4+0xfe/0x162 [libafs]
> [<ffffffffa0b751ba>] afs_DoPartialWrite+0x55/0x5a [libafs]
> [<ffffffffa0b97655>] afs_linux_writepage_sync+0x30f/0x3fc [libafs]
> [<ffffffff8122156b>] ? prio_tree_next+0x1c3/0x224
> [<ffffffffa0b97838>] afs_linux_writepage+0x8c/0xba [libafs]
> [<ffffffff810b805c>] __writepage+0x12/0x2f
> [<ffffffff810b8696>] write_cache_pages+0x236/0x3b3
> [<ffffffff810b804a>] ? __writepage+0x0/0x2f
> [<ffffffff810b8832>] generic_writepages+0x1f/0x21
> [<ffffffff810b8863>] do_writepages+0x2f/0x37
> [<ffffffff810f403a>] __writeback_single_inode+0x1a1/0x3b9
> [<ffffffff81052516>] ? __dequeue_entity+0x2e/0x33
> [<ffffffff810f468a>] generic_sync_sb_inodes+0x2a7/0x438
>
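
The trace above has write_cache_pages on the stack twice: the outer call
is inside __writepage for one page, and the inner call (reached through
afs_DoPartialWrite and osi_VM_StoreAllSegments) then waits in
__lock_page. A rough userspace model of that shape follows. Everything
named demo_* is hypothetical, and a plain boolean stands in for the
kernel's page lock. The real lock_page() sleeps instead of failing, so
a "blocked" page here corresponds to a hang there:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy page with a non-recursive lock bit. */
struct demo_page {
    bool locked;
};

static bool demo_trylock_page(struct demo_page *pg)
{
    if (pg->locked)
        return false;   /* the kernel would sleep here instead */
    pg->locked = true;
    return true;
}

static void demo_unlock_page(struct demo_page *pg)
{
    pg->locked = false;
}

/* Inner flush: walks all pages and tries to lock each one in turn.
 * Returns how many pages it could not lock. */
static size_t demo_flush_pages(struct demo_page *pages, size_t n)
{
    size_t blocked = 0;

    for (size_t i = 0; i < n; i++) {
        if (demo_trylock_page(&pages[i]))
            demo_unlock_page(&pages[i]);
        else
            blocked++;
    }
    return blocked;
}

/* Outer writeback of page idx: holds that page's lock while it
 * triggers a flush of the whole file. */
static size_t demo_writepage(struct demo_page *pages, size_t n, size_t idx)
{
    size_t blocked;

    demo_trylock_page(&pages[idx]);
    blocked = demo_flush_pages(pages, n);
    demo_unlock_page(&pages[idx]);
    return blocked;
}
```

In this model demo_writepage() always reports exactly one blocked page,
namely the one its own caller is holding, while a flush started with no
page held reports none.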
>> What I don't get is why setting CPageWrite prevents
>> afs_linux_writepage_sync from being called (?), as CPageWrite is checked
>> inside it, and only after the afs_Trace4(). Iupdatepage with code 99999
>> should therefore even show up with working antirecursion, as far as I
>> can understand it.
>
> You probably didn't wait long enough for the other Iupdatepage to show up. 
> The munmap() doesn't cause a flush to happen immediately - the dirty pages 
> eventually get written by pdflush, but that can be several seconds later. 
> Without the anti-recursion code, close() causes osi_VM_StoreAllSegments to 
> write out the modified mmapped pages right away.

I see, thanks for clearing that up.
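
For readers following along, this is roughly how I understand the
anti-recursion flag to behave, written as a standalone sketch. All
demo_* names and the flag value are made up, and the real code paths
are considerably more involved. The writepage path sets the flag around
the partial write, and the flush is skipped while the flag is set. The
close path does not set it, so the flush runs right away:

```c
/* Hypothetical flag value, for illustration only. */
#define DEMO_CPAGEWRITE 0x200u

struct demo_vcache {
    unsigned int states;  /* flag bits */
    int flushes;          /* how many times the page flush ran */
};

/* Flush all dirty pages unless we are already inside a page write. */
static void demo_store_all_segments(struct demo_vcache *avc)
{
    if (!(avc->states & DEMO_CPAGEWRITE))
        avc->flushes++;   /* stands in for writing out the dirty pages */
}

/* Page writeback path: mark the vcache, do the partial write, unmark. */
static void demo_writepage_sync(struct demo_vcache *avc)
{
    avc->states |= DEMO_CPAGEWRITE;
    demo_store_all_segments(avc);
    avc->states &= ~DEMO_CPAGEWRITE;
}

/* Close path: no flag set, so the flush happens immediately. */
static void demo_close(struct demo_vcache *avc)
{
    demo_store_all_segments(avc);
}
```

With a working guard, the writepage path never re-enters the page
flush, which avoids the recursion but also means the flush is simply
skipped on that path.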

Guess we're back to square one then. I posted a hack to RT #124627 
yesterday that does prevent the deadlock, but apparently much of the 
data never gets written to the cache, and mmap_test reports corruption 
(it reads back lots of 0s). So what should we do instead of calling 
osi_VM_StoreAllSegments() during partial writes?

Regards
  - Felix