[OpenAFS] AFS mount hangs when internet connection is lost?

Ryan C. Underwood nemesis-lists@icequake.net
Thu, 12 Aug 2010 15:54:15 -0500


I have a system which acts as a NAT router (Ethernet) to share a CDMA
modem (USB).  The same system runs the AFS client which talks to AFS
fileservers over the internet.

Occasionally the modem is knocked offline, and when this happens the
Linux USB driver resets the modem.  If the modem goes offline even once,
however briefly, the /afs mount and every process that was accessing it
at the time of the disconnect hang until the system is rebooted.

The kernel log shows hung_task messages similar to the following for
each process that was accessing AFS at the time, always hanging in
afs_PutVCache:

[ 4440.472856] INFO: task perl:21072 blocked for more than 120 seconds.
[ 4440.472861] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4440.472866] perl          D ffff88008dc8d0b8     0 21072  21071 0x00000000
[ 4440.472877]  ffff8800921b1b58 0000000000000086 ffff880000000000 0000000000015900
[ 4440.472887]  ffff8800921b1fd8 0000000000015900 ffff8800921b1fd8 ffff8800902196e0
[ 4440.472897]  0000000000015900 0000000000015900 ffff8800921b1fd8 0000000000015900
[ 4440.472907] Call Trace:
[ 4440.472962]  [<ffffffffa0638089>] ? afs_PutVCache+0x79/0x140 [openafs]
[ 4440.472973]  [<ffffffff8158730f>] __mutex_lock_slowpath+0xff/0x190
[ 4440.472982]  [<ffffffff815871eb>] mutex_lock+0x2b/0x50
[ 4440.472991]  [<ffffffff8115d7b7>] do_lookup+0x107/0x280
[ 4440.473000]  [<ffffffff8115e1de>] link_path_walk+0x12e/0xab0
[ 4440.473009]  [<ffffffff8115e613>] link_path_walk+0x563/0xab0
[ 4440.473016]  [<ffffffff8115ecc7>] path_walk+0x67/0xe0
[ 4440.473023]  [<ffffffff8115ee9b>] do_path_lookup+0x5b/0xa0
[ 4440.473031]  [<ffffffff8115fb67>] user_path_at+0x57/0xa0
[ 4440.473039]  [<ffffffff81155c4c>] vfs_fstatat+0x3c/0x80
[ 4440.473047]  [<ffffffff81155d6b>] vfs_stat+0x1b/0x20
[ 4440.473054]  [<ffffffff81155d94>] sys_newstat+0x24/0x50
[ 4440.473063]  [<ffffffff8158c46e>] ? do_page_fault+0x15e/0x350
[ 4440.473071]  [<ffffffff81588fb5>] ? page_fault+0x25/0x30
[ 4440.473080]  [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
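
For anyone comparing several of these reports, a rough Python sketch of
how the blocked tasks and their call-trace symbols could be pulled out of
a log like the one above.  The function name parse_hung_tasks and the
regular expressions are illustrative assumptions, written against the
message format shown here and not against any kernel or OpenAFS tool.
A frame printed with a leading "?" is recorded as unreliable, on the
understanding that the kernel marks frames it could not confirm that way.

```python
import re

# Header line printed by the kernel's hung-task detector, e.g.
# "[ 4440.472856] INFO: task perl:21072 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

# One call-trace frame, e.g.
# " [<ffffffff815871eb>] mutex_lock+0x2b/0x50"
# A "? " before the symbol marks a frame the kernel could not confirm.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\? )?(?P<sym>[\w.]+)\+0x"
)


def parse_hung_tasks(log_text):
    """Return one dict per hung-task report found in log_text.

    Each dict has the task name, pid, timeout in seconds, and a list of
    (symbol, reliable) tuples in the order the frames were printed.
    """
    reports = []
    current = None
    for line in log_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = {
                "comm": header["comm"],
                "pid": int(header["pid"]),
                "secs": int(header["secs"]),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            reliable = frame["unreliable"] is None
            current["frames"].append((frame["sym"], reliable))
    return reports
```

Fed the report above, this would list afs_PutVCache as an unconfirmed
frame and __mutex_lock_slowpath, mutex_lock and do_lookup as the first
confirmed ones, which makes it easier to check whether every hung process
is waiting on the same mutex.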

Kernel is 2.6.35-13 and OpenAFS is 1.5.75 from the Ubuntu repository.

I don't know if it helps, but here is the output of cmdebug -long,
taken some time after the hang:

$ cmdebug localhost -long
Lock afs_xvcache status: (none_waiting)
Lock afs_xdcache status: (none_waiting)
Lock afs_xserver status: (none_waiting)
Lock afs_xvcb status: (none_waiting)
Lock afs_xbrs status: (none_waiting)
Lock afs_xcell status: (none_waiting)
Lock afs_xconn status: (none_waiting)
Lock afs_xuser status: (none_waiting)
Lock afs_xvolume status: (none_waiting)
Lock puttofile status: (none_waiting)
Lock afs_ftf status: (none_waiting)
Lock afs_xcbhash status: (none_waiting)
Lock afs_xaxs status: (none_waiting)
Lock afs_xinterface status: (none_waiting)
Lock afs_xosi status: (none_waiting)
Lock afs_xsrvAddr status: (none_waiting)
Lock afs_xvreclaim status: (none_waiting)
Lock afsdb_client_loc status: (none_waiting)
Lock afsdb_req_lock status: (none_waiting)
Lock afs_discon_lock status: (none_waiting, 1 read_locks(pid:0))
Lock afs_disconDirtyL status: (none_waiting)
Lock afs_discon_vc_di status: (none_waiting)
Lock dynroot status: (none_waiting)
Lock icequake.net status: (none_waiting)
** Cache entry @ 0x8dc8c000 for 0.1.1.1 [dynroot]
            2048 bytes  DV            3  refcnt     3
    callback 00000000   expires 0
    0 opens     0 writers
    volume root   
    states (0x5), stat'd, read-only
** Cache entry @ 0x8dc8d400 for 2.536870916.1.1 [icequake.net]
    locks: (writer_waiting, write_locked(pid:18986 at:54), 1 waiters)
            2048 bytes  DV          822  refcnt     3
    callback 23b9c280   expires 1281650636
    1 opens     0 writers
    volume root   
    states (0x4), read-only
** Cache entry @ 0x8dc8d000 for 0.1.16777220.1 [dynroot]
              23 bytes  DV            1  refcnt     2
    callback 00000000   expires 0
    0 opens     0 writers
    mount point   
    states (0xd), stat'd, read-only, mt pt valid


-- 
Ryan C. Underwood, <nemesis@icequake.net>