[OpenAFS-devel] possible recursive locking detected

Chas Williams (CONTRACTOR) chas@cmf.nrl.navy.mil
Tue, 15 Jul 2008 10:52:07 -0400


In message <20080714234544.GB16373@excalibur.hozed.org>,Troy Benjegerdes writes:
>What does anyone make of this? 
>Kernel 2.6.26-rc8, openafs-cvs HEAD
>
>This particular machine used to deadlock userspace without the lock
>debugging enabled. It appears to be okay now.

i would guess that it is probably right.  there might be a recursive
lock somewhere.  it looks like the rx layer, with afs_mutex_enter()
pointing to one of the rx locks.  this probably highlights the need
for afs_mutex_enter() to be static inline so that the lock debugging
doesn't point to afs_mutex_enter().
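
a rough userspace sketch of why the inlining matters.  this is not the
libafs code: struct dbg_lock, raw_lock() and the two caller functions are
made-up stand-ins, and __builtin_return_address() is a GCC/Clang extension.
the assumption is that the lock primitive records its return address, much
like the "at:" address in the report.  an out-of-line wrapper then makes
every caller report the same address, while an inlined wrapper reports the
real caller.

```c
#include <stddef.h>

/* Hypothetical lock that remembers where it was last taken. */
struct dbg_lock {
    volatile int owner;   /* -1 while being taken, then 1 once the wrapper owns it */
    void *acquired_at;    /* return address recorded by raw_lock() */
};

/* Stand-in for the underlying lock primitive: out of line, and it
 * records an address inside whichever function called it. */
__attribute__((noinline))
void raw_lock(struct dbg_lock *l)
{
    l->acquired_at = __builtin_return_address(0);
    l->owner = -1;
}

/* Out-of-line wrapper: the only call to raw_lock() lives here, so
 * every lock taken through it reports this function's address. */
__attribute__((noinline))
void enter_outofline(struct dbg_lock *l)
{
    raw_lock(l);
    l->owner = 1;
}

/* Inlined wrapper: the call to raw_lock() is emitted in each caller,
 * so the recorded address identifies the real call site. */
static inline __attribute__((always_inline))
void enter_inline(struct dbg_lock *l)
{
    raw_lock(l);
    l->owner = 1;
}

/* Two hypothetical callers, named after frames in the trace. */
__attribute__((noinline))
int reap_connections(struct dbg_lock *l, int use_inline)
{
    if (use_inline)
        enter_inline(l);
    else
        enter_outofline(l);
    return 1;
}

__attribute__((noinline))
int start_server(struct dbg_lock *l, int use_inline)
{
    if (use_inline)
        enter_inline(l);
    else
        enter_outofline(l);
    return 2;
}
```

taking two locks through the out-of-line wrapper from reap_connections()
and start_server() leaves the same acquired_at in both; through the inlined
wrapper the two addresses differ, which is what you want from a debug report.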

the linux rx mutex layer has code to try to catch this, so i am a little
puzzled that you don't see an osi_Panic() after this.
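
for illustration only, one common way a mutex layer can catch a recursive
enter is to track the owner.  the sketch below is a single-threaded model
with no real blocking; struct owned_mutex, the ENTER_* codes and the explicit
tid argument are all hypothetical, and it is not the actual rx mutex code.
where a real layer might panic, the model just returns an error.

```c
/* Hypothetical owner-tracking mutex model; tid 0 means "no owner". */
struct owned_mutex {
    int locked;
    int owner;
};

enum {
    ENTER_OK = 0,
    ENTER_RECURSIVE = -1,  /* same owner tried to enter twice */
    ENTER_BUSY = -2        /* held by someone else; a real mutex would sleep */
};

int owned_mutex_enter(struct owned_mutex *l, int tid)
{
    /* Check the owner before "blocking": a recursive enter would
     * otherwise wait forever on a lock the caller already holds. */
    if (l->locked && l->owner == tid)
        return ENTER_RECURSIVE;
    if (l->locked)
        return ENTER_BUSY;
    l->locked = 1;
    l->owner = tid;
    return ENTER_OK;
}

int owned_mutex_exit(struct owned_mutex *l, int tid)
{
    /* Only the current owner may release the lock. */
    if (!l->locked || l->owner != tid)
        return -1;
    l->locked = 0;
    l->owner = 0;
    return 0;
}
```

note that the check only helps if it runs before the caller goes to sleep
on the lock; a check placed after the blocking acquire would never be
reached by a truly recursive enter.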

the debug looks a bit garbled.  i tried to correct for this below.
it would be nice to find out which source code line corresponds to
.rxi_ReapConnections+0x1b0.  that would tell which lock is the problem.

>[   37.850406] =============================================
>[   37.947534] [ INFO: possible recursive locking detected ]
>[   38.012120] 2.6.26-rc8 #10
>[   38.044465] ---------------------------------------------
>[   38.109049] afsd/3684 is trying to acquire lock:
>[   38.164275]  (&l->mutex){--..}, at: [<d00000000014f0f4>] .afs_mutex_enter+0x24/0x70 [libafs]
>[   38.282318]
>[   38.282319] but task is already holding lock:
>[   38.352206]  (&l->mutex){--..}, at: [<d00000000014f0f4>] .afs_mutex_enter+0x24/0x70 [libafs]
>[   38.470250]
>[   38.470251] other info that might help us debug this:
>[   38.548459] 2 locks held by afsd/3684:
>[   38.593282]  #0:  (afs_global_lock){--..}, at: [<d0000000001583fc>] .osi_linux_alloc+0x17c/0x4c0 [libafs]
>[   38.724950]  #1:  (&l->mutex){--..}, at: [<d00000000014f0f4>] .afs_mutex_enter+0x24/0x70 [libafs]
>[   38.848298]
>[   38.848300] stack backtrace:
>[   38.900507] Call Trace:
>[   38.929730] [c0000007f95330f0] [c000000000011144] .show_stack+0x64/0x210 (unreliable)
>[   39.040389] [c0000007f95331b0] [c000000000011310] .dump_stack+0x20/0x40
>[   39.119741] [c0000007f9533230] [c000000000098c90] .__lock_acquire+0xe20/0x1270
>[   39.206376] [c0000007f9533330] [c0000000000991b4] .lock_acquire+0xd4/0x120
>[   39.288851] [c0000007f95333f0] [c00000000038ce3c] .mutex_lock_nested+0xfc/0x420
>[   39.376522] [c0000007f95334f0] [d00000000014f0f4] .afs_mutex_enter+0x24/0x70 [libafs]
>[   39.487078] [c0000007f9533570] [d0000000001462b0] .rxi_ReapConnections+0x1b0/0x4e0 [libafs]
>[   39.603871] [c0000007f9533690] [d00000000014acec] .rx_StartServer+0xac/0xf0 [libafs]
>[   39.713388] [c0000007f9533740] [d000000000173c58] .afs_ResourceInit+0x1b8/0x1f0 [libafs]
>[   39.827059] [c0000007f95337d0] [d0000000001b8024] .afs_DaemonOp+0x2f4/0x310 [libafs]
>[   39.936573] [c0000007f9533970] [d0000000001b8b00] .afs_syscall_call+0x270/0x1ce0 [libafs]
>[   40.051290] [c0000007f9533aa0] [d00000000013ebec] .afs_syscall+0x14c/0x6a0 [libafs]
>[   40.159763] [c0000007f9533bc0] [d00000000015b248] .afs_unlocked_ioctl+0xc8/0x110 [libafs]
>[   40.274476] [c0000007f9533c80] [c000000000160a40] .proc_reg_compat_ioctl+0xb0/0x100
>[   40.382952] [c0000007f9533d30] [c00000000014a5d0] .compat_sys_ioctl+0xe0/0x500
>[   40.469585] [c0000007f9533e30] [c0000000000086d4] syscall_exit+0x0/0x40