[OpenAFS] Re: deadlock in OpenAFS 1.4.11 (Solaris 5.10)

John Tang Boyland boyland@cs.uwm.edu
Sat, 10 Apr 2010 12:54:30 -0500

[BTW: I get openafs-info messages digested.]

Andrew Deason wrote:
] On Fri, 9 Apr 2010 14:48:34 -0400
] Derrick Brashear <shadow@gmail.com> wrote:
] > > What about the kernel stack trace for the proc(s) from mdb? Or do you
] > > know where we're hanging?
] > 
] > nope. i figured fstrace would make it easier to guess that but jumping
] > directly to a stack trace is probably a fine course of action.
] John, if you want to do this, do the following for each PID listed in
] that cmdebug output:
] ("$pid", "ffffffffaddress1", and "ffffffffaddress2" etc are placeholders)
] # mdb -k
] > 0t$pid::pid2proc | ::threadlist
]             ADDR             PROC              LWP CMD/LWPID
] ffffffffaddress1 ffffffffaddress2                0 XXX/YYY
] > ffffffffaddress2::findstack
] So, as an example, looking at process 674:
] > 0t674::pid2proc | ::threadlist
]             ADDR             PROC              LWP CMD/LWPID
] ffffffff83e74908 ffffffff832de020                0    /239
] > ffffffff832de020::findstack
] [stack trace]

Thanks for the detailed instructions.  I never even knew mdb existed.
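
For anyone repeating the quoted steps over several PIDs, here is a minimal
sketch that only builds the mdb command text; it does not run mdb itself.
The helper names threadlist_cmd and findstack_cmd are made up for
illustration, and the PID list is the set of processes shown in this message.

```python
import re

def threadlist_cmd(pid):
    """Build the mdb pipeline that lists a process's threads.

    The 0t prefix tells mdb that the PID is written in decimal.
    """
    return f"0t{int(pid)}::pid2proc | ::threadlist"

def findstack_cmd(addr):
    """Build the ::findstack command for an address copied from ::threadlist."""
    addr = addr.strip().lower()
    # Reject anything that is not a plain hex address, so that a placeholder
    # such as "ffffffffaddress2" is never pasted into mdb by mistake.
    if not re.fullmatch(r"[0-9a-f]{1,16}", addr):
        raise ValueError(f"not a hex address: {addr!r}")
    return f"{addr}::findstack"

# First step of the quoted procedure for each PID in this message.
for pid in (17732, 17679, 17889, 18421, 17834):
    print(threadlist_cmd(pid))
```

The second step still needs an address read from each ::threadlist result,
so findstack_cmd is called by hand once that output is available.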

Process 17679 is the one writing the LONG file that seemed to
initiate the deadlock.  I notice its stack is inside "FetchWholeEnchilada"
(called from afs_remove).

Process 18421 is the one listed in the cmdebug entry for /afs, the
root directory of the whole AFS system; that cm entry is also the one
with the most waiters.

> 0t17732::pid2proc | ::threadlist
            ADDR             PROC              LWP CMD/LWPID
fffffe84baca3a98 fffffe8925fa2500                0 /239
> fffffe8925fa2500::findstack
stack pointer for thread fffffe8925fa2500: fffffe8002fce5a0
[ fffffe8002fce5a0 _resume_from_idle+0xf8() ]
  fffffe8002fce5d0 swtch+0x110()
  fffffe8002fce5f0 cv_wait+0x68()
  fffffe8002fce640 afs_osi_Sleep+0x99()
  fffffe8002fce6c0 Afs_Lock_Obtain+0x1cb()
  fffffe8002fce780 afs_putpage+0x14a()
  fffffe8002fce7f0 osi_VM_GetDownD+0xe8()
  fffffe8002fce9c0 afs_GetDownD+0x7ed()
  fffffe8002fceb90 afs_GetDCache+0x713()
  fffffe8002fcecc0 afs_nfsrdwr+0xd19()
  fffffe8002fced30 afs_vmread+0x89()
  fffffe8002fced80 fop_read+0x31()
  fffffe8002fceeb0 read+0x188()
  fffffe8002fceec0 read32+0xe()
  fffffe8002fcef10 sys_syscall32+0x101()
> 0t17679::pid2proc | ::threadlist
            ADDR             PROC              LWP CMD/LWPID
fffffe84c16f55a8 fffffe84bae85500                0 /239
> fffffe84bae85500::findstack
stack pointer for thread fffffe84bae85500: fffffe8003244640
[ fffffe8003244640 _resume_from_idle+0xf8() ]
  fffffe8003244670 swtch+0x110()
  fffffe8003244690 cv_wait+0x68()
  fffffe80032446e0 afs_osi_Sleep+0x99()
  fffffe8003244760 Afs_Lock_Obtain+0x1cb()
  fffffe8003244820 afs_putpage+0x14a()
  fffffe8003244890 osi_VM_GetDownD+0xe8()
  fffffe8003244a60 afs_GetDownD+0x7ed()
  fffffe8003244c30 afs_GetDCache+0x713()
  fffffe8003244cb0 FetchWholeEnchilada+0xf4()
  fffffe8003244d80 afs_remove+0x7eb()
  fffffe8003244de0 gafs_remove+0x4f()
  fffffe8003244e10 fop_remove+0x25()
  fffffe8003244ea0 vn_removeat+0x228()
  fffffe8003244eb0 vn_remove+0x12()
  fffffe8003244ec0 unlink+0xd()
  fffffe8003244f10 sys_syscall32+0x101()
> 0t17889::pid2proc | ::threadlist 
            ADDR             PROC              LWP CMD/LWPID
fffffe84b1c218d0 fffffe8926218840                0 /239
> fffffe8926218840::findstack
stack pointer for thread fffffe8926218840: fffffe8001cff5a0
[ fffffe8001cff5a0 _resume_from_idle+0xf8() ]
  fffffe8001cff5d0 swtch+0x110()
  fffffe8001cff5f0 cv_wait+0x68()
  fffffe8001cff640 afs_osi_Sleep+0x99()
  fffffe8001cff6c0 Afs_Lock_Obtain+0x1cb()
  fffffe8001cff780 afs_putpage+0x14a()
  fffffe8001cff7f0 osi_VM_GetDownD+0xe8()
  fffffe8001cff9c0 afs_GetDownD+0x7ed()
  fffffe8001cffb90 afs_GetDCache+0x713()
  fffffe8001cffcc0 afs_nfsrdwr+0xd19()
  fffffe8001cffd30 afs_vmread+0x89()
  fffffe8001cffd80 fop_read+0x31()
  fffffe8001cffeb0 read+0x188()
  fffffe8001cffec0 read32+0xe()
  fffffe8001cfff10 sys_syscall32+0x101()
> 0t18421::pid2proc | ::threadlist
            ADDR             PROC              LWP CMD/LWPID
fffffe84b6b60de0 fffffe8905b2d280                0 /239
> fffffe8905b2d280::findstack
stack pointer for thread fffffe8905b2d280: fffffe8000365300
[ fffffe8000365300 _resume_from_idle+0xf8() ]
  fffffe8000365330 swtch+0x110()
  fffffe8000365350 cv_wait+0x68()
  fffffe80003653a0 afs_osi_Sleep+0x99()
  fffffe8000365420 Afs_Lock_Obtain+0x1cb()
  fffffe80003654e0 afs_putpage+0x14a()
  fffffe8000365550 osi_VM_GetDownD+0xe8()
  fffffe8000365720 afs_GetDownD+0x7ed()
  fffffe80003658f0 afs_GetDCache+0x6f8()
  fffffe8000365a20 afs_lookup+0x700()
  fffffe8000365aa0 gafs_lookup+0x70()
  fffffe8000365af0 fop_lookup+0x35()
  fffffe8000365cc0 lookuppnvp+0x1bf()
  fffffe8000365d30 lookuppnat+0xf9()
  fffffe8000365df0 lookupnameat+0x86()
  fffffe8000365e50 cstatat_getvp+0x115()
  fffffe8000365eb0 cstatat64_32+0x4c()
  fffffe8000365ec0 stat64_32+0x22()
  fffffe8000365f10 sys_syscall32+0x101()
> 0t17834::pid2proc | ::threadlist
            ADDR             PROC              LWP CMD/LWPID
fffffe84c2cbc008 fffffe8516ab2720                0 ?????????K?/239
> fffffe8516ab2720::findstack
stack pointer for thread fffffe8516ab2720: fffffe8002fda5a0
[ fffffe8002fda5a0 _resume_from_idle+0xf8() ]
  fffffe8002fda5d0 swtch+0x110()
  fffffe8002fda5f0 cv_wait+0x68()
  fffffe8002fda640 afs_osi_Sleep+0x99()
  fffffe8002fda6c0 Afs_Lock_Obtain+0x1cb()
  fffffe8002fda780 afs_putpage+0x14a()
  fffffe8002fda7f0 osi_VM_GetDownD+0xe8()
  fffffe8002fda9c0 afs_GetDownD+0x7ed()
  fffffe8002fdab90 afs_GetDCache+0x713()
  fffffe8002fdacc0 afs_nfsrdwr+0xd19()
  fffffe8002fdad30 afs_vmread+0x89()
  fffffe8002fdad80 fop_read+0x31()
  fffffe8002fdaeb0 read+0x188()
  fffffe8002fdaec0 read32+0xe()
  fffffe8002fdaf10 sys_syscall32+0x101()