[OpenAFS] Help with wedged Solaris box

Derrick Brashear shadow@gmail.com
Thu, 29 Nov 2012 09:16:56 -0500


On Thu, Nov 29, 2012 at 9:08 AM, Kevin Hildebrand <kevin@umd.edu> wrote:
>
> Hi, we've got a Solaris 10 box running OpenAFS-1.6.1 that is periodically
> becoming non-responsive and requiring a hard restart.
>
> From what I can see from looking at crash dumps, it appears that the threads
> that are hanging are in the process of doing file access (stat/lookup) in
> AFS.
>
> For example:
>>
>> 2a1043be951::stack

The two candidates for this mutex are the AFS_GLOCK and the vnode mutex
for the AFS root vnode.

Try the ::findlocks dcmd to see what's holding it.

>
> mutex_vector_enter+0x428(190d458, 2, 707dcf50, fffb14c0ca47a08a,
> 2a100297c81, 0)
> afs_root+0x3c(6003f54ce40, 2a1043bf548, 1, 0, 6002542e840, 0)
> fsop_root+0x10(6003f54ce40, 2a1043bf548, 6002153acc8, 2420, 0, 7afc1490)
> traverse+0x7c(2a1043bf678, 2a1043bf548, 0, 0, 6002542e840, 6002153acc8)
> lookuppnvp+0x3d0(2a1043bf940, 0, 6002542e840, 2a1043bf678, 2a1043bf680,
> 60021529a40)
> lookuppnat+0x120(60021529a40, 0, 1, 0, 2a1043bfad8, 0)
> lookupnameat+0x5c(0, 0, 1, 0, 2a1043bfad8, 0)
> cstatat_getvp+0x198(ffd19400, 100ae7708, 1, 1, 2a1043bfad8, 0)
> cstatat+0x40(ffffffffffd19553, 100ae7708, 1000, 100405a50, 0, 10)
> syscall_trap+0xac(100ae7708, 100405a50, 100b52fe8, 16, 100b7bffc, 4)
>>
>>
>
> For this particular crash dump, I have hundreds of threads that are stuck in
> this location.
>
> I'd appreciate any suggestions on how to debug this further.
>
> Thanks,
> Kevin
>
> --
> Kevin Hildebrand
> University of Maryland, College Park
> Division of IT
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
>



-- 
Derrick