[OpenAFS] 1.4.0 Solaris 10 sparc client hang

Christopher D. Clausen cclausen@acm.org
Mon, 7 Nov 2005 10:33:54 -0600


The AFS client has hung on one of my AFS servers (an E3000 running 
Solaris 10).  It has the 1.4.0 binaries from the openafs.org website 
installed.  The client hung on a cp operation from AFS to the local disk.

rxdebug returns:
C:\>rxdebug afs2 7001
Trying 128.174.251.9 (port 7001):
Free packets: 130, packet reclaims: 5, calls: 422, used FDs: 64
not waiting for packets.
0 calls waiting for a thread
0 threads are idle
67108864 calls have waited for a thread
Connection from host 128.174.251.9, port 7000, Cuid 90cc8e26/da49060
  serial 19,  natMTU 1444, security index 0, server conn
    call 0: # 9, state active, mode: error
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Done.

I assume that "mode: error" is indicative of bad things.  The call has 
been in this state for several hours.  Attempts to umount /afs have been 
unsuccessful (they hang as well).
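
For anyone who wants to check for this state from a script rather than 
by eye, here is a minimal sketch in Python that picks out calls reported 
in error mode.  It assumes the per-call line layout shown in the rxdebug 
output above; the function name and the regular expression are 
illustrative and are not part of OpenAFS.

```python
import re

# Assumed layout of a per-call line, taken from the rxdebug output above:
#     call 0: # 9, state active, mode: error
# The "mode:" field is optional, and any later comma-separated fields are ignored.
CALL_LINE = re.compile(
    r"^\s*call\s+(?P<slot>\d+):\s+#\s+(?P<number>\d+),"
    r"\s+state\s+(?P<state>[^,]+?)\s*"
    r"(?:,\s*mode:\s+(?P<mode>\w+))?"
    r"(?:,.*)?$"
)


def find_error_calls(rxdebug_output):
    """Return (slot, call number, state) for each call whose mode is 'error'."""
    errors = []
    for line in rxdebug_output.splitlines():
        match = CALL_LINE.match(line)
        if match and match.group("mode") == "error":
            errors.append(
                (int(match.group("slot")),
                 int(match.group("number")),
                 match.group("state"))
            )
    return errors


# Per-call lines copied from the output above.
SAMPLE_OUTPUT = """\
    call 0: # 9, state active, mode: error
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
"""

if __name__ == "__main__":
    for slot, number, state in find_error_calls(SAMPLE_OUTPUT):
        print(f"call slot {slot} (call # {number}) is {state} but in error mode")
```

Run periodically against saved rxdebug output, this would show whether 
the same call slot stays in error mode or whether new ones appear.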

I am using vos move to get the volumes off the server (the server 
processes seem unaffected) so that I can eventually reboot it.  My 
question is: what would provide 
the most information to further debug this?  Just panic the system from 
firmware and create a dump?  Or should I attempt to attach a debugger 
and see where the process is stuck?  Any advice / suggestions / pointers 
to idiot's guides on debugging Solaris would be appreciated.

<<CDC
-- 
Christopher D. Clausen
ACM@UIUC SysAdmin