[OpenAFS-devel] Any tips for tracking down causes of hangs?
Nathan Neulinger
nneul@umr.edu
Fri, 22 Mar 2002 06:42:03 -0600
Derrick J Brashear wrote:
>
> On Thu, 21 Mar 2002, Nickolai Zeldovich wrote:
>
> > On Solaris, this is really easy: you find the struct proc for the
> > hung process, and look at the stack trace starting at tlist->sp. :)
> > Probably kgdb lets you do similar things on Linux. You might try
> > using cmdebug remotely, if the in-kernel Rx server is still working;
> > this sounds like a deadlock of some sort, which should show up in
> > cmdebug output in some way. You could also enable lock tracing, in
> > which case your fstrace output should tell you where the deadlock
> > occurred (assuming you can usefully run fstrace).
>
> I never got lock tracing (via fstrace) to work on Linux, just fyi. I ended
> up using printks to do it instead.
Interesting... I got the following with a cmdebug after one of these
hangs:
troot-srvtst02(12)> cmdebug localhost
** Cache entry @ 0xd0e43e40 for 1.536908342.1.1
locks: (none_waiting, write_locked(pid:7228 at:54))
0 bytes DV 0 refcnt 1
callback cbb54b60 expires 0
0 opens 0 writers
volume root
states (0x0)
troot-srvtst02(13)> vos e 536908342
troot-srvtst02(14)> ps -auxwww | grep 7228
gpweb 7228 0.0 0.6 5764 1592 ? S 03:59 0:00 ./httpd -d /afs/umr.edu/software/gpweb/apache-root-websrv1
root 10707 0.0 0.2 1680 612 pts/0 S 06:29 0:00 grep 7228
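The manual steps above (read the lock holder's pid out of the cmdebug output, then look it up with ps) could be scripted roughly as follows. This is only a sketch: the regular expressions are modeled on the single cmdebug entry quoted above, so other lock states or cmdebug versions may print differently, and the helper name find_lock_holders is made up for illustration. Treating the second field of the "1.536908342.1.1" identifier as the volume ID is an assumption based on that number being the one passed to vos e.

```python
import re

# Patterns modeled only on the cmdebug entry quoted in this message;
# other lock states or cmdebug versions may look different (assumption).
ENTRY_RE = re.compile(r"\*\* Cache entry @ (0x[0-9a-f]+) for (\S+)")
LOCK_RE = re.compile(r"(\w+_locked)\(pid:(\d+) at:(\d+)\)")


def find_lock_holders(cmdebug_text):
    """Return (fid, lock_state, pid, at) for every held lock in the text."""
    holders = []
    fid = None  # identifier of the cache entry currently being read
    for line in cmdebug_text.splitlines():
        entry = ENTRY_RE.search(line)
        if entry:
            fid = entry.group(2)
            continue
        for state, pid, at in LOCK_RE.findall(line):
            holders.append((fid, state, int(pid), int(at)))
    return holders


# Sample input copied from the cmdebug output above.
SAMPLE = """\
** Cache entry @ 0xd0e43e40 for 1.536908342.1.1
locks: (none_waiting, write_locked(pid:7228 at:54))
0 bytes DV 0 refcnt 1
"""

if __name__ == "__main__":
    for fid, state, pid, at in find_lock_holders(SAMPLE):
        # Assumption: the second dotted field is the volume ID.
        volume_id = fid.split(".")[1]
        print(f"pid {pid} holds {state} on {fid} (volume {volume_id}, at:{at})")
```

The output of "cmdebug localhost" could be saved to a file and passed to find_lock_holders in place of SAMPLE; each pid it returns is then a candidate for the ps lookup shown above.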
And get this... I can't vos e that volume from anywhere... not even from
other clients.
I've got a feeling this might not have anything to do with that client,
and may instead be something wrong with the server that volume is on,
because problems keep cropping up with specific volumes on that server.
-- Nathan
------------------------------------------------------------
Nathan Neulinger EMail: nneul@umr.edu
University of Missouri - Rolla Phone: (573) 341-4841
Computing Services Fax: (573) 341-4216