[OpenAFS-devel] Any tips for tracking down causes of hangs?

Nathan Neulinger nneul@umr.edu
Thu, 21 Mar 2002 20:13:52 -0600


I've got certain Linux 2.4.x machines (mostly .10 and .18) with
processes that seem to hang in D state at random times. This happens
both with older builds (-current from around the time .10 was released)
and with .18 running current cvs right now. When this happens, it's
generally a permanent state, and inherited by all future attempts to
access afs.

When I've traced them (which brings up something else I need to dig
into: fstrace seems to have a massive memory leak), I haven't been able
to pick out anything useful. It appears that in most cases, accesses
continue.

Is there any straightforward way to see what a particular process is
hung against as far as afsd is concerned?
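
For reference, the user-space side of this can be sketched as below: a
minimal, hypothetical helper that lists processes currently in D state.
It assumes a /proc layout where /proc/<pid>/stat has the usual
"pid (comm) state ..." form, and it treats /proc/<pid>/wchan as optional
(it may not exist on every kernel, in which case "?" is reported). The
function names are illustrative, not part of OpenAFS. It shows only the
kernel wait channel, not what the process is blocked on inside afs.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar])
    comm = text[lpar + 1:rpar]
    state = text[rpar + 1:].split()[0]
    return pid, comm, state


def read_wchan(proc_root, pid):
    """Best effort: the wait channel for pid, or '?' if it is unavailable."""
    try:
        with open(os.path.join(proc_root, str(pid), "wchan")) as f:
            return f.read().strip() or "?"
    except OSError:
        return "?"


def d_state_processes(proc_root="/proc"):
    """List (pid, comm, wchan) for every process whose state is D."""
    hung = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or the stat file was unreadable
        if state == "D":
            hung.append((pid, comm, read_wchan(proc_root, pid)))
    return sorted(hung)


if __name__ == "__main__":
    for pid, comm, wchan in d_state_processes():
        print(f"{pid:>7}  {comm:<16}  {wchan}")
```

Running this periodically while reproducing the hang would at least show
which processes enter D state first and whether they share a wait
channel.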

I don't have a good feel for what causes this, other than I've noticed
two things that seem to trigger it on certain of the machines: one is
doing a cvs diff against a large checkout in afs, and the other (though
much rarer) is doing web service. Except in rare cases, there are no
messages or panics of any kind, just completely hung access to afs. I'm
running with -stat 10000 -dcache 4000 -daemons 5 -volumes 256, and a
cache of anywhere from 250MB to 2GB.

Are there any tips for tracking down the cause of this? I'm going to
look into kgdb soon, but haven't done anything with it yet. 

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       EMail:  nneul@umr.edu
University of Missouri - Rolla         Phone: (573) 341-4841
Computing Services                       Fax: (573) 341-4216