[OpenAFS] Debugging Linux AFS client when client hangs
Rainer Toebbicke
rtb@pclella.cern.ch
Fri, 30 Oct 2009 09:54:22 +0100
John Perkins wrote:
> We're dealing with an interesting situation at our site recently: after
> rolling out RHEL 5 update 4 to our department's Linux computers, we're
> finding certain applications seem to cause AFS to no longer respond when
> fetching contents of specific directories in AFS. Access to local
> filesystems in this state appears to work just fine.
>
> Simon was kind enough to provide useful instructions at
> http://blob.inf.ed.ac.uk/sxw/2009/01/24/using-fstrace-to-debug-the-afs-cache-manager/
>
> back in January...unfortunately, the fstrace process gets stuck in device
> wait and will not return any useful information.
> If I could only get some useful debugging information, I would gladly
> submit it to RT...
>
For "hard" AFS lock-ups, if you have a crash dump and the necessary
kernel-debug packages, then displaying afs_global_owner in crash will tell
you who was holding the global lock at the moment the dump was taken, and a
stack trace of the fstrace process may hint at what it is waiting for.
For problems with "certain directories" or even "certain files", 'cmdebug
localhost' is usually the best bet: it shows which volume the cache manager
is busy with, which process is involved, and where in the code this occurs,
which often hints at the cause.
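
Since fstrace got stuck for you, it may be worth wrapping the cmdebug call in a timeout so the collection step itself cannot block your shell. The following is only a rough sketch in Python, not something we run in this form: the function name run_cmdebug, the argv_prefix parameter, and the 30-second default are made up for illustration, and the only thing taken from above is the 'cmdebug localhost' invocation itself.

```python
import subprocess


def run_cmdebug(host="localhost", timeout_s=30, argv_prefix=("cmdebug",)):
    """Run `cmdebug <host>` and return its output as a list of lines.

    Returns None if the command does not finish within timeout_s seconds.
    argv_prefix is a hypothetical hook so another command can be substituted
    (for example when trying the function on a machine without AFS).
    """
    cmd = list(argv_prefix) + [host]
    try:
        done = subprocess.run(
            cmd,
            capture_output=True,  # collect stdout/stderr instead of printing
            text=True,            # decode output to str
            timeout=timeout_s,    # give up if the command does not return
        )
    except subprocess.TimeoutExpired:
        return None
    return done.stdout.splitlines()


# Example use (assumes cmdebug is installed and on PATH):
#   lines = run_cmdebug("localhost")
#   print("timed out" if lines is None else "\n".join(lines))
```

The captured lines can then be saved and attached to an RT ticket. Note that the timeout only helps if the child process can actually be killed; it is a precaution, not a guarantee.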
BTW: we're running RHEL 5.4 and derivatives with 1.4.8 on hundreds, if not
already thousands, of machines without any notable problems.
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland
Phone: +41 22 767 8985 Fax: +41 22 767 7155