[OpenAFS-devel] Re: 1.2.10 linux kernel hang during AFS backups

Joe Buehler aspam@cox.net
Sun, 29 Aug 2004 19:45:09 -0400


Derrick J Brashear wrote:

> Worse than that, I'll guess your kernel modules weren't compiled with 
> --enable-debug-kernel and so the frame pointers aren't preserved.

The lines in the output I posted were truncated because tcpdump
captured only what you saw, if that's what you mean.  I'll have to
make some changes to get a better dump, perhaps including
compiling my own kernel and setting up a serial console (the netconsole
code blasts the console messages out so fast that tcpdump
cannot keep up).  Currently the machines run Red Hat kernel binaries
and AFS kernel binaries obtained from the OpenAFS site.

Another point of information: I booted the machines with a uniprocessor
kernel and everything is now running fine.

I spent a few hours going through the code.  The SMP locking in AFS
does not appear to me to be fine-grained, so one thought I had was
to protect Afs_Lock_Obtain with a kernel mutex and see what happens.
It may be that there is a client code path that is violating SMP
locking and causing a deadlock in Afs_Lock_Obtain.
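
As a rough illustration of that experiment, here is a minimal userspace
sketch in C.  It is not kernel code and not taken from the OpenAFS
source: pthreads stand in for kernel primitives, and the names
demo_lock, demo_obtain, demo_run and obtain_guard are all hypothetical.
It only shows the shape of the idea, one global mutex serializing every
call into a lock-obtain routine.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an AFS lock; NOT the real OpenAFS structure. */
struct demo_lock {
    long acquisitions;          /* total number of successful obtains */
};

/* Single coarse mutex guarding every entry into the obtain routine. */
static pthread_mutex_t obtain_guard = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the unguarded body: a plain read-modify-write that
 * would race if two CPUs ran it at the same time. */
static void demo_obtain_unguarded(struct demo_lock *l)
{
    l->acquisitions++;
}

/* The experiment: take the guard around the whole obtain path. */
void demo_obtain(struct demo_lock *l)
{
    pthread_mutex_lock(&obtain_guard);
    demo_obtain_unguarded(l);
    pthread_mutex_unlock(&obtain_guard);
}

struct demo_job {
    struct demo_lock *lock;
    int iters;
};

static void *demo_worker(void *arg)
{
    struct demo_job *job = arg;
    for (int i = 0; i < job->iters; i++)
        demo_obtain(job->lock);
    return NULL;
}

#define DEMO_MAX_THREADS 16

/* Run nthreads workers, each calling demo_obtain() iters times, and
 * return the number of obtains recorded on the shared lock. */
long demo_run(int nthreads, int iters)
{
    pthread_t tids[DEMO_MAX_THREADS];
    struct demo_lock lock = { 0 };
    struct demo_job job = { &lock, iters };

    assert(nthreads > 0 && nthreads <= DEMO_MAX_THREADS);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, demo_worker, &job);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return lock.acquisitions;
}
```

With the guard in place, demo_run(4, 10000) should return 40000, since
no increment can be lost.  The sketch does not model a routine that
sleeps while holding the guard, which any real version of this
experiment would have to take into account.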

Joe Buehler