[OpenAFS-devel] Re: 1.2.10 linux kernel hang during AFS backups

Derrick J Brashear shadow@dementia.org
Mon, 30 Aug 2004 20:49:17 -0400 (EDT)


On Sun, 29 Aug 2004, Joe Buehler wrote:

> Derrick J Brashear wrote:
>
>> Worse than that, I'll guess your kernel modules weren't compiled with 
>> --enable-debug-kernel and so the frame pointers aren't preserved.
>
> The lines in the output I posted were truncated because tcpdump
> only saved what you saw, if that's what you mean.  I'll have to
> make some changes to get a better dump, perhaps including
> compiling my own kernel and setting up a serial console (the netconsole
> code blasts the console messages out so fast that tcpdump
> cannot keep up).  Currently the machines run Redhat kernel binaries
> and AFS kernel binaries obtained from the OpenAFS site.

no, not the truncated lines of output. the kernel backtrace itself is 
inaccurate unless your module is compiled without -fomit-frame-pointer, 
which is what --enable-debug-kernel gets you.

> I spent a few hours going through the code.  The SMP locking in AFS
> does not appear to me to be fine-grained, so one thought I had was
> to protect Afs_Lock_Obtain with a kernel mutex and see what happens.
> It may be that there is a client code path that is violating SMP
> locking and causing a deadlock in Afs_Lock_Obtain.

you could try disabling afs_TryFlushDcacheChildren in afs_vcache.c; that's 
about the only thing i can think of offhand, unless maybe you need a 
backport of glock-kernel-lock-ordering-20040714 (but i don't remember if 
that's applicable to 1.2.x)
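
For reference, the experiment Joe describes above (serializing every pass
through Afs_Lock_Obtain behind one coarse mutex) can be sketched in
userspace. This is only an analogy, not OpenAFS code: pthreads stand in for
the kernel primitives, and every sketch_* name is hypothetical. It shows
the shape of the idea only and says nothing about whether it would cure
the hang.

```c
/* Userspace analogy only: pthreads stand in for kernel primitives.
 * Every sketch_* name is hypothetical; none of this is OpenAFS code. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SKETCH_MAX_THREADS 16

/* Stand-in for the lock state a routine like Afs_Lock_Obtain manipulates. */
struct sketch_lock {
    int held;           /* 1 while some thread owns the lock */
    long acquisitions;  /* how many times the lock was obtained */
};

/* The single coarse mutex that serializes every obtain/release call. */
static pthread_mutex_t sketch_big_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t sketch_released = PTHREAD_COND_INITIALIZER;

void sketch_lock_obtain(struct sketch_lock *l)
{
    pthread_mutex_lock(&sketch_big_mutex);
    while (l->held)     /* sleep until the current owner releases */
        pthread_cond_wait(&sketch_released, &sketch_big_mutex);
    l->held = 1;
    l->acquisitions++;
    pthread_mutex_unlock(&sketch_big_mutex);
}

void sketch_lock_release(struct sketch_lock *l)
{
    pthread_mutex_lock(&sketch_big_mutex);
    l->held = 0;
    pthread_cond_broadcast(&sketch_released);
    pthread_mutex_unlock(&sketch_big_mutex);
}

struct sketch_job {
    struct sketch_lock *lock;
    long *counter;
    int iters;
};

static void *sketch_worker(void *arg)
{
    struct sketch_job *j = arg;
    for (int i = 0; i < j->iters; i++) {
        sketch_lock_obtain(j->lock);
        (*j->counter)++;            /* only touched while the lock is held */
        sketch_lock_release(j->lock);
    }
    return NULL;
}

/* Run nthreads workers, each taking the lock iters times; returns the
 * shared counter, which equals nthreads * iters if no update was lost. */
long sketch_run(int nthreads, int iters)
{
    struct sketch_lock lock = { 0, 0 };
    long counter = 0;
    struct sketch_job job = { &lock, &counter, iters };
    pthread_t tids[SKETCH_MAX_THREADS];

    assert(nthreads > 0 && nthreads <= SKETCH_MAX_THREADS);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, sketch_worker, &job);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```

Calling sketch_run(4, 1000) from a main() should return 4000 if the
wrapper really serializes the callers. Note that a coarse mutex like this
only closes races inside the obtain/release path itself; if the hang comes
from two locks being taken in opposite orders, it would not help.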