[OpenAFS-devel] Re: 1.2.10 linux kernel hang during AFS backups
Derrick J Brashear
shadow@dementia.org
Mon, 30 Aug 2004 20:49:17 -0400 (EDT)
On Sun, 29 Aug 2004, Joe Buehler wrote:
> Derrick J Brashear wrote:
>
>> Worse than that, I'll guess your kernel modules weren't compiled with
>> --enable-debug-kernel and so the frame pointers aren't preserved.
>
> The lines in the output I posted were truncated because tcpdump
> only saved what you saw, if that's what you mean. I'll have to
> make some changes to get a better dump, perhaps including
> compiling my own kernel and setting up a serial console (the netconsole
> code blasts the console messages out so fast that tcpdump
> cannot keep up). Currently the machines run Redhat kernel binaries
> and AFS kernel binaries obtained from the OpenAFS site.
No, not the lines of output. The kernel backtrace is unreliable unless
your module is compiled without -fomit-frame-pointer, i.e. with frame
pointers preserved.
> I spent a few hours going through the code. The SMP locking in AFS
> does not appear to me to be fine-grained, so one thought I had was
> to protect Afs_Lock_Obtain with a kernel mutex and see what happens.
> It may be that there is a client code path that is violating SMP
> locking and causing a deadlock in Afs_Lock_Obtain.
You could try disabling afs_TryFlushDcacheChildren in afs_vcache.c; that's
about the only thing I can think of offhand, unless maybe you need a
backport of glock-kernel-lock-ordering-20040714 (but I don't remember if
that applies to 1.2.x).
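
For illustration only, here is a user-space sketch of why a fixed lock
order avoids the kind of deadlock discussed above. It is not AFS code and
not what any particular patch does: the names glock, vlock, LOCK_ORDER,
with_ordered_locks and run_workers are made up for the example, and it
assumes just two locks that every code path needs.

```python
import threading

# Hypothetical locks standing in for a global lock and a per-object lock.
glock = threading.Lock()
vlock = threading.Lock()

# Assumed rule: every code path takes glock before vlock, never the reverse.
LOCK_ORDER = [glock, vlock]


def with_ordered_locks(work):
    """Acquire the locks in LOCK_ORDER, run work(), release in reverse order."""
    for lock in LOCK_ORDER:
        lock.acquire()
    try:
        return work()
    finally:
        for lock in reversed(LOCK_ORDER):
            lock.release()


def run_workers(n_threads=4, iterations=1000):
    """Run several threads that all need both locks; return the final count."""
    counter = {"value": 0}

    def bump():
        counter["value"] += 1

    def worker():
        for _ in range(iterations):
            with_ordered_locks(bump)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]
```

If one path took vlock first and another took glock first, two threads
could each hold one lock and wait forever on the other. With a single
agreed order that cannot happen, which is the general idea behind a
lock-ordering fix.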