[OpenAFS] Linux kernel panic, OpenAFS, gconf

Miles Davis miles@cs.stanford.edu
Mon, 26 Apr 2004 10:44:56 -0700


Hi everybody,

For a couple of months now I've been trying, without success, to track down 
a kernel panic we keep experiencing. I googled around and checked the list 
archives, but I can't find anything that seems to match my problem.

A user has a research job that (for reasons I cannot explain) starts and stops 
GNOME sessions in quick succession to do <something>. If anybody wants 
more detail, I'll try to find out exactly what it does, but that seems 
secondary to the problem at the moment. The main thing to keep in mind is a 
login, run stuff, logout loop.

This worked fine on a local filesystem, but when the home directory is in 
AFS it eventually (within hours) crashes the client system. The trouble 
starts at some point with gconf, my favorite program [:)], being unable or 
unwilling to release a lock. I see this in the logs:

starting (version 2.2.0), pid 26455 user 'joe'
Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only config source at position 0
Resolved address "xml:readwrite:/afs/blah/u/joe/.gconf" to a writable config source at position 1
Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only config source at position 2 
Received signal 1, shutting down cleanly 
Failed to give up lock on XML dir "/afs/blah/u/joe/.gconf": Failed to remove lock directory `/afs/blah/u/liblit/.gconf/%gconf-xml-backend.lock': Directory not empty

That sequence repeats somewhere between 10 and 25 times.

I'm not sure why this happens, but I don't care about that part. It's the 
resultant kernel panic that bothers me:

Apr 25 19:47:59 moa kernel: TT3<1>Unable to handle kernel paging request at virtual address ffffffff
Apr 25 19:47:59 moa kernel:  printing eip:
Apr 25 19:47:59 moa kernel: f8b7e950
Apr 25 19:47:59 moa kernel: *pde = 00003067
Apr 25 19:47:59 moa kernel: *pte = 00000000
Apr 25 19:47:59 moa kernel: Oops: 0002
Apr 25 19:47:59 moa kernel: libafs-2.4.20-24.9-i686.mp ide-cd cdrom lp parport nfsd nfs lockd sunrpc autofs tg3 ipt_REJECT iptable_filter ip_tables keybdev mousedev hid input usb-ohci us
Apr 25 19:47:59 moa kernel: CPU:    0
Apr 25 19:47:59 moa kernel: EIP:    0060:[<f8b7e950>]    Tainted: PF
Apr 25 19:47:59 moa kernel: EFLAGS: 00210286
Apr 25 19:47:59 moa kernel:
Apr 25 19:47:59 moa kernel: EIP is at osi_Panic [libafs-2.4.20-24.9-i686.mp] 0x20 (2.4.20-31.9smp)
Apr 25 19:47:59 moa kernel: eax: 00000003   ebx: f8d51240   ecx: 00200002   edx: d967bdf4
Apr 25 19:47:59 moa kernel: esi: 00000000   edi: 00000401   ebp: 00000000   esp: d967be58
Apr 25 19:47:59 moa kernel: ds: 0068   es: 0068   ss: 0068
Apr 25 19:47:59 moa kernel: Process gconfd-2 (pid: 26455, stackpage=d967b000)
Apr 25 19:47:59 moa kernel: Stack: f8ba3774 cb122c80 d967beec d967becc d967bee8 d967bef8 d967bec0 f8b5ea53
Apr 25 19:47:59 moa kernel:        f8ba3774 cb122c80 d967beec d967becc 00000001 00000001 02580023 00000001
Apr 25 19:47:59 moa kernel:        00000000 00000000 00000000 00000000 f8d51240 cb122c80 00000000 00000a6c
Apr 25 19:47:59 moa kernel: Call Trace:   [<f8ba3774>] .rodata.str1.1 [libafs-2.4.20-24.9-i686.mp] 0x9c4 (0xd967be58))
Apr 25 19:47:59 moa kernel: [<f8b5ea53>] afs_lookup [libafs-2.4.20-24.9-i686.mp] 0x7b3 (0xd967be74))
Apr 25 19:47:59 moa kernel: [<f8ba3774>] .rodata.str1.1 [libafs-2.4.20-24.9-i686.mp] 0x9c4 (0xd967be78))
Apr 25 19:47:59 moa kernel: [<f8b56c19>] afs_access [libafs-2.4.20-24.9-i686.mp] 0xf9 (0xd967bec4))
Apr 25 19:47:59 moa kernel: [<f8b89834>] crget [libafs-2.4.20-24.9-i686.mp] 0x54 (0xd967bf14))
Apr 25 19:47:59 moa kernel: [<f8b8ea7e>] afs_linux_lookup [libafs-2.4.20-24.9-i686.mp] 0x5e (0xd967bf34))
Apr 25 19:47:59 moa kernel: [<c0160b72>] lookup_hash [kernel] 0xc2 (0xd967bf64))
Apr 25 19:47:59 moa kernel: [<c0162498>] sys_unlink [kernel] 0xa8 (0xd967bf84))
Apr 25 19:47:59 moa kernel: [<c01098cf>] system_call [kernel] 0x33 (0xd967bfc0))
Apr 25 19:47:59 moa kernel:
Apr 25 19:47:59 moa kernel:
Apr 25 19:47:59 moa kernel: Code: c6 05 ff ff ff ff 2a 83 c4 1c c3 90 8d 74 26 00 b8 62 3c ba

PID 26455 is the last gconfd instance started before the panic. The 
system is then left in a nice half-dead state.
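In case it helps anyone reproduce this without a full GNOME session, below is a minimal, hypothetical stress script. It mimics what I assume gconfd does with its lock: create the lock directory, write an owner file inside it, unlink the file, then rmdir the directory. The trace shows the panic in afs_lookup reached from sys_unlink in gconfd-2, which may be that release step. The owner-file name `ior` and the exact acquire/release order are my assumptions, not taken from the gconf source. The script and its names (`lock_cycle`, `stress`, `lockloop.py`) are illustrative only. Point it at a scratch directory in AFS on a test client, since it may take the machine down.

```python
import errno
import os
import shutil
import sys

# Lock directory name taken from the gconfd log message above.
LOCK_DIR_NAME = "%gconf-xml-backend.lock"
# ASSUMPTION: name of the owner file gconf keeps inside the lock directory.
OWNER_FILE_NAME = "ior"


def lock_cycle(config_dir: str) -> None:
    """Acquire and release one gconf-style directory lock (hypothetical)."""
    lock_dir = os.path.join(config_dir, LOCK_DIR_NAME)
    owner_file = os.path.join(lock_dir, OWNER_FILE_NAME)
    os.mkdir(lock_dir)                    # acquire: create the lock directory
    with open(owner_file, "w") as f:      # record a fake owner
        f.write("IOR:placeholder\n")
    os.unlink(owner_file)                 # release step 1: unlink(), the syscall in the trace
    os.rmdir(lock_dir)                    # release step 2: remove the lock directory


def stress(config_dir: str, iterations: int) -> int:
    """Repeat lock_cycle; count 'Directory not empty' failures instead of aborting."""
    failures = 0
    for i in range(iterations):
        try:
            lock_cycle(config_dir)
        except OSError as e:
            if e.errno != errno.ENOTEMPTY:
                raise
            failures += 1
            print(f"iteration {i}: {e}", file=sys.stderr)
            # Clear the stale lock so the next mkdir does not fail with EEXIST.
            shutil.rmtree(os.path.join(config_dir, LOCK_DIR_NAME), ignore_errors=True)
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python lockloop.py <scratch-dir> [iterations]
    target = sys.argv[1]
    count = int(sys.argv[2]) if len(sys.argv) > 2 else 10000
    print(f"{stress(target, count)} ENOTEMPTY failures in {count} iterations")
```

A run would look like `python lockloop.py /afs/blah/tmp/lockloop 100000`, where the path is a placeholder. Any "Directory not empty" from rmdir is counted and printed rather than aborting the loop, so the ENOTEMPTY failures can be compared against the gconfd messages above. On a local filesystem it should report zero failures; I haven't run it against AFS.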

I used to think that the crash had something to do with the nightly 'vos 
backupsys', since the two seemed to happen at the same time, but that has 
since been disproven. I've also tried to provoke the crash from the server 
side, by moving the volume and restarting the fileserver, but nothing on 
the server side seems able to trigger it.

The client is Red Hat 9 with the stock kernel 2.4.20-31.9smp. The same crash
happened under 2.4.20-24.9smp and 2.4.20-30.9smp.

OpenAFS is 1.2.11-rh9.0.1, straight from openafs.org with no 
customizations. Client options are currently "$XLARGE -nosettime -memcache 
-fakestat", though the crash also occurred with a disk cache and with the 
'MEDIUM' and 'SMALL' configs in /etc/sysconfig/afs.

The server is Red Hat 9, kernel 2.4.20-30.9smp, with openafs-1.2.11-rh9.0.1. 
Nothing is logged on the server at the time of the client crash.

If anybody has any suggestions, or a plan of attack, I would be grateful. 
I can provide more information if needed.

Thanks,

-- 
// Miles Davis - miles@cs.stanford.edu - http://www.cs.stanford.edu/~miles
// Computer Science Department - Computer Facilities
// Stanford University