[OpenAFS] Linux kernel panic, OpenAFS client, gconf

Jan-Marc Pilawa j.pilawa@tu-bs.de
Thu, 17 Jun 2004 15:41:14 +0200

Hello *, 

I have read several threads about this problem in the archives, but so far I 
have no clue how to solve the frequent client crashes on SMP systems. The 
problem is almost always triggered by gconfd-2; only once was it triggered by 
another application (mozilla). 

I upgraded from openafs-1.2.10 to 1.2.11 (on SuSE 9.0, kernel 2.4.21-xxx) and 
applied a patch from Chas Williams for osi_vnodeops.c, but the behaviour is 
almost the same. The situation has improved insofar as in some cases afsd 
merely seems to hang, and the applications then produce very high load because 
they cannot access AFS.

In most cases the systems produce oopses like the following one (this is the 
ksymoops output for the kernel panic):

TT3<1>Unable to handle kernel paging request at virtual address ffffffff
*pde = 00006063
Oops: 0002 2.4.21-226-smp4G #1 SMP Tue Jun 15 10:28:32 UTC 2004
CPU:    1
EIP:    0010:[<c6148b50>]    Tainted: P 
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010292
eax: 00000003   ebx: ea16dec8   ecx: 00000046   edx: c032d058
esi: faba5c34   edi: 00000001   ebp: fa4fe660   esp: ea16de58
ds: 0018   es: 0018   ss: 0018
Process gconfd-2 (pid: 15463, stackpage=ea16d000)
Stack: c616d490 e8298b20 ea16deec ea16decc ea16dee8 ea16def8 ea16dec0 c6129419 
       c616d490 e8298b20 ea16deec ea16decc fa6043f8 00000008 ea16dee8 00000001 
       00000001 00000000 00000000 00000000 00000000 e8298b20 00000000 00000006 
Call Trace:         [<c616d490>] (28) [<c6129419>] (04) [<c616d490>] (76)
  [<c6120e89>] (92) [<c6175664>] (12) [<c6175664>] (08) [<c6158b71>] (48)
  [<c0163742>] (32) [<c016512f>] (60) [<c0109637>] (60)
Code: c6 05 ff ff ff ff 2a 83 c4 1c c3 90 8d 74 26 00 b8 76 d9 16 

>>EIP; c6148b50 <[libafs]osi_Panic+20/60>   <=====

>>ebx; ea16dec8 <[ax25]ax25_table_size+3191a58/ae43bf0>
>>edx; c032d058 <log_wait+0/c>
>>esp; ea16de58 <[ax25]ax25_table_size+31919e8/ae43bf0>

Trace; c616d490 <[libafs].rodata.end+4fe5/cb95>
Trace; c6129419 <[libafs]afs_lookup+fb9/1250>
Trace; c616d490 <[libafs].rodata.end+4fe5/cb95>
Trace; c6120e89 <[libafs]afs_access+f9/390>
Trace; c6175664 <[libafs]afs_global_lock+0/1c>
Trace; c6175664 <[libafs]afs_global_lock+0/1c>
Trace; c6158b71 <[libafs]afs_linux_lookup+61/1c0>
Trace; c0163742 <lookup_hash+c2/120>
Trace; c016512f <sys_unlink+8f/130>
Trace; c0109637 <system_call+33/38>

Code;  c6148b50 <[libafs]osi_Panic+20/60>
00000000 <_EIP>:
Code;  c6148b50 <[libafs]osi_Panic+20/60>   <=====
   0:   c6 05 ff ff ff ff 2a      movb   $0x2a,0xffffffff   <=====
Code;  c6148b57 <[libafs]osi_Panic+27/60>
   7:   83 c4 1c                  add    $0x1c,%esp
Code;  c6148b5a <[libafs]osi_Panic+2a/60>
   a:   c3                        ret    
Code;  c6148b5b <[libafs]osi_Panic+2b/60>
   b:   90                        nop    
Code;  c6148b5c <[libafs]osi_Panic+2c/60>
   c:   8d 74 26 00               lea    0x0(%esi,1),%esi
Code;  c6148b60 <[libafs]osi_Panic+30/60>
  10:   b8 76 d9 16 00            mov    $0x16d976,%eax
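
For anyone who wants to pull the call chain out of ksymoops output like the 
above, here is a minimal Python sketch. It only handles the "Trace;" line 
format shown in this mail. The names parse_trace, TRACE_RE and SAMPLE are made 
up for illustration, and labelling symbols without a [module] prefix as 
"kernel" is an assumption of this sketch, not something ksymoops itself does.

```python
import re

# Matches ksymoops lines such as:
#   Trace; c6129419 <[libafs]afs_lookup+fb9/1250>
#   Trace; c0163742 <lookup_hash+c2/120>
TRACE_RE = re.compile(
    r"^Trace;\s+([0-9a-f]+)\s+<(?:\[(\w+)\])?([^+>]+)\+([0-9a-f]+)/([0-9a-f]+)>"
)


def parse_trace(text):
    """Return a list of (address, module, symbol, offset) per Trace; line."""
    frames = []
    for line in text.splitlines():
        m = TRACE_RE.match(line.strip())
        if m:
            addr, module, symbol, offset, _size = m.groups()
            # Assumption: no [module] prefix means a core kernel symbol.
            frames.append(
                (int(addr, 16), module or "kernel", symbol, int(offset, 16))
            )
    return frames


# A few Trace; lines copied from the oops in this mail.
SAMPLE = """\
Trace; c6129419 <[libafs]afs_lookup+fb9/1250>
Trace; c6120e89 <[libafs]afs_access+f9/390>
Trace; c6158b71 <[libafs]afs_linux_lookup+61/1c0>
Trace; c0163742 <lookup_hash+c2/120>
Trace; c016512f <sys_unlink+8f/130>
Trace; c0109637 <system_call+33/38>
"""

if __name__ == "__main__":
    for addr, module, symbol, offset in parse_trace(SAMPLE):
        print(f"{addr:08x}  {module:8s} {symbol}+0x{offset:x}")
```

Feeding it the whole ksymoops output works as well, since lines that do not 
start with "Trace;" are simply skipped.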

The systems most often crash around 12am, when many users are logged in, but I 
have seen them crash at other times, too. I can still log in remotely or at 
the console as root, and /sbin/reboot -f still works. That's fine, at least 
for me, but not for the dozen or so users ;-).

Sincerely

Jan Pilawa

+ Contact ----------------------------------------------------+
+ Systembetreuung Rechenzentrum TU Braunschweig               +
+ Hans-Sommer-Str. 65, D-38092 Braunschweig                   +
+ Tel: +49 531 391-5548 E-Mail: j.pilawa@tu-bs.de ____________+