[OpenAFS] 1.3.74 oops on linux 2.6.9

Andrej Filipcic andrej.filipcic@ijs.si
Tue, 30 Nov 2004 11:07:57 +0100


Hi,

I am having trouble with the AFS client version 1.3.74 on Gentoo with a 2.6.9 
kernel (gcc 3.3.4). The oops (ksymoops output below) always occurs when I 
try to untar a Linux kernel into AFS space; the same happens with both disk 
and memory cache.

There is also a problem with cache management on amd64 with Fedora Core 3.
The client seems to work reliably with memcache after applying the osi_GetTime 
patch for 64-bit platforms. The disk cache (ext3) behaves very strangely. When 
cache usage is low, the client works. When the cache partition is almost 
full, the client starts to lose some files. fs flush helps, but the files in 
the cache are not removed or reused. Eventually, the cache partition fills 
up, leaving no space for the cache manager. This does not happen on i386.
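
To see when the cache partition is close to full, something like the sketch 
below can be used. It only reports how full the file system holding the 
cache is and knows nothing about the cache manager itself. The helper names 
and the 90% threshold are hypothetical, and the cache directory has to be 
passed in explicitly.

```python
import shutil


def partition_usage(path):
    """Return the used fraction (0.0 to 1.0) of the file system containing path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo file systems report a size of zero.
        return 0.0
    return usage.used / usage.total


def is_almost_full(path, threshold=0.9):
    """True if the file system containing path is at or above threshold.

    The default of 0.9 is an arbitrary choice for illustration.
    """
    return partition_usage(path) >= threshold
```

Calling is_almost_full() on the cache directory (for example /usr/vice/cache, 
if that is where the cache lives on the machine in question) before and after 
fs flush should show whether any space is actually released.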

Cheers,
Andrej 


------------------------------

ksymoops 2.4.10 on i686 2.6.9-gentoo-r6.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.6.9-gentoo-r6/ (default)
     -m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Error (regular_file): read_ksyms stat /proc/ksyms failed
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
SGI XFS with ACLs, large block numbers, no debug enabled
C057 
cs: IO port probe 0x0c00-0x0cff: clean.
cs: IO port probe 0x0800-0x08ff: clean.
cs: IO port probe 0x0100-0x04ff: excluding 0x140-0x14f 0x378-0x37f 0x3e8-0x3ff 
0x4d0-0x4d7
cs: IO port probe 0x0a00-0x0aff: clean.
Unable to handle kernel paging request at virtual address ffffffff
e635ef51
*pde = 00002067
Oops: 0002 [#1]
CPU:    0
EIP:    0060:[<e635ef51>]    Tainted: P   VLI
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286   (2.6.9-gentoo-r6) 
eax: 00000021   ebx: e1c92350   ecx: 00000000   edx: c04a22ec
esi: e1c96a20   edi: 0000161c   ebp: 00000003   esp: d9027f40
ds: 007b   es: 007b   ss: 0068
Stack: e637f340 00000003 e635feaf 00000003 e635fdbc e637f340 00000003 e635feaf 
       00000003 00002148 00001ba4 e1c96a20 00000000 e635f7ea e1c96a20 00001ba4 
       00000003 00000000 dfc07280 00000000 00000000 019cf9c2 d9551940 e1c9bdd0 
Call Trace:
 [<e635feaf>] rxi_AllocDataBuf+0x2f/0x70 [libafs]
 [<e635fdbc>] allocCBuf+0x6c/0xd0 [libafs]
 [<e635feaf>] rxi_AllocDataBuf+0x2f/0x70 [libafs]
 [<e635f7ea>] rxk_ReadPacket+0x5a/0x140 [libafs]
 [<e635f929>] rxk_Listener+0x59/0x1a0 [libafs]
 [<e6370e93>] afsd_thread+0x393/0x3a0 [libafs]
 [<e6370b00>] afsd_thread+0x0/0x3a0 [libafs]
 [<c01052cd>] kernel_thread_helper+0x5/0x18
Code: 38 e6 8b 54 24 14 85 d2 0f 44 d0 8b 44 24 20 89 14 24 89 44 24 0c 8b 44 
24 1c 89 44 24 08 8b 44 24 18 89 44 24 04 e8 bf 68 dc d9 <c6> 05 ff ff ff ff 
2a 83 c4 10 c3 8d 74 26 00 55 b8 ff ff ff ff 


>>EIP; e635ef51 <pg0+25dacf51/3fa4c400>   <=====

>>ebx; e1c92350 <pg0+216e0350/3fa4c400>
>>edx; c04a22ec <log_wait+0/8>
>>esi; e1c96a20 <pg0+216e4a20/3fa4c400>
>>esp; d9027f40 <pg0+18a75f40/3fa4c400>

Trace; e635feaf <pg0+25dadeaf/3fa4c400>
Trace; e635fdbc <pg0+25daddbc/3fa4c400>
Trace; e635feaf <pg0+25dadeaf/3fa4c400>
Trace; e635f7ea <pg0+25dad7ea/3fa4c400>
Trace; e635f929 <pg0+25dad929/3fa4c400>
Trace; e6370e93 <pg0+25dbee93/3fa4c400>
Trace; e6370b00 <pg0+25dbeb00/3fa4c400>
Trace; c01052cd <kernel_thread_helper+5/18>

This architecture has variable length instructions, decoding before eip
is unreliable, take these instructions with a pinch of salt.

Code;  e635ef26 <pg0+25dacf26/3fa4c400>
00000000 <_EIP>:
Code;  e635ef26 <pg0+25dacf26/3fa4c400>
   0:   38 e6                     cmp    %ah,%dh
Code;  e635ef28 <pg0+25dacf28/3fa4c400>
   2:   8b 54 24 14               mov    0x14(%esp),%edx
Code;  e635ef2c <pg0+25dacf2c/3fa4c400>
   6:   85 d2                     test   %edx,%edx
Code;  e635ef2e <pg0+25dacf2e/3fa4c400>
   8:   0f 44 d0                  cmove  %eax,%edx
Code;  e635ef31 <pg0+25dacf31/3fa4c400>
   b:   8b 44 24 20               mov    0x20(%esp),%eax
Code;  e635ef35 <pg0+25dacf35/3fa4c400>
   f:   89 14 24                  mov    %edx,(%esp)
Code;  e635ef38 <pg0+25dacf38/3fa4c400>
  12:   89 44 24 0c               mov    %eax,0xc(%esp)
Code;  e635ef3c <pg0+25dacf3c/3fa4c400>
  16:   8b 44 24 1c               mov    0x1c(%esp),%eax
Code;  e635ef40 <pg0+25dacf40/3fa4c400>
  1a:   89 44 24 08               mov    %eax,0x8(%esp)
Code;  e635ef44 <pg0+25dacf44/3fa4c400>
  1e:   8b 44 24 18               mov    0x18(%esp),%eax
Code;  e635ef48 <pg0+25dacf48/3fa4c400>
  22:   89 44 24 04               mov    %eax,0x4(%esp)
Code;  e635ef4c <pg0+25dacf4c/3fa4c400>
  26:   e8 bf 68 dc d9            call   d9dc68ea <_EIP+0xd9dc68ea>

This decode from eip onwards should be reliable

Code;  e635ef51 <pg0+25dacf51/3fa4c400>
00000000 <_EIP>:
Code;  e635ef51 <pg0+25dacf51/3fa4c400>   <=====
   0:   c6 05 ff ff ff ff 2a      movb   $0x2a,0xffffffff   <=====
Code;  e635ef58 <pg0+25dacf58/3fa4c400>
   7:   83 c4 10                  add    $0x10,%esp
Code;  e635ef5b <pg0+25dacf5b/3fa4c400>
   a:   c3                        ret    
Code;  e635ef5c <pg0+25dacf5c/3fa4c400>
   b:   8d 74 26 00               lea    0x0(%esi),%esi
Code;  e635ef60 <pg0+25dacf60/3fa4c400>
   f:   55                        push   %ebp
Code;  e635ef61 <pg0+25dacf61/3fa4c400>
  10:   b8 ff ff ff ff            mov    $0xffffffff,%eax

 <1>Unable to handle kernel NULL pointer dereference at virtual address 
00000004
e635acf9
*pde = 1b73a067
Oops: 0000 [#2]
CPU:    0
EIP:    0060:[<e635acf9>]    Tainted: P   VLI
EFLAGS: 00010286   (2.6.9-gentoo-r6) 
eax: 00000002   ebx: 00000000   ecx: 00000008   edx: db0ff480
esi: 00000000   edi: dfc07280   ebp: dfc07288   esp: d9069f40
ds: 007b   es: 007b   ss: 0068
Stack: e639c1a0 000001f5 00000000 00000000 df60d000 c0121df0 00000000 00000001 
       00000000 d9068000 e636cd5d db0ff480 df60d000 c0121df0 00100100 c010d56a 
       df78356c c14df5e0 c14df5ec d9069fd0 e635e8a2 df78356c dfc07280 00000000 
Call Trace:
 [<c0121df0>] default_wake_function+0x0/0x20
 [<e636cd5d>] afs_osi_SleepSig+0x8d/0x110 [libafs]
 [<c0121df0>] default_wake_function+0x0/0x20
 [<c010d56a>] do_gettimeofday+0x1a/0xd0
 [<e635e8a2>] rxevent_RaiseEvents+0x82/0x190 [libafs]
 [<e635f6f8>] afs_rxevent_daemon+0x18/0xb0 [libafs]
 [<c02f532f>] sprintf+0x1f/0x30
 [<e6370de4>] afsd_thread+0x2e4/0x3a0 [libafs]
 [<e6370b00>] afsd_thread+0x0/0x3a0 [libafs]
 [<c01052cd>] kernel_thread_helper+0x5/0x18
Code: 0c 66 89 47 76 39 dd 8b 73 04 0f 84 2e fa ff ff f6 83 c4 00 00 00 01 75 
0e c7 43 0c 00 00 00 00 c7 43 08 00 00 00 00 89 f3 39 dd <8b> 76 04 75 e0 e9 
09 fa ff ff 0f b7 c1 e9 77 ff ff ff b8 02 00 


>>EIP; e635acf9 <pg0+25da8cf9/3fa4c400>   <=====

>>edx; db0ff480 <pg0+1ab4d480/3fa4c400>
>>edi; dfc07280 <pg0+1f655280/3fa4c400>
>>ebp; dfc07288 <pg0+1f655288/3fa4c400>
>>esp; d9069f40 <pg0+18ab7f40/3fa4c400>

Trace; c0121df0 <default_wake_function+0/20>
Trace; e636cd5d <pg0+25dbad5d/3fa4c400>
Trace; c0121df0 <default_wake_function+0/20>
Trace; c010d56a <do_gettimeofday+1a/d0>
Trace; e635e8a2 <pg0+25dac8a2/3fa4c400>
Trace; e635f6f8 <pg0+25dad6f8/3fa4c400>
Trace; c02f532f <sprintf+1f/30>
Trace; e6370de4 <pg0+25dbede4/3fa4c400>
Trace; e6370b00 <pg0+25dbeb00/3fa4c400>
Trace; c01052cd <kernel_thread_helper+5/18>

This architecture has variable length instructions, decoding before eip
is unreliable, take these instructions with a pinch of salt.

Code;  e635acce <pg0+25da8cce/3fa4c400>
00000000 <_EIP>:
Code;  e635acce <pg0+25da8cce/3fa4c400>
   0:   0c 66                     or     $0x66,%al
Code;  e635acd0 <pg0+25da8cd0/3fa4c400>
   2:   89 47 76                  mov    %eax,0x76(%edi)
Code;  e635acd3 <pg0+25da8cd3/3fa4c400>
   5:   39 dd                     cmp    %ebx,%ebp
Code;  e635acd5 <pg0+25da8cd5/3fa4c400>
   7:   8b 73 04                  mov    0x4(%ebx),%esi
Code;  e635acd8 <pg0+25da8cd8/3fa4c400>
   a:   0f 84 2e fa ff ff         je     fffffa3e <_EIP+0xfffffa3e>
Code;  e635acde <pg0+25da8cde/3fa4c400>
  10:   f6 83 c4 00 00 00 01      testb  $0x1,0xc4(%ebx)
Code;  e635ace5 <pg0+25da8ce5/3fa4c400>
  17:   75 0e                     jne    27 <_EIP+0x27>
Code;  e635ace7 <pg0+25da8ce7/3fa4c400>
  19:   c7 43 0c 00 00 00 00      movl   $0x0,0xc(%ebx)
Code;  e635acee <pg0+25da8cee/3fa4c400>
  20:   c7 43 08 00 00 00 00      movl   $0x0,0x8(%ebx)
Code;  e635acf5 <pg0+25da8cf5/3fa4c400>
  27:   89 f3                     mov    %esi,%ebx
Code;  e635acf7 <pg0+25da8cf7/3fa4c400>
  29:   39 dd                     cmp    %ebx,%ebp

This decode from eip onwards should be reliable

Code;  e635acf9 <pg0+25da8cf9/3fa4c400>
00000000 <_EIP>:
Code;  e635acf9 <pg0+25da8cf9/3fa4c400>   <=====
   0:   8b 76 04                  mov    0x4(%esi),%esi   <=====
Code;  e635acfc <pg0+25da8cfc/3fa4c400>
   3:   75 e0                     jne    ffffffe5 <_EIP+0xffffffe5>
Code;  e635acfe <pg0+25da8cfe/3fa4c400>
   5:   e9 09 fa ff ff            jmp    fffffa13 <_EIP+0xfffffa13>
Code;  e635ad03 <pg0+25da8d03/3fa4c400>
   a:   0f b7 c1                  movzwl %cx,%eax
Code;  e635ad06 <pg0+25da8d06/3fa4c400>
   d:   e9 77 ff ff ff            jmp    ffffff89 <_EIP+0xffffff89>
Code;  e635ad0b <pg0+25da8d0b/3fa4c400>
  12:   b8                        .byte 0xb8
Code;  e635ad0c <pg0+25da8d0c/3fa4c400>
  13:   02 00                     add    (%eax),%al


1 warning and 1 error issued.  Results may not be reliable.
-----------------------
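
For anyone who wants to process the call traces above mechanically, here is a 
minimal sketch of a parser for lines of the form 
" [<address>] symbol+0xoffset/0xsize [module]". The names Frame, TRACE_RE and 
parse_trace are hypothetical, and the regular expression only covers the line 
format that appears in this report.

```python
import re
from typing import List, NamedTuple, Optional

# Matches e.g. " [<e635feaf>] rxi_AllocDataBuf+0x2f/0x70 [libafs]".
# The trailing "[module]" part is optional (core kernel symbols lack it).
TRACE_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\S+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
    r"(?:\s+\[([^\]]+)\])?"
)


class Frame(NamedTuple):
    address: int           # return address found on the stack
    symbol: str            # function the address falls into
    offset: int            # offset of the address within that function
    size: int              # size of the function in bytes
    module: Optional[str]  # module name, or None for the core kernel

    @property
    def function_start(self) -> int:
        """Start address of the function, i.e. address minus offset."""
        return self.address - self.offset


def parse_trace(text: str) -> List[Frame]:
    """Extract all call trace frames from oops text, ignoring other lines."""
    frames = []
    for line in text.splitlines():
        match = TRACE_RE.search(line)
        if match:
            addr, sym, off, size, mod = match.groups()
            frames.append(
                Frame(int(addr, 16), sym, int(off, 16), int(size, 16), mod)
            )
    return frames
```

Feeding the oops text to parse_trace() gives one Frame per trace line; 
comparing function_start and size against the faulting EIP shows whether the 
EIP falls inside any of the listed functions.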


-- 
_____________________________________________________________
   doc. dr. Andrej Filipcic,   E-mail: Andrej.Filipcic@ijs.si
   Department of Experimental High Energy Physics - F9
   Jozef Stefan Institute, Jamova 39, P.o.Box 3000
   SI-1001 Ljubljana, Slovenia
   Tel.: +386-1-477-3674    Fax: +386-1-425-7074
-------------------------------------------------------------