[OpenAFS-devel] reliable crashing using -memcache
Stefaan
stefaan.deroeck@gmail.com
Wed, 24 Aug 2005 13:09:45 +0200
Hi!
I have a 2.6.12-gentoo-r4 kernel, single CPU p4, SMP (HT) enabled,
preemption disabled. I'm running openafs 1.3.87.
I start "afsd" with the parameters -memcache -chunksize 14 -afsdb
-dynroot, with the following /etc/openafs/cacheinfo:
/afs:/usr/vice/cache:500000
(With a cache size of 50000 the problem doesn't occur, or at least not as
easily: I have seen errors with smaller cache sizes, but they may well
have been caused by something else.)
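To put the two cache sizes in perspective, here is a rough sketch of the arithmetic. It assumes the third cacheinfo field counts 1 KiB blocks and that -chunksize N means chunks of 2**N bytes; the helper names are made up for illustration and are not part of OpenAFS.

```python
# Rough sizing sketch. Assumptions: the cacheinfo size field is in
# 1 KiB blocks, and "-chunksize N" selects chunks of 2**N bytes.
# parse_cacheinfo and memcache_footprint are hypothetical helpers.

def parse_cacheinfo(line):
    """Split a cacheinfo line of the form mountpoint:cachedir:blocks."""
    mount_point, cache_dir, blocks = line.strip().split(":")
    return mount_point, cache_dir, int(blocks)


def memcache_footprint(cacheinfo_line, chunksize_log2):
    """Return (total bytes, bytes per chunk, number of whole chunks)."""
    _, _, blocks = parse_cacheinfo(cacheinfo_line)
    total_bytes = blocks * 1024
    chunk_bytes = 1 << chunksize_log2
    return total_bytes, chunk_bytes, total_bytes // chunk_bytes


if __name__ == "__main__":
    for size in (500000, 50000):
        line = f"/afs:/usr/vice/cache:{size}"
        total, chunk, count = memcache_footprint(line, 14)
        print(f"{size} blocks: {total / 2**20:.1f} MiB in {count} chunks of {chunk} bytes")
```

Under these assumptions the failing setup asks for roughly 488 MiB of cache in 31250 chunks, while the working one asks for about 49 MiB in 3125 chunks. With -memcache that cache lives in kernel memory, so the size alone may be relevant on a 32-bit machine.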
The console displays
"afsd: All AFS daemons started."
and then waits forever. Very shortly after that, I get a kernel oops.
The machine doesn't hang, however.
In ps auxwf I find:
root 12829 0.0 0.0 2000 868 tty3 D+ 12:56 0:00 \_ /usr/sbin/afsd -memcache -chunksize 14 -afsdb -dynroot
root 12833 0.0 0.0 0 0 tty3 Z<+ 12:56 0:00 \_ [afsd] <defunct>
root 12834 0.0 0.0 0 0 tty3 Z+ 12:56 0:00 \_ [afsd] <defunct>
root 12837 0.0 0.0 0 0 tty3 Z<+ 12:56 0:00 \_ [afsd] <defunct>
root 12839 0.0 0.0 0 0 tty3 Z+ 12:56 0:00 \_ [afsd] <defunct>
root 12842 0.0 0.0 0 0 tty3 Z+ 12:56 0:00 \_ [afsd] <defunct>
root 12844 0.0 0.0 1996 860 tty3 D+ 12:56 0:00 \_ /usr/sbin/afsd -memcache -chunksize 14 -afsdb -dynroot
root 12846 0.0 0.0 0 0 tty3 Z+ 12:56 0:00 \_ [afsd] <defunct>
root 12848 0.0 0.0 1996 860 tty3 D+ 12:56 0:00 \_ /usr/sbin/afsd -memcache -chunksize 14 -afsdb -dynroot
root 12850 0.0 0.0 0 0 tty3 Z+ 12:56 0:00 \_ [afsd] <defunct>
and also:
root 12835 0.0 0.0 0 0 ? S 12:56 0:00 [afs_rxlistener]
root 12836 0.0 0.0 0 0 ? S 12:56 0:00 [afs_callback]
root 12838 0.0 0.0 0 0 ? D 12:56 0:00 [afs_rxevent]
root 12840 0.0 0.0 1996 860 ? Ss 12:56 0:00 /usr/sbin/afsd -memcache -chunksize 14 -afsdb -dynroot
root 12843 0.0 0.0 0 0 ? D 12:56 0:00 [afsd]
root 12845 0.0 0.0 0 0 ? D 12:56 0:00 [afs_checkserver]
root 12847 0.0 0.0 0 0 ? S 12:56 0:00 [afs_background]
root 12849 0.0 0.0 0 0 ? S 12:56 0:00 [afs_background]
The oops looks like this: (dmesg | ksymoops)
ksymoops 2.4.11 on i686 2.6.12-gentoo-r4.
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.6.12-gentoo-r4/ (default)
-m /boot/kernel-2.6.12-gentoo-r4/System.map (specified)
Error (regular_file): read_ksyms stat /proc/ksyms failed
ksymoops: No such file or directory
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
Machine check exception polling timer started.
SGI XFS with large block numbers, no debug enabled
ehci_hcd 0000:00:1d.7: debug port 1
Unable to handle kernel NULL pointer dereference at virtual address 00000147
f9b58c77
*pde = 00000000
Oops: 0000 [#1]
CPU: 1
EIP: 0060:[<f9b58c77>] Tainted: P VLI
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206 (2.6.12-gentoo-r4)
eax: f9bd34d4 ebx: 000000d7 ecx: 00007a12 edx: 00000000
esi: 0000000a edi: 00000000 ebp: 00000000 esp: cea27e30
ds: 007b es: 007b ss: 0068
Stack: c011cb47 cf372520 f6329300 32b7b53b 32b7b53b 0000006e cea27e68 c011cc9e
cf372520 c1807558 d17226e0 d1722520 f5cca580 d1722648 00000004 000000d7
00000000 00000009 c042aa62 00000000 00000002 00000001 00000000 cea27ea8
Call Trace:
[<c011cb47>] recalc_task_prio+0x8e/0x155
[<c011cc9e>] activate_task+0x90/0xa4
[<c042aa62>] schedule+0x3c6/0xc81
[<c042a514>] __down+0xcc/0xdb
[<c011ed5a>] default_wake_function+0x0/0x12
[<c0137a91>] remove_wait_queue+0x1a/0x4a
[<f9ba772c>] afs_osi_SleepSig+0x150/0x1a7 [libafs]
[<f9b5821a>] afs_CacheTruncateDaemon+0x0/0x456 [libafs]
[<c011ed5a>] default_wake_function+0x0/0x12
[<f9ba7819>] afs_osi_Sleep+0x96/0xbb [libafs]
[<c010788c>] do_gettimeofday+0x1e/0xbf
[<f9b58325>] afs_CacheTruncateDaemon+0x10b/0x456 [libafs]
[<f9bac7b0>] afsd_thread+0x3d0/0x656 [libafs]
[<f9bac3e0>] afsd_thread+0x0/0x656 [libafs]
[<c0101401>] kernel_thread_helper+0x5/0xb
Code: 31 bd f9 7c ec 8b 84 24 74 01 00 00 85 c0 0f 8e 55 01 00 00 8b
0d 44 31 bd f9 e9 39 fb ff ff a1 64 31 bd f9 8b 1c b0 85 db 74 0b <66>
83 7b 70 00 0f 85 5a fb ff ff a1 e4 31 bd f9 80 e2 08 8b 3c
>>EIP; f9b58c77 <pg0+3959fc77/3fa45400> <=====
>>eax; f9bd34d4 <pg0+3961a4d4/3fa45400>
>>esp; cea27e30 <pg0+e46ee30/3fa45400>
Trace; c011cb47 <recalc_task_prio+8e/155>
Trace; c011cc9e <activate_task+90/a4>
Trace; c042aa62 <schedule+3c6/c81>
Trace; c042a514 <__down+cc/db>
Trace; c011ed5a <default_wake_function+0/12>
Trace; c0137a91 <remove_wait_queue+1a/4a>
Trace; f9ba772c <pg0+395ee72c/3fa45400>
Trace; f9b5821a <pg0+3959f21a/3fa45400>
Trace; c011ed5a <default_wake_function+0/12>
Trace; f9ba7819 <pg0+395ee819/3fa45400>
Trace; c010788c <do_gettimeofday+1e/bf>
Trace; f9b58325 <pg0+3959f325/3fa45400>
Trace; f9bac7b0 <pg0+395f37b0/3fa45400>
Trace; f9bac3e0 <pg0+395f33e0/3fa45400>
Trace; c0101401 <kernel_thread_helper+5/b>
This architecture has variable length instructions, decoding before eip
is unreliable, take these instructions with a pinch of salt.
Code; f9b58c4c <pg0+3959fc4c/3fa45400>
00000000 <_EIP>:
Code; f9b58c4c <pg0+3959fc4c/3fa45400>
0: 31 bd f9 7c ec 8b xor %edi,0x8bec7cf9(%ebp)
Code; f9b58c52 <pg0+3959fc52/3fa45400>
6: 84 24 74 test %ah,(%esp,%esi,2)
Code; f9b58c55 <pg0+3959fc55/3fa45400>
9: 01 00 add %eax,(%eax)
Code; f9b58c57 <pg0+3959fc57/3fa45400>
b: 00 85 c0 0f 8e 55 add %al,0x558e0fc0(%ebp)
Code; f9b58c5d <pg0+3959fc5d/3fa45400>
11: 01 00 add %eax,(%eax)
Code; f9b58c5f <pg0+3959fc5f/3fa45400>
13: 00 8b 0d 44 31 bd add %cl,0xbd31440d(%ebx)
Code; f9b58c65 <pg0+3959fc65/3fa45400>
19: f9 stc
Code; f9b58c66 <pg0+3959fc66/3fa45400>
1a: e9 39 fb ff ff jmp fffffb58 <_EIP+0xfffffb58>
Code; f9b58c6b <pg0+3959fc6b/3fa45400>
1f: a1 64 31 bd f9 mov 0xf9bd3164,%eax
Code; f9b58c70 <pg0+3959fc70/3fa45400>
24: 8b 1c b0 mov (%eax,%esi,4),%ebx
Code; f9b58c73 <pg0+3959fc73/3fa45400>
27: 85 db test %ebx,%ebx
Code; f9b58c75 <pg0+3959fc75/3fa45400>
29: 74 0b je 36 <_EIP+0x36>
This decode from eip onwards should be reliable
Code; f9b58c77 <pg0+3959fc77/3fa45400>
00000000 <_EIP>:
Code; f9b58c77 <pg0+3959fc77/3fa45400> <=====
0: 66 83 7b 70 00 cmpw $0x0,0x70(%ebx) <=====
Code; f9b58c7c <pg0+3959fc7c/3fa45400>
5: 0f 85 5a fb ff ff jne fffffb65 <_EIP+0xfffffb65>
Code; f9b58c82 <pg0+3959fc82/3fa45400>
b: a1 e4 31 bd f9 mov 0xf9bd31e4,%eax
Code; f9b58c87 <pg0+3959fc87/3fa45400>
10: 80 e2 08 and $0x8,%dl
Code; f9b58c8a <pg0+3959fc8a/3fa45400>
13: 8b .byte 0x8b
Code; f9b58c8b <pg0+3959fc8b/3fa45400>
14: 3c .byte 0x3c
1 error issued. Results may not be reliable.
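A few numbers in the oops can be cross-checked with simple arithmetic. The sketch below only uses values copied from the output above; the constant and function names are made up for illustration, and the readings in the comments are tentative.

```python
# Cross-checking the oops. All values are copied from the output above;
# the interpretations in the comments are guesses, not confirmed facts.

EBX = 0x000000D7
ECX = 0x00007A12
DISPLACEMENT = 0x70            # from: cmpw $0x0,0x70(%ebx)
FAULT_ADDRESS = 0x00000147
EIP = 0xF9B58C77
TRUNCATE_RETURN = 0xF9B58325   # afs_CacheTruncateDaemon+0x10b in the trace
TRUNCATE_OFFSET = 0x10B
TRUNCATE_SIZE = 0x456          # from "+0x10b/0x456"


def effective_address(base, displacement):
    """32-bit effective address of a base+displacement operand."""
    return (base + displacement) & 0xFFFFFFFF


# The faulting operand is 0x70(%ebx): does it match the reported address?
assert effective_address(EBX, DISPLACEMENT) == FAULT_ADDRESS

# Where does EIP sit relative to the start of afs_CacheTruncateDaemon?
func_start = TRUNCATE_RETURN - TRUNCATE_OFFSET
eip_offset = EIP - func_start

# Chunk count for a 500000 KiB cache with 2**14-byte chunks (assumed units).
chunk_count = (500000 * 1024) >> 14

if __name__ == "__main__":
    print(f"fault address = ebx + 0x70 = {effective_address(EBX, DISPLACEMENT):#x}")
    print(f"afs_CacheTruncateDaemon starts at {func_start:#x}")
    print(f"EIP is {eip_offset:#x} past that start (function size {TRUNCATE_SIZE:#x})")
    print(f"ecx = {ECX}, chunk count = {chunk_count}")
```

Three things stand out. First, 0xd7 + 0x70 = 0x147, so ebx was not NULL but a small bogus value, apparently loaded from slot esi = 0xa of a pointer table by the preceding "mov (%eax,%esi,4),%ebx". Second, EIP lies 0xa5d bytes past the start of afs_CacheTruncateDaemon, beyond its 0x456-byte size, so the fault seems to be in a later function in libafs rather than in the daemon loop itself. Third, ecx equals 31250, which is the number of 16 KiB chunks in a 500000 KiB cache; that could be a coincidence.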
Cheers,
Stefaan