[OpenAFS-devel] some crashes

Sabin Iacob iacobs@exotic4.nipne.ro
Mon, 25 Jul 2005 15:11:10 +0300


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi, list
I've been running a test cell for a few days, with a
1.3.85/2.6.12-nitro2/Gentoo master, one 1.2.11/SuSE 9.0 database
server, and a 1.2.11/SuSE 9.0 client. The servers also act as clients.
The server processes are working fine, but I've seen some crashes on
the client side.
First, the master server, with the following message:

Jul 21 08:41:46 exotic4 Unable to handle kernel paging request at
virtual address e5e1c40a
Jul 21 08:41:46 exotic4 printing eip:
Jul 21 08:41:46 exotic4 e5e1c40a
Jul 21 08:41:46 exotic4 *pde = 1f5a4067
Jul 21 08:41:46 exotic4 Oops: 0000 [#1]
Jul 21 08:41:46 exotic4 PREEMPT
Jul 21 08:41:46 exotic4 Modules linked in: it87 i2c_sensor i2c_isa
ipt_REJECT ipt_state ipt_multiport iptable_filter iptable_nat
ip_conntrack ip_tables snd_pcm_oss snd_mixer_oss snd_seq_oss
snd_seq_midi_event snd_seq snd_via82xx snd_ac97_codec snd_pcm
snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device
snd soundcore ext2 ppp_async ppp_generic slhc crc_ccitt psmouse
Jul 21 08:41:46 exotic4 CPU:    0
Jul 21 08:41:46 exotic4 EIP:    0060:[<e5e1c40a>]    Tainted: P      VLI
Jul 21 08:41:46 exotic4 EFLAGS: 00010296   (2.6.12-nitro2)
Jul 21 08:41:46 exotic4 EIP is at 0xe5e1c40a
Jul 21 08:41:46 exotic4 eax: 00000000   ebx: d5336000   ecx:
ffffffff   edx: 00000000
Jul 21 08:41:46 exotic4 esi: d4fc0670   edi: d4fc0660   ebp:
00000000   esp: d5337f50
Jul 21 08:41:46 exotic4 ds: 007b   es: 007b   ss: 0068
Jul 21 08:41:46 exotic4 Process afs_rxevent (pid: 11394,
threadinfo=d5336000 task=d52de590)
Jul 21 08:41:46 exotic4 Stack: 000001f5 00000001 00000000 d52de590
c0116100 00000000 00000000 da38ac04
Jul 21 08:41:46 exotic4 da38ac04 c0103ad4 00000000 d52de590 c0116100
d4fc0670 d4fc0670 d7aa7c90
Jul 21 08:41:46 exotic4 de7ed790 de7ed79c 42df3599 00000000 000001f4
d5336000 e5e1bff5 d5336000
Jul 21 08:41:46 exotic4 Call Trace:
Jul 21 08:41:46 exotic4 [<c0116100>] default_wake_function+0x0/0x10
Jul 21 08:41:46 exotic4 [<c0103ad4>] apic_timer_interrupt+0x1c/0x24
Jul 21 08:41:46 exotic4 [<c0116100>] default_wake_function+0x0/0x10
Jul 21 08:41:46 exotic4 [<c010132d>] kernel_thread_helper+0x5/0x18
Jul 21 08:41:46 exotic4 Code:  Bad EIP value.

The server processes carried on just fine and bos status reported
everything as running normally, but I had to reboot: my home dir had
become inaccessible, and I am not aware of any way to _really_
forcefully remove a faulty module (rmmod -f says it is busy). At the
time, I was moving around a large (~30 GB) volume. This oops recurred
a few times (once while starting firefox and thunderbird at the same
time; thunderbird checks for new mail at startup, and the maildir is
large, ~200 MB, 26k messages).
The second one to die was the 1.2.11 client, while I was compiling
ROOT (twice):

Jul 24 11:38:38 exotic2 kernel: dcache hc<1>Unable to handle kernel
paging request at virtual address ffffffff
Jul 24 11:38:38 exotic2 kernel:  printing eip:
Jul 24 11:38:38 exotic2 kernel: c4abd590
Jul 24 11:38:38 exotic2 kernel: *pde = 00004063
Jul 24 11:38:38 exotic2 kernel: Oops: 0002 2.4.21-99-athlon #1 Wed Sep
24 13:34:32 UTC 2003
Jul 24 11:38:38 exotic2 kernel: CPU:    0
Jul 24 11:38:38 exotic2 kernel: EIP:    0010:[usb-uhci:uhci_device_operations+37681368/22528947]    Tainted: P
Jul 24 11:38:38 exotic2 kernel: EIP:    0010:[<c4abd590>]    Tainted: P
Jul 24 11:38:38 exotic2 kernel: EFLAGS: 00010292
Jul 24 11:38:38 exotic2 kernel: eax: 00000009   ebx: 00000030   ecx:
c4a4202c   edx: 00000001
Jul 24 11:38:38 exotic2 kernel: esi: e2de57d4   edi: c4a58000   ebp:
00000001   esp: c4a2dde4
Jul 24 11:38:38 exotic2 kernel: ds: 0018   es: 0018   ss: 0018
Jul 24 11:38:38 exotic2 kernel: Process afs_cachetrim (pid: 1967,
stackpage=c4a2d000)
Jul 24 11:38:38 exotic2 kernel: Stack: c4ada6df 0000000a 00000001
c4a8921c e2de57d4 0000000a 00000001 c4a896bd
Jul 24 11:38:38 exotic2 kernel:        c4ada6df 0000000a 00000001
c4a8921c e2de57d4 00019da6 0000000a c4a894c4
Jul 24 11:38:38 exotic2 kernel:        e2de57d4 00000000 e2de5770
e2de5770 3b3b66c2 c2d11440 00000282 c4a2c000
Jul 24 11:38:39 exotic2 kernel: Call Trace:
[usb-uhci:uhci_device_operations+37800487/22409828]
[usb-uhci:uhci_device_operations+37467492/22742823]
[usb-uhci:uhci_device_operations+37468677/22741638]
[usb-uhci:uhci_device_operations+37800487/22409828]
[usb-uhci:uhci_device_operations+37467492/22742823]
Jul 24 11:38:39 exotic2 kernel: Call Trace:    [<c4ada6df>]
[<c4a8921c>] [<c4a896bd>] [<c4ada6df>] [<c4a8921c>]
Jul 24 11:38:39 exotic2 kernel:
[usb-uhci:uhci_device_operations+37468172/22742143]
[usb-uhci:uhci_device_operations+37465814/22744501]
[usb-uhci:uhci_device_operations+37738558/22471757]
[usb-uhci:uhci_device_operations+37804078/22406237]
[arch_kernel_thread+43/64]
[usb-uhci:uhci_device_operations+37738184/22472131]
Jul 24 11:38:39 exotic2 kernel:   [<c4a894c4>] [<c4a88b8e>]
[<c4acb4f6>] [<c4adb4e6>] [<c010736b>] [<c4acb380>]
Jul 24 11:38:39 exotic2 kernel: Modules: [(libafs:<c4a80060>:<c4aecac0>)]
Jul 24 11:38:39 exotic2 kernel: Code: c6 05 ff ff ff ff 2a 83 c4 1c c3
90 8d 74 26 00 b8 d2 b0 ad
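
The usb-uhci names in this second trace carry implausibly large
offsets, so it may be worth checking the raw addresses against the
"Modules:" line instead. Below is a minimal sketch of that check. It
assumes the two addresses printed for libafs are the start and end of
the loaded module, which I have not verified; the helper names
(module_ranges, owner) are made up for this illustration.

```python
import re

# Values copied from the exotic2 oops above.
MODULES_LINE = "Modules: [(libafs:<c4a80060>:<c4aecac0>)]"
EIP = 0xC4ABD590
TRACE = [
    0xC4ADA6DF, 0xC4A8921C, 0xC4A896BD, 0xC4ADA6DF, 0xC4A8921C,
    0xC4A894C4, 0xC4A88B8E, 0xC4ACB4F6, 0xC4ADB4E6, 0xC010736B,
    0xC4ACB380,
]


def module_ranges(line):
    """Parse '(name:<start>:<end>)' entries into {name: (start, end)}.

    Assumption: the two hex values are the module's load start and end.
    """
    pattern = r"\((\w+):<([0-9a-f]+)>:<([0-9a-f]+)>\)"
    return {
        name: (int(lo, 16), int(hi, 16))
        for name, lo, hi in re.findall(pattern, line)
    }


def owner(addr, ranges):
    """Return the name of the module whose range contains addr, or None."""
    for name, (lo, hi) in ranges.items():
        if lo <= addr < hi:
            return name
    return None


ranges = module_ranges(MODULES_LINE)
eip_owner = owner(EIP, ranges)
# Addresses that fall outside every listed module range.
outside = sorted({addr for addr in TRACE if owner(addr, ranges) is None})

print("EIP belongs to:", eip_owner)
print("outside listed modules:", [hex(addr) for addr in outside])
```

Under that assumption, the EIP and all but one of the trace addresses
fall inside libafs; the exception, c010736b, is the one the log
resolves as arch_kernel_thread. If so, the usb-uhci names are probably
an artifact of the symbol lookup rather than a sign that the USB driver
is involved.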

Again, the servers went on just fine and were accessible from the
other machines (erm, themselves :-\).

I hope the devs will find this information useful; although I program
in C/C++, I have absolutely no idea about the inner workings of the
kernel, otherwise I would have got my hands dirty already ;).

Regards.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFC5NbepFveV/JdohERAk6hAJ0YWgJEoFtk2gd+Big207RQl20engCfRhzM
mCa9uqNq08FgZHTIbi1VMCM=
=UbfX
-----END PGP SIGNATURE-----