[OpenAFS-devel] OpenAFS 1.2.7 fileserver repeatedly crashes in rxi_AttachServerProc
around 4:30 in the morning
Rainer Toebbicke
rtb@pclella.cern.ch
Tue, 17 Dec 2002 12:27:59 +0100
Hello,
We recently had *three* SEGV crashes, each at night on a separate day, around 4:30
in the morning. They occurred on an OpenAFS 1.2.7 fileserver running Solaris 2.8, in
rxi_AttachServerProc() at queue_Remove(call).
There is nothing odd in the FileLog, and I am not aware of anything peculiar happening at 4:30.
The thread holds rx_serverPool_lock all right.
The call to be removed seems to be (or to have been) the only one in the queue;
however, queue_Remove is not exactly a simple macro, so I could be wrong there.
print &rx_incomingCallQueue
&rx_incomingCallQueue = 0x1aa450
(struct rx_queue *) call = 0x8693e8
((struct rx_queue *) call)->prev = 0x1aa450
((struct rx_queue *) call)->prev->next = (nil)
The assembly for that line is:
0x000d11f8: rxi_AttachServerProc+0x0568: ld [%i0 + 0x4], %l0
0x000d11fc: rxi_AttachServerProc+0x056c: st %l0, [%fp - 0x14]
0x000d1200: rxi_AttachServerProc+0x0570: ld [%fp - 0x14], %l1
0x000d1204: rxi_AttachServerProc+0x0574: ld [%i0], %l0
0x000d1208: rxi_AttachServerProc+0x0578: st %l1, [%l0 + 0x4]
0x000d120c: rxi_AttachServerProc+0x057c: ld [%i0], %l1
0x000d1210: rxi_AttachServerProc+0x0580: ld [%fp - 0x14], %l0
0x000d1214: rxi_AttachServerProc+0x0584: st %l1, [%l0]
0x000d1218: rxi_AttachServerProc+0x0588: st %g0, [%i0 + 0x4]
and the registers are:
g0-g3 0x00000000 0x000ab000 0x00000000 0x00000000
g4-g7 0x00000000 0x00000000 0x00000000 0xfd509d78
o0-o3 0x00000000 0xff0ee000 0x001a7b50 0x00000000
o4-o7 0x00000000 0x00000000 0xfd509880 0x000d11ac
l0-l3 0x00000000 0x001aa450 0x00000000 0x00000000
l4-l7 0x00000000 0x00000000 0x00000000 0x00000001
i0-i3 0x008693e8 0xffffffff 0x00000000 0x00000000
i4-i7 0x001aec50 0x0070aa98 0xfd5098f8 0x000cdb14
y 0x003a1c8a
ccr 0xfe401004
pc 0x000d1214:rxi_AttachServerProc+0x584 st %l1, [%l0]
npc 0x000d1218:rxi_AttachServerProc+0x588 st %g0, [%i0 + 0x4]
From this I'd conclude that it is the _QR(i) part of queue_Remove that goes
wrong, because of the '...->next = (nil)' above.
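To make the pointer traffic easier to follow, here is a minimal C sketch of a
doubly linked queue whose remove step runs in the same order as the
disassembly above. It is not the rx_queue.h source: the names struct q,
q_init, q_append, q_remove and q_remove_would_fault are hypothetical, and the
field layout (prev at offset 0, next at offset 4) is an assumption read off
the load/store offsets for a 32-bit build. The guard in q_remove is an
addition for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a queue element: prev first, next second,
 * matching the offsets 0x0 and 0x4 used in the disassembly. */
struct q {
    struct q *prev;
    struct q *next;
};

/* An empty circular queue: the head points at itself. */
static void q_init(struct q *head)
{
    head->prev = head->next = head;
}

/* Link item in at the tail of the queue. */
static void q_append(struct q *head, struct q *item)
{
    assert(head != NULL && item != NULL);
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* Returns 1 if unlinking item would store through a null pointer. */
static int q_remove_would_fault(const struct q *item)
{
    return item->prev == NULL || item->next == NULL;
}

/* Unlink item, then clear its next pointer (prev is left untouched).
 * Returns 0 on success, -1 if item does not look linked. */
static int q_remove(struct q *item)
{
    struct q *next = item->next;   /* ld [%i0 + 0x4], saved at %fp - 0x14 */

    if (q_remove_would_fault(item))
        return -1;                 /* guard added for this sketch only */
    item->prev->next = next;       /* st %l1, [%l0 + 0x4] */
    next->prev = item->prev;       /* st %l1, [%l0]: the faulting store */
    item->next = NULL;             /* st %g0, [%i0 + 0x4] */
    return 0;
}
```

In this sketch, an element that has already been removed once keeps its old
prev but has next == NULL. Without the guard, removing it a second time would
first write NULL into prev->next and then fault on the following store, which
is the same pointer state as in the debugger output above (%l0 = 0,
%l1 = 0x1aa450). That is only one way to arrive at this state, not a
diagnosis.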
Any ideas?
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke http://cern.ch/~rtb rtb@mail.cern.ch O__
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland > |
Phone: +41 22 767 8985 Fax: +41 22 767 7155 ( )\( )