[OpenAFS] covering bases: hangs and halts, rh7.3 w/2.4.20-18.7 kernel, OpenAFS 1.2.9

Lee Damon nomad@ssli-mail.ee.washington.edu
Mon, 30 Jun 2003 11:06:46 -0700


I'm grasping at straws here.  I'm looking for anyone else who may be
seeing similar symptoms.

We are having serious reliability issues with our Red Hat 7.3 boxes running
the newest kernel (2.4.20-18.7) and OpenAFS 1.2.9.  I compiled the kernel
modules exactly the same way I have in the past (no errors, no problems 
reported).

The systems will run fine for anywhere from 30 minutes to multiple days,
then crash, hang, or lock up completely with one of the following:
	1. Scrolling messages going by so fast they can't be read.  (Here's
		a very small sample.)

Jun 29 12:45:32 bird5 kernel: [<c02031bf>] ip_finish_output2 [kernel] 0xaf (0xd934ccb8))
Jun 29 12:45:32 bird5 kernel: [<c0203110>] ip_finish_output2 [kernel] 0x0 (0xd934ccc0))
Jun 29 12:45:32 bird5 kernel: [<e09a893b>] nulldevname.0 [ip_tables] 0x0 (0xd934ccf8))
Jun 29 12:45:33 bird5 kernel: [<c01ff110>] ip_local_deliver_finish [kernel] 0x0 (0xd934cd1c))
Jun 29 12:45:33 bird5 kernel: [<c01ff1c7>] ip_local_deliver_finish [kernel] 0xb7 (0xd934cd38))
Jun 29 12:45:33 bird5 kernel: [<c01ff110>] ip_local_deliver_finish [kernel] 0x0 (0xd934cd54))
Jun 29 12:45:33 bird5 kernel: [<c01ff110>] ip_local_deliver_finish [kernel] 0x0 (0xd934cd64))
Jun 29 12:45:33 bird5 kernel: [<c01ff110>] ip_local_deliver_finish [kernel] 0x0 (0xd934cd7c))
Jun 29 12:45:33 bird5 kernel: [<e09b86dd>] tcp_packet [ip_conntrack] 0x1dd (0xd934cd90))
Jun 29 12:45:34 bird5 kernel: [<e09b6b19>] ip_conntrack_in [ip_conntrack] 0x209 (0xd934cdac))
Jun 29 12:45:34 bird5 kernel: [<c01fed1b>] ip_local_deliver [kernel] 0x17b (0xd934cdc0))
Jun 29 12:45:34 bird5 kernel: [<c01ff110>] ip_local_deliver_finish [kernel] 0x0

	2. A kernel panic complaining about ip_conntrack.

	3. Dead silence (totally blank screen) or a frozen screensaver frame.
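
A minimal sketch, not part of the original report: when a trace scrolls by
too fast to read, tallying the entries that reached syslog can show which
symbols and modules dominate. The regex assumes the 2.4-style trace format
seen in the sample above; TRACE_RE and summarize_trace are hypothetical
names chosen for illustration.

```python
import re
from collections import Counter

# Matches the start of a trace entry: "[<address>] symbol [module]".
# Assumption: entries follow the 2.4-style format shown in the sample.
TRACE_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\S+)\s+\[([^\]]+)\]")


def summarize_trace(lines):
    """Count (module, symbol) pairs found in an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        match = TRACE_RE.search(line)
        if match:
            _address, symbol, module = match.groups()
            counts[(module, symbol)] += 1
    return counts


# A few entries copied from the sample above.
sample = [
    "Jun 29 12:45:32 bird5 kernel: [<c02031bf>] ip_finish_output2 [kernel] 0xaf",
    "Jun 29 12:45:32 bird5 kernel: [<e09a893b>] nulldevname.0 [ip_tables] 0x0",
    "Jun 29 12:45:33 bird5 kernel: [<c01ff110>] ip_local_deliver_finish [kernel] 0x0",
    "Jun 29 12:45:33 bird5 kernel: [<c01ff1c7>] ip_local_deliver_finish [kernel] 0xb7",
    "Jun 29 12:45:33 bird5 kernel: [<e09b86dd>] tcp_packet [ip_conntrack] 0x1dd",
]

for (module, symbol), n in summarize_trace(sample).most_common():
    print(f"{n:4d}  {module:14s} {symbol}")
```

Because the address, symbol, and module all sit at the start of each entry,
the pattern should still match when a terminal or mail client has wrapped
the tail of a long line.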

We have made sure iptables and its related modules aren't loaded.
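
One way to double-check that claim on a running box, given as a hedged
sketch rather than the method actually used here: read /proc/modules and
look for netfilter-related names. The prefix list is an assumption based on
common 2.4 module names (ip_tables, ip_conntrack, iptable_*, ipt_*), and the
function names are hypothetical.

```python
# Assumed prefixes for netfilter-related modules; adjust for the local setup.
NETFILTER_PREFIXES = ("ip_tables", "ip_conntrack", "iptable_", "ipt_")


def loaded_netfilter_modules(proc_modules_text, prefixes=NETFILTER_PREFIXES):
    """Return module names from /proc/modules text that match the prefixes.

    The first whitespace-separated field on each line is the module name.
    """
    names = [
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    ]
    return [name for name in names if name.startswith(prefixes)]


def read_proc_modules(path="/proc/modules"):
    """Read the kernel's list of loaded modules (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

On a live system, loaded_netfilter_modules(read_proc_modules()) returning an
empty list would support the statement above; a non-empty list would name
the modules still present.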

Red Hat's problem tracking system doesn't seem to have anything related, and
we don't expect them to respond to our reports since we are running
OpenAFS.

nomad
 -----------                       - Lee "nomad" Damon -          \
work: nomad@ee.washington.edu                                      \
play: nomad@castle.org    or castle!nomad                           \
                                                                    /\
Sr. Systems Admin, UWEE SSLI Lab                                   /  \
                "Celebrate Diversity"                             /    \