[OpenAFS] covering bases: hangs and halts, rh7.3 w/2.4.20-18.7 kernel, OpenAFS 1.2.9

Lee Damon nomad@ssli-mail.ee.washington.edu
Tue, 01 Jul 2003 13:17:29 -0700


While the machine we put the serial console on hasn't crashed yet, the
following did actually manage to get logged to our central server when
another node went into lockup.

Jul  1 13:07:28 bird1 kernel: do_IRQ: stack overflow: 296 
Jul  1 13:07:28 bird1 kernel: c0251325 00000128 00000000 000005a8 da3140e4 c434e5b8 000005a8 c024a7cc
Jul  1 13:07:28 bird1 kernel:        000005a8 00000146 db848380 da3140e4 c434e5b8 000005a8 00000000 68630018
Jul  1 13:07:28 bird1 kernel:        656b0018 ffffff0a c01e68f1 00000010 00010206 c0202096 00000002 00000003
Jul  1 13:07:28 bird1 kernel: Call Trace:   [<c01e68f1>] skb_copy_bits [kernel] 0x51 (0xc243a808))
Jul  1 13:07:28 bird1 kernel: [<c0202096>] ip_queue_xmit [kernel] 0x4b6 (0xc243a814))
Jul  1 13:07:28 bird1 kernel: [<c0203210>] ip_queue_xmit2 [kernel] 0x0 (0xc243a82c))
Jul  1 13:07:28 bird1 kernel: [<e08fc664>] tcp_copy_data [sunrpc] 0x24 (0xc243a848))
Jul  1 13:07:28 bird1 kernel: [<e0903324>] xdr_partial_copy_from_skb [sunrpc] 0x134 (0xc243a864))
Jul  1 13:07:28 bird1 kernel: [<e08fb2bb>] tcp_data_recv [sunrpc] 0x2bb (0xc243a888))
Jul  1 13:07:28 bird1 kernel: [<e08fc640>] tcp_copy_data [sunrpc] 0x0 (0xc243a898))
Jul  1 13:07:28 bird1 kernel: [<c021758e>] tcp_v4_send_check [kernel] 0x6e (0xc243a8c8))
Jul  1 13:07:28 bird1 kernel: [<c02085ca>] tcp_read_sock [kernel] 0x10a (0xc243a90c))
Jul  1 13:07:28 bird1 kernel: [<c01e57df>] alloc_skb [kernel] 0xef (0xc243a94c))
Jul  1 13:07:28 bird1 kernel: [<c0205340>] tcp_rfree [kernel] 0x0 (0xc243a958))
Jul  1 13:07:29 bird1 kernel: [<e08fb448>] tcp_data_ready [sunrpc] 0x58 (0xc243a960))
Jul  1 13:07:29 bird1 kernel: [<e08fb000>] tcp_data_recv [sunrpc] 0x0 (0xc243a96c))
Jul  1 13:07:29 bird1 kernel: [<c0210499>] tcp_rcv_established [kernel] 0x429 (0xc243a984))
Jul  1 13:07:29 bird1 kernel: [<c02184c8>] tcp_v4_do_rcv [kernel] 0x38 (0xc243aa68))
Jul  1 13:07:29 bird1 kernel: [<c0218a1d>] tcp_v4_rcv [kernel] 0x46d (0xc243aa98))
Jul  1 13:07:29 bird1 kernel: [<e08cd93b>] nulldevname.0 [ip_tables] 0x0 (0xc243ab0c))
Jul  1 13:07:29 bird1 kernel: [<c01ff110>] ip_local_deliver_finish [kernel] 0x0 (0xc243ab30))
Jul  1 13:07:29 bird1 kernel: [<e08d0080>] ipt_hook [iptable_filter] 0x20 (0xc243ab38))
Jul  1 13:07:29 bird1 kernel: [<c01ff1c7>] ip_local_deliver_finish [kernel] 0xb7 (0xc243ab4c))
Jul  1 13:07:29 bird1 kernel: [<c01f061e>] nf_iterate [kernel] 0x2e (0xc243ab54))
Jul  1 13:07:29 bird1 kernel: [<c01ff110>] ip_local_deliver_finish [kernel] 0x0 (0xc243ab68))
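
For anyone who wants to summarize a trace like the one above, here is a
minimal Python sketch that pulls out each entry and tallies them by module.
It is only an illustration: the names ENTRY, parse_trace and SAMPLE are
made up for this example, the sample holds three entries copied from the
log, and it assumes the value in parentheses is the stack address at which
each return address was found. The regex allows any whitespace between
fields, so entries still match if a mail client has wrapped them.

```python
import re
from collections import Counter

# One trace entry: [<addr>] symbol [module] offset (stack_slot))
# \s+ between fields lets an entry match even across a wrapped line.
ENTRY = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\S+)\s+\[(\w+)\]\s+(0x[0-9a-f]+)\s+\((0x[0-9a-f]+)\)"
)

def parse_trace(text):
    """Return a (symbol, module, stack_slot) tuple per entry found in text."""
    return [
        (sym, mod, int(slot, 16))
        for _addr, sym, mod, _off, slot in ENTRY.findall(text)
    ]

# Three entries from the log; the first one is left wrapped on purpose.
SAMPLE = """\
Jul  1 13:07:28 bird1 kernel: Call Trace:   [<c01e68f1>] skb_copy_bits
[kernel] 0x51 (0xc243a808))
Jul  1 13:07:28 bird1 kernel: [<e08fc664>] tcp_copy_data [sunrpc] 0x24 (0xc243a848))
Jul  1 13:07:29 bird1 kernel: [<e08d0080>] ipt_hook [iptable_filter] 0x20 (0xc243ab38))
"""

entries = parse_trace(SAMPLE)

# How many frames each module contributes to the trace.
by_module = Counter(mod for _sym, mod, _slot in entries)

# Distance between the lowest and highest stack slot seen (assumption:
# the parenthesized value is a stack address, see above).
slots = [slot for _sym, _mod, slot in entries]
span = max(slots) - min(slots)

print(by_module)
print("stack bytes spanned by these entries:", span)
```

Feeding it the whole log instead of SAMPLE gives a quick view of how much
of the trace sits in sunrpc and the iptables modules versus the core
kernel.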


nomad


> On Mon, 30 Jun 2003, Lee Damon wrote:
> 
> > We are having serious reliability issues with our Red Hat 7.3 boxes running
> > the newest kernel (2.4.20-18.7) and OpenAFS 1.2.9.  I compiled the kernel
> > modules exactly the same way I have in the past (no errors, no problems
> > reported).
> >
> > The systems will run fine for anywhere from 30 minutes to multiple days,
> > then crash/hang/totally-lock-up with either:
> > 	1. scrolling messages going so fast they can't be read.  (Here's
> > 		a very small sample)
> 
> 
> Can you serially console and try to get a full oops?
> 
> > Jun 29 12:45:32 bird5 kernel: [<c02031bf>] ip_finish_output2 [kernel] 0xaf (0xd934ccb8))
nomad
 -----------                       - Lee "nomad" Damon -          \
work: nomad@ee.washington.edu                                      \
play: nomad@castle.org    or castle!nomad                           \
                                                                    /\
Sr. Systems Admin, UWEE SSLI Lab                                   /  \
                "Celebrate Diversity"                             /    \