[OpenAFS-devel] 1.4.2 panicking Redhat ES3

Joe Buehler jbuehler@spirentcom.com
Thu, 08 Feb 2007 10:07:15 -0500


I have a machine that is panicking every night at the same time -- an automated
build system kicks off at 19:30 and starts hammering the AFS client on the machine.

This started happening about a week ago; the machine was fine before that.
Two things have been done recently in the cell:

- we now use a Kerberos V server for authentication purposes (using
  backwards compatibility -- we are still using klog)
- one of the DB servers (the lowest IP number) was moved to a new machine
  (the new lowest IP number)

Here is the syslog info.  The pj process checks files out of an RCS repository
in AFS.

Feb  7 19:31:14 rocky kernel: assertion failed: code != -EAGAIN, file: /home/project-releases/tmp/openafs-1.4.2/src/afs/LINUX/osi_vnodeops.c, line: 484
Feb  7 19:31:14 rocky kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Feb  7 19:31:14 rocky kernel:  printing eip:
Feb  7 19:31:14 rocky kernel: f8c05880
Feb  7 19:31:14 rocky kernel: *pde = 26173001
Feb  7 19:31:14 rocky kernel: *pte = 00000000
Feb  7 19:31:14 rocky kernel: Oops: 0002
Feb  7 19:31:14 rocky kernel: nfs nfsd lockd sunrpc libafs-2.4.21-4.ELsmp.mp parport_pc lp parport autofs e1000 microcode loop keybdev mousedev hid input ehci-hcd usb-uhci usbcore ext3 jbd
Feb  7 19:31:14 rocky kernel: CPU:    1
Feb  7 19:31:14 rocky kernel: EIP:    0060:[<f8c05880>]    Tainted: PF
Feb  7 19:31:14 rocky kernel: EFLAGS: 00010282
Feb  7 19:31:14 rocky kernel:
Feb  7 19:31:14 rocky kernel: EIP is at osi_Panic [libafs-2.4.21-4.ELsmp.mp] 0x20 (2.4.21-4.ELsmp)
Feb  7 19:31:14 rocky kernel: eax: 0000007a   ebx: e8787ac0   ecx: 00000000   edx: c0380e14
Feb  7 19:31:14 rocky kernel: esi: f8c2646f   edi: e8787b3b   ebp: 00000079   esp: e8787a50
Feb  7 19:31:14 rocky kernel: ds: 0068   es: 0068   ss: 0068
Feb  7 19:31:14 rocky kernel: Process pj (pid: 19918, stackpage=e8787000)
Feb  7 19:31:14 rocky kernel: Stack: e8787ac0 00000010 000001e4 00000000 00000006 f5d368e0 e8786000 f8c05b04
Feb  7 19:31:14 rocky kernel:        e8787ac0 00000010 000001e4 00000000 00000000 00000000 00000000 00000000
Feb  7 19:31:14 rocky kernel:        00000000 00000002 bfffbb3a 00000202 00000202 00000202 00000040 de6ac6f5
Feb  7 19:31:14 rocky kernel: Call Trace:   [<f8c05b04>] osi_AssertFailK [libafs-2.4.21-4.ELsmp.mp] 0x1d4 (0xe8787a6c)
Feb  7 19:31:14 rocky kernel: [<f8bc10e9>] afs_TraverseCells_nl [libafs-2.4.21-4.ELsmp.mp] 0x29 (0xe8787b3c)
Feb  7 19:31:14 rocky kernel: [<f8bc11f0>] afs_choose_cell_by_num [libafs-2.4.21-4.ELsmp.mp] 0x0 (0xe8787b50)
Feb  7 19:31:14 rocky kernel: [<f8bc113d>] afs_TraverseCells [libafs-2.4.21-4.ELsmp.mp] 0x3d (0xe8787b5c)
Feb  7 19:31:14 rocky kernel: [<f8bc11f0>] afs_choose_cell_by_num [libafs-2.4.21-4.ELsmp.mp] 0x0 (0xe8787b60)
Feb  7 19:31:14 rocky kernel: [<f8bc13f0>] afs_GetCellStale [libafs-2.4.21-4.ELsmp.mp] 0x30 (0xe8787b7c)
Feb  7 19:31:14 rocky kernel: [<f8bc14c2>] afs_IsPrimaryCellNum [libafs-2.4.21-4.ELsmp.mp] 0x22 (0xe8787b9c)
Feb  7 19:31:14 rocky kernel: [<f8bc10e9>] afs_TraverseCells_nl [libafs-2.4.21-4.ELsmp.mp] 0x29 (0xe8787bac)
Feb  7 19:31:14 rocky kernel: [<f8bddb13>] afs_FindVCache [libafs-2.4.21-4.ELsmp.mp] 0x73 (0xe8787bbc)
Feb  7 19:31:15 rocky kernel: [<f8bc113d>] afs_TraverseCells [libafs-2.4.21-4.ELsmp.mp] 0x3d (0xe8787bcc)
Feb  8 09:35:57 rocky syslogd 1.4.1: restart.
Feb  8 09:35:57 rocky syslog: syslogd startup succeeded
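
For anyone who wants to compare this trace against another night's panic, here is a rough Python sketch that pulls the frames out of the syslog lines. The regular expression and the helper name parse_frames are my own assumptions about the 2.4 oops layout shown above, not part of any OpenAFS or kernel tool, and only three lines of the trace are embedded as sample input.

```python
import re

# Three lines copied from the call trace above, used as sample input.
SAMPLE = """\
Feb  7 19:31:14 rocky kernel: Call Trace:   [<f8c05b04>] osi_AssertFailK [libafs-2.4.21-4.ELsmp.mp] 0x1d4 (0xe8787a6c)
Feb  7 19:31:14 rocky kernel: [<f8bc10e9>] afs_TraverseCells_nl [libafs-2.4.21-4.ELsmp.mp] 0x29 (0xe8787b3c)
Feb  7 19:31:14 rocky kernel: [<f8bc11f0>] afs_choose_cell_by_num [libafs-2.4.21-4.ELsmp.mp] 0x0 (0xe8787b50)
"""

# Assumed frame layout: [<address>] symbol [module] offset (stack slot)
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\S+)\s+\[([^\]]+)\]\s+(0x[0-9a-f]+)"
)

def parse_frames(text):
    """Return a list of (address, symbol, module, offset) tuples."""
    return FRAME_RE.findall(text)

frames = parse_frames(SAMPLE)
for address, symbol, module, offset in frames:
    print(f"{address}  {symbol}+{offset}  [{module}]")
```

Entries on a 2.4 call trace are found by scanning the stack, so some of them can be stale addresses rather than live callers; the output is only a starting point for reading the trace.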

The CellServDB file has:

>hekimian.com  #Spirent Communications Rockville, MD division
10.32.90.51    #bullwinkle.hekimian.com
10.32.90.94    #crater.hekimian.com
10.32.90.99    #cetus.hekimian.com
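
Since one of the changes was moving the DB server with the lowest IP number, here is a small Python sketch that parses the CellServDB entry above and sorts the servers numerically by address. The parse_cellservdb helper is hypothetical and handles only the simple ">cell #comment" / "address #hostname" layout shown here. Whether the address ordering has anything to do with the panic is only a guess.

```python
import ipaddress

# The CellServDB entry from this message, embedded as sample input.
CELLSERVDB = """\
>hekimian.com  #Spirent Communications Rockville, MD division
10.32.90.51    #bullwinkle.hekimian.com
10.32.90.94    #crater.hekimian.com
10.32.90.99    #cetus.hekimian.com
"""

def parse_cellservdb(text):
    """Map each cell name to a list of (address, hostname) pairs."""
    cells = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Cell header: the name is the first token after '>'.
            current = line[1:].split("#", 1)[0].split()[0]
            cells[current] = []
        elif current is not None:
            # Server line: the address comes before '#', the hostname after.
            addr, _, host = line.partition("#")
            cells[current].append(
                (ipaddress.ip_address(addr.strip()), host.strip())
            )
    return cells

cells = parse_cellservdb(CELLSERVDB)
servers = sorted(cells["hekimian.com"])  # numeric sort by address
lowest_ip, lowest_host = servers[0]
print(lowest_ip, lowest_host)
```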

The ThisCell file has:

hekimian.com

The afsd options are:

/usr/vice/etc/afsd -stat 10000 -dcache 2400 -daemons 5 -volumes 128 -nosettime -afsdb -dynroot -fakestat -afsdb -dynroot

fs listcells shows the following for hekimian.com:

Cell hekimian.com on hosts crater.hekimian.com bullwinkle.hekimian.com cetus.hekimian.com.
-- 
Joe Buehler