[OpenAFS] Kernel NULL pointer dereference

Christof Hanke christof.hanke@rzg.mpg.de
Fri, 20 Apr 2012 08:45:02 +0200


On 20.04.2012 03:55, Ken Elkabany wrote:
> We have 2 OpenAFS servers running 1.4.14. We have many clients that we
> just switched over to 1.6.1pre1. Starting earlier today, we started
Not sure if it helps in your situation, but 1.6.1 has been released. Try upgrading the clients to it.

T/Christof

> getting NULL pointer dereferences, which have been completely hosing the
> clients. The client machines hang on any call that deals with AFS,
> whether it's "ls /", "ls /afs", "klist", etc... A "vos changeaddr" was
> done earlier today, whereby a large collection (4000) of volumes were
> mistakenly assigned to another server. These were corrected with "vos
> syncvldb" followed by "vos syncserv". I mention it here, as it's the
> only thing we've done to the AFS cluster today.
>
> Here's what we found in the syslog:
>
> Apr 20 01:30:43 SERVER kernel: [12861236.027818] BUG: unable to handle
> kernel NULL pointer dereference at 0000000000000028
> Apr 20 01:30:43 SERVER kernel: [12861236.027836] IP:
> [<ffffffffa0048087>] afs_Conn+0x1e7/0x260 [openafs]
> Apr 20 01:30:43 SERVER kernel: [12861236.027868] PGD 0
> Apr 20 01:30:43 SERVER kernel: [12861236.027874] Oops: 0000 [#1] SMP
> Apr 20 01:30:43 SERVER kernel: [12861236.027882] CPU 6
> Apr 20 01:30:43 SERVER kernel: [12861236.027885] Modules linked in:
> openafs(P) isofs acpiphp
> Apr 20 01:30:43 SERVER kernel: [12861236.027897]
> Apr 20 01:30:43 SERVER kernel: [12861236.027902] Pid: 1568, comm:
> apache2 Tainted: P           O 3.2.0-23-virtual #36-Ubuntu
> Apr 20 01:30:43 SERVER kernel: [12861236.027912] RIP:
> e030:[<ffffffffa0048087>]  [<ffffffffa0048087>] afs_Conn+0x1e7/0x260
> [openafs]
> Apr 20 01:30:43 SERVER kernel: [12861236.027936] RSP:
> e02b:ffff88017f417808  EFLAGS: 00010282
> Apr 20 01:30:43 SERVER kernel: [12861236.027942] RAX: ffffc9000188dbe0
> RBX: 0000000000000000 RCX: 000000000000581b
> Apr 20 01:30:43 SERVER kernel: [12861236.027950] RDX: ffff8801b112a000
> RSI: 0000000000000001 RDI: ffff88017f761680
> Apr 20 01:30:43 SERVER kernel: [12861236.027957] RBP: ffff88017f417858
> R08: 0000000000000000 R09: 0000000000000000
> Apr 20 01:30:43 SERVER kernel: [12861236.027964] R10: 0000000000000002
> R11: 0000000000000000 R12: ffff880184756f48
> Apr 20 01:30:43 SERVER kernel: [12861236.027971] R13: ffff88017f417a20
> R14: 0000000000000004 R15: ffff88017f4178f0
> Apr 20 01:30:43 SERVER kernel: [12861236.027983] FS:
>   00007f1f6ae2f700(0000) GS:ffff8801bff73000(0000) knlGS:0000000000000000
> Apr 20 01:30:43 SERVER kernel: [12861236.027991] CS:  e033 DS: 0000 ES:
> 0000 CR0: 000000008005003b
> Apr 20 01:30:43 SERVER kernel: [12861236.027998] CR2: 0000000000000028
> CR3: 0000000181465000 CR4: 0000000000002660
> Apr 20 01:30:43 SERVER kernel: [12861236.028006] DR0: 0000000000000000
> DR1: 0000000000000000 DR2: 0000000000000000
> Apr 20 01:30:43 SERVER kernel: [12861236.028013] DR3: 0000000000000000
> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Apr 20 01:30:43 SERVER kernel: [12861236.028021] Process apache2 (pid:
> 1568, threadinfo ffff88017f416000, task ffff88017f41adc0)
> Apr 20 01:30:43 SERVER kernel: [12861236.028028] Stack:
> Apr 20 01:30:43 SERVER kernel: [12861236.028032]  000000004e2a6741
> 0000000000000000 0000000000000000 000000004f90bc43
> Apr 20 01:30:43 SERVER kernel: [12861236.028046]  000000000001584a
> ffff880184756cc0 ffff88017f41adc0 ffff88017f417a20
> Apr 20 01:30:43 SERVER kernel: [12861236.028059]  ffff880184756f48
> ffff880184756cc0 ffff88017f417928 ffffffffa0068658
> Apr 20 01:30:43 SERVER kernel: [12861236.028072] Call Trace:
> Apr 20 01:30:43 SERVER kernel: [12861236.028092]  [<ffffffffa0068658>]
> afs_FetchStatus+0x58/0x450 [openafs]
> Apr 20 01:30:43 SERVER kernel: [12861236.028113]  [<ffffffffa004672b>] ?
> afs_GetCellStale+0x3b/0x60 [openafs]
> Apr 20 01:30:43 SERVER kernel: [12861236.028134]  [<ffffffffa0046a25>] ?
> afs_IsPrimaryCell+0x25/0x40 [openafs]
> Apr 20 01:30:43 SERVER kernel: [12861236.028157]  [<ffffffffa0082b80>] ?
> afs_GetVolume+0x40/0x1d0 [openafs]
> Apr 20 01:30:43 SERVER kernel: [12861236.028179]  [<ffffffffa006ae8d>]
> afs_GetVCache+0x26d/0x5d0 [openafs]
> Apr 20 01:30:43 SERVER kernel: [12861236.028200]  [<ffffffffa006b343>]
> afs_VerifyVCache2+0x153/0x200 [openafs]
> Apr 20 01:30:43 SERVER kernel: [12861236.028222]  [<ffffffffa006ccec>]
> afs_getattr+0x29c/0x350 [openafs]
> Apr 20 01:30:43 SERVER kernel: [12861236.028242]  [<ffffffffa009340f>]
> afs_linux_dentry_revalidate+0x39f/0x470 [openafs]
> Apr 20 01:30:43 SERVER kernel: [12861236.028265]  [<ffffffffa006bf43>] ?
> afs_AccessOK+0x113/0x1e0 [openafs]
> Apr 20 01:30:43 SERVER kernel: [12861236.028279]  [<ffffffff816552de>] ?
> _raw_spin_lock+0xe/0x20
> Apr 20 01:30:43 SERVER kernel: [12861236.028290]  [<ffffffff811818eb>]
> do_lookup+0x18b/0x310
> Apr 20 01:30:43 SERVER kernel: [12861236.028298]  [<ffffffff8129885c>] ?
> security_inode_permission+0x1c/0x30
> Apr 20 01:30:43 SERVER kernel: [12861236.028306]  [<ffffffff81182268>]
> link_path_walk+0x138/0x870
> Apr 20 01:30:43 SERVER kernel: [12861236.028313]  [<ffffffff811834ad>] ?
> path_init+0x2ed/0x3c0
> Apr 20 01:30:43 SERVER kernel: [12861236.028319]  [<ffffffff811835d8>]
> path_lookupat+0x58/0x750
> Apr 20 01:30:43 SERVER kernel: [12861236.028339]  [<ffffffffa006cb3c>] ?
> afs_getattr+0xec/0x350 [openafs]
> Apr 20 01:30:43 SERVER kernel: [12861236.028348]  [<ffffffff810067be>] ?
> xen_pmd_val+0xe/0x10
> Apr 20 01:30:43 SERVER kernel: [12861236.028355]  [<ffffffff81183d01>]
> do_path_lookup+0x31/0xc0
> Apr 20 01:30:43 SERVER kernel: [12861236.028362]  [<ffffffff81184809>]
> user_path_at_empty+0x59/0xa0
> Apr 20 01:30:43 SERVER kernel: [12861236.028369]  [<ffffffff8100aa32>] ?
> check_events+0x12/0x20
> Apr 20 01:30:43 SERVER kernel: [12861236.028377]  [<ffffffff8100a25d>] ?
> xen_force_evtchn_callback+0xd/0x10
> Apr 20 01:30:43 SERVER kernel: [12861236.028384]  [<ffffffff81184861>]
> user_path_at+0x11/0x20
> Apr 20 01:30:43 SERVER kernel: [12861236.028391]  [<ffffffff8117995a>]
> vfs_fstatat+0x3a/0x70
> Apr 20 01:30:43 SERVER kernel: [12861236.028398]  [<ffffffff8100aa1f>] ?
> xen_restore_fl_direct_reloc+0x4/0x4
> Apr 20 01:30:43 SERVER kernel: [12861236.028405]  [<ffffffff8100465d>] ?
> xen_clts+0x8d/0x190
> Apr 20 01:30:43 SERVER kernel: [12861236.028412]  [<ffffffff811799ae>]
> vfs_lstat+0x1e/0x20
> Apr 20 01:30:43 SERVER kernel: [12861236.028418]  [<ffffffff81179b4a>]
> sys_newlstat+0x1a/0x40
> Apr 20 01:30:43 SERVER kernel: [12861236.028427]  [<ffffffff810146e1>] ?
> math_state_restore+0x51/0x80
> Apr 20 01:30:43 SERVER kernel: [12861236.028435]  [<ffffffff816562fe>] ?
> do_device_not_available+0xe/0x10
> Apr 20 01:30:43 SERVER kernel: [12861236.028445]  [<ffffffff8165f8cb>] ?
> device_not_available+0x1b/0x20
> Apr 20 01:30:43 SERVER kernel: [12861236.028452]  [<ffffffff8165d8c2>]
> system_call_fastpath+0x16/0x1b
> Apr 20 01:30:43 SERVER kernel: [12861236.028458] Code: 89 ef 48 89 45 c8
> e8 39 c4 01 00 48 8b 45 c8 48 83 c4 28 5b 41 5c 41 5d 41 5e 41 5f 5d c3
> 48 85 ff 0f 84 95 fe ff ff 48 8b 5f 58 <f6> 43 28 20 0f 85 87 fe ff ff
> 41 80 7d 12 00 7e 29 41 80 7d 13
> Apr 20 01:30:43 SERVER kernel: [12861236.028543] RIP
>   [<ffffffffa0048087>] afs_Conn+0x1e7/0x260 [openafs]
> Apr 20 01:30:43 SERVER kernel: [12861236.028563]  RSP <ffff88017f417808>
> Apr 20 01:30:43 SERVER kernel: [12861236.028568] CR2: 0000000000000028
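
If it helps to compare the oopses from several clients, the key fields
can be pulled out of the syslog text with a small script. This is only a
rough sketch: the function name summarize_oops and the regular
expressions are assumptions based on the excerpt quoted above, not part
of any OpenAFS or kernel tool.

```python
import re

# First lines of the oops as quoted above (mail-wrapped, with "> " markers).
OOPS_SAMPLE = """\
> Apr 20 01:30:43 SERVER kernel: [12861236.027818] BUG: unable to handle
> kernel NULL pointer dereference at 0000000000000028
> Apr 20 01:30:43 SERVER kernel: [12861236.027836] IP:
> [<ffffffffa0048087>] afs_Conn+0x1e7/0x260 [openafs]
"""


def summarize_oops(text):
    """Return fault address and faulting symbol from an oops excerpt.

    Hypothetical helper: it only understands the line shapes seen in
    the excerpt above and returns None if they are not found.
    """
    # Drop quote markers and undo the mail client's line wrapping.
    flat = " ".join(line.lstrip("> ").strip() for line in text.splitlines())

    addr = re.search(r"NULL pointer dereference at ([0-9a-f]+)", flat)
    # \b keeps "RIP:" from matching; the module name in brackets is optional.
    ip = re.search(
        r"\bIP:\s*\[<([0-9a-f]+)>\]\s*(\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)"
        r"(?:\s*\[(\w+)\])?",
        flat,
    )
    if addr is None or ip is None:
        return None

    return {
        "fault_address": int(addr.group(1), 16),
        "ip": int(ip.group(1), 16),
        "symbol": ip.group(2),
        "offset": int(ip.group(3), 16),
        "size": int(ip.group(4), 16),
        "module": ip.group(5),
    }


if __name__ == "__main__":
    info = summarize_oops(OOPS_SAMPLE)
    print("fault at %#x in %s+%#x/%#x [%s]" % (
        info["fault_address"], info["symbol"],
        info["offset"], info["size"], info["module"]))
```

For the excerpt above this reports a fault address of 0x28 inside
afs_Conn, at offset 0x1e7 of 0x260, in the openafs module.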

-- 
The future is all around us, waiting in moments of transition to be born
in moments of revelation. No one knows the shape of that future or where
it will take us. We know only that it is always born in pain.
   -- G'Quan
Let's update the servers!
-----------------------------------------------------------------
Christof Hanke                 		e-mail hanke@rzg.mpg.de
RZG (Rechenzentrum Garching)		phone +49-89-3299-1041
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut für Plasmaphysik (IPP)