[OpenAFS] Kernel panic running 1.8.4 on linux 3.10.0-1062.7.1.el7.x86_64

M. Casper Lewis mclewis@genomecenter.ucdavis.edu
Mon, 13 Jan 2020 12:39:27 -0800


On Thu, Jan 09, 2020 at 01:02:35PM +0000, Neil Brown wrote:
> This is just a heads up in case anyone else has seen or sees
> something similar.

*hijacking thread*

We've been seeing fairly regular panics on 1.8.2 running on CentOS 7 (log
below).  It's clearly a ZFS bug and not AFS, but the curious thing is
that:

* we've never seen this on any of our FreeBSD AFS servers (also ZFS)
* we've never seen this ZFS behavior on anything other than our AFS
  fileservers, and we run ZFS under various flavors of Linux, FreeBSD,
  and illumos (NFS fileservers, virtualization hosts, and webservers
  have never hit it)
* we've never seen this on any of our AFS fileservers using Linux + ext4,
  only with ZFSoL.

So it seems the AFS fileserver is doing something peculiar that ZFSoL
doesn't like. Others have reported other applications triggering it, but
we've never seen it with anything other than the AFS fileserver. Here's
a link to the ZFS bug report that seems relevant:
https://github.com/zfsonlinux/zfs/issues/8673

On a dafileserver, it will hang only volumes that trigger the bug.  On the
traditional fileserver, the whole thing hangs.

Here is a recent hang:

Jan 10 21:07:08 sloth kernel: PANIC: zfs: accessing past end of object 8/4a7559 (size=9216 access=4136+8848)
Jan 10 21:07:08 sloth kernel: Showing stack for process 10371
Jan 10 21:07:08 sloth kernel: CPU: 2 PID: 10371 Comm: dafileserver Kdump: loaded Tainted: P           OE  ------------   3.10.0-957.27.2.el7.x86_64 #1
Jan 10 21:07:08 sloth kernel: Hardware name: Supermicro X8DT6/X8DT6, BIOS 2.0b    08/30/2011
Jan 10 21:07:08 sloth kernel: Call Trace:
Jan 10 21:07:08 sloth kernel: [<ffffffffb8764147>] dump_stack+0x19/0x1b
Jan 10 21:07:08 sloth kernel: [<ffffffffc070b9db>] spl_dumpstack+0x2b/0x30 [spl]
Jan 10 21:07:08 sloth kernel: [<ffffffffc070bb5c>] vcmn_err+0x6c/0x110 [spl]
Jan 10 21:07:08 sloth kernel: [<ffffffffb8768192>] ? mutex_lock+0x12/0x2f
Jan 10 21:07:08 sloth kernel: [<ffffffffc10ee5ba>] ? dmu_zfetch+0x4ea/0x590 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffc10cb863>] ? dbuf_rele_and_unlock+0x283/0x5c0 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffb8768192>] ? mutex_lock+0x12/0x2f
Jan 10 21:07:08 sloth kernel: [<ffffffffc10c85f3>] ? dbuf_find+0x1e3/0x200 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffb8768192>] ? mutex_lock+0x12/0x2f
Jan 10 21:07:08 sloth kernel: [<ffffffffb821e911>] ? __kmalloc_node+0x1d1/0x2b0
Jan 10 21:07:08 sloth kernel: [<ffffffffc11483a9>] zfs_panic_recover+0x69/0x90 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffc10d72a7>] dmu_buf_hold_array_by_dnode+0x2d7/0x4a0 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffc110560a>] ? dsl_dir_tempreserve_space+0x1fa/0x4a0 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffc10d8c45>] dmu_write_uio_dnode+0x55/0x150 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffc10ed9ad>] ? dmu_tx_assign+0x20d/0x490 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffc10d8d94>] dmu_write_uio_dbuf+0x54/0x70 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffc11abe6c>] zfs_write+0xd3c/0xed0 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffc10c85f3>] ? dbuf_find+0x1e3/0x200 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffb8768192>] ? mutex_lock+0x12/0x2f
Jan 10 21:07:08 sloth kernel: [<ffffffffb8768192>] ? mutex_lock+0x12/0x2f
Jan 10 21:07:08 sloth kernel: [<ffffffffb80ddd9e>] ? account_entity_dequeue+0xae/0xd0
Jan 10 21:07:08 sloth kernel: [<ffffffffb802a621>] ? __switch_to+0x151/0x580
Jan 10 21:07:08 sloth kernel: [<ffffffffc11cb74e>] zpl_write_common_iovec.constprop.8+0x9e/0x100 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffc11cb8b4>] zpl_aio_write+0x104/0x120 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffb8241cdb>] do_sync_readv_writev+0x7b/0xd0
Jan 10 21:07:08 sloth kernel: [<ffffffffb824391e>] do_readv_writev+0xce/0x260
Jan 10 21:07:08 sloth kernel: [<ffffffffc11cb7b0>] ? zpl_write_common_iovec.constprop.8+0x100/0x100 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffb8241b80>] ? do_sync_read+0xe0/0xe0
Jan 10 21:07:08 sloth kernel: [<ffffffffb8243b45>] vfs_writev+0x35/0x60
Jan 10 21:07:08 sloth kernel: [<ffffffffb8243f42>] SyS_pwritev+0xc2/0xf0
Jan 10 21:07:08 sloth kernel: [<ffffffffb8776ddb>] system_call_fastpath+0x22/0x27
Jan 10 21:07:08 sloth kernel: [<ffffffffb8776d21>] ? system_call_after_swapgs+0xae/0x146
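
For anyone comparing their own logs, the numbers in the PANIC line can be
checked by hand. The sketch below assumes that "access=A+B" means a write of
B bytes starting at offset A, and that "size" is the current size of the
object's single data block; both are my reading of the message, not
something stated in the log. The function name and the returned keys are
made up for illustration.

```python
import re

# Matches the ZFS message as it appears in the log above.
# Assumption: access=<offset>+<length>, size=<current block size>.
PANIC_RE = re.compile(
    r"accessing past end of object (?P<objset>[0-9a-f]+)/(?P<obj>[0-9a-f]+) "
    r"\(size=(?P<size>\d+) access=(?P<offset>\d+)\+(?P<length>\d+)\)"
)


def check_panic_line(line):
    """Return the parsed fields plus how far the access overruns the object.

    Returns None if the line is not a "past end of object" message.
    """
    m = PANIC_RE.search(line)
    if m is None:
        return None
    size = int(m["size"])
    offset = int(m["offset"])
    length = int(m["length"])
    end = offset + length  # first byte after the requested range
    return {
        "objset": m["objset"],
        "object": m["obj"],
        "size": size,
        "end": end,
        "overrun": end - size,  # positive means the range ends past the object
    }


line = (
    "Jan 10 21:07:08 sloth kernel: PANIC: zfs: accessing past end of "
    "object 8/4a7559 (size=9216 access=4136+8848)"
)
info = check_panic_line(line)
print(info)
```

Under that reading, the write in the log above starts inside the object
(offset 4136 of 9216) and ends at byte 12984, which is 3768 bytes past the
end.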

-- 
M. Casper Lewis                     |   mclewis@ucdavis.edu
Systems Administrator               |   Voice: (530) 754-7978
Genome Center                       |
University of California, Davis     |