[OpenAFS] More on FreeBSD 11.1

Benjamin Kaduk kaduk@mit.edu
Mon, 14 May 2018 12:47:17 -0500


On Mon, May 14, 2018 at 01:41:09PM -0400, Michael H Lambert wrote:
> I decided to try something more useful for debugging and built OpenAFS 1.8.0 with "--enable-debug-kernel --disable-optimize-kernel --disable-strip-binaries --enable-debug --disable-optimize".  This is on FreeBSD 11.1 with the fix for afs_vcache.c.  The kernel panic is definitely not limited to stopping afsd.  I have seen the panic when unpacking a tar file in an afs directory.  Below is some kgdb output which might help.  I can provide more if someone tells me what to look for.  I can also try rebuilding the system with an actual partition for /var/openafs/cache, rather than using a ZFS block device, if anyone thinks that might make a difference.

This looks awfully similar to what commit
54e84a98f9747bb5bb2ad4b8031115ad7684c914 is trying to fix, so as a
first step I'd ask you to double-check that the kernel module you
have installed and are running does in fact include the cherry-pick.

Barring that...

> Thanks,
> 
> Michael
> 
> -----
> # kgdb /boot/kernel/kernel vmcore.last
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 56194 (bsdtar)
> trap number             = 12
> panic: page fault
> cpuid = 3
> KDB: stack backtrace:
> #0 0xffffffff80aaf087 at kdb_backtrace+0x67
> #1 0xffffffff80a6d166 at vpanic+0x186
> #2 0xffffffff80a6cfd3 at panic+0x43
> #3 0xffffffff80eef192 at trap_fatal+0x322
> #4 0xffffffff80eef1eb at trap_pfault+0x4b
> #5 0xffffffff80eee948 at trap+0x2a8
> #6 0xffffffff80ed03e0 at calltrap+0x8
> #7 0xffffffff826a685c at afs_FlushVCache+0x38c
> #8 0xffffffff82733818 at afs_vop_reclaim+0x138
> #9 0xffffffff8105e219 at VOP_RECLAIM_APV+0x89
> #10 0xffffffff80b2b7fc at vgonel+0x21c
> #11 0xffffffff80b2bd70 at vgone+0x40
> #12 0xffffffff82730fc4 at osi_TryEvictVCache+0x1b4
> #13 0xffffffff826a86fe at afs_ShakeLooseVCaches+0x23e
> #14 0xffffffff826a8a39 at afs_NewVCache_int+0x49
> #15 0xffffffff826a89e4 at afs_NewVCache+0x24
> #16 0xffffffff826aa11d at afs_GetVCache+0x24d
> #17 0xffffffff826b6514 at afs_mkdir+0x12f4
> Uptime: 1h30m33s
> Dumping 1534 out of 32727 MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> 
> Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/zfs.ko
> Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/opensolaris.ko
> Reading symbols from /boot/kernel/uhid.ko...Reading symbols from /usr/lib/debug//boot/kernel/uhid.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/uhid.ko
> Reading symbols from /boot/kernel/pflog.ko...Reading symbols from /usr/lib/debug//boot/kernel/pflog.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/pflog.ko
> Reading symbols from /boot/kernel/pf.ko...Reading symbols from /usr/lib/debug//boot/kernel/pf.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/pf.ko
> Reading symbols from /boot/modules/libafs.ko...Reading symbols from /usr/lib/debug//boot/modules/libafs.ko.debug...done.
> done.
> Loaded symbols for /boot/modules/libafs.ko
> #0  doadump (textdump=<value optimized out>) at pcpu.h:229
> 229     pcpu.h: No such file or directory.
>         in pcpu.h
> 
> 
> (kgdb) where
> #0  doadump (textdump=<value optimized out>) at pcpu.h:229
> #1  0xffffffff80a6cce1 in kern_reboot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:366
> #2  0xffffffff80a6d1a0 in vpanic (fmt=<value optimized out>, 
>     ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
> #3  0xffffffff80a6cfd3 in panic (fmt=<value optimized out>)
>     at /usr/src/sys/kern/kern_shutdown.c:690
> #4  0xffffffff80eef192 in trap_fatal (frame=0xfffffe085e480f10, eva=468)
>     at /usr/src/sys/amd64/amd64/trap.c:878
> #5  0xffffffff80eef1eb in trap_pfault (frame=0xfffffe085e480f10, usermode=0)
>     at pcpu.h:229
> #6  0xffffffff80eee948 in trap (frame=0xfffffe085e480f10)
>     at /usr/src/sys/amd64/amd64/trap.c:422
> #7  0xffffffff80ed03e0 in calltrap ()
>     at /usr/src/sys/amd64/amd64/exception.S:231
> #8  0xffffffff826a6b93 in afs_StaleVCacheFlags (avc=0xfffffe003e0161d0, 
>     flags=2, cflags=4097)

... in kgdb, you could do 'frame 8' and then:

p avc
p *avc
p avc->v
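
(Why frame 8: on amd64, calltrap is the trap entry point, so the frame
listed right after it is the code that took the page fault.  If you end
up with several dumps to go through, a rough Python sketch along these
lines could pick that frame number out of saved 'where' output.  The
names parse_frames, faulting_frame and SAMPLE are made up for
illustration and are not part of kgdb or OpenAFS; SAMPLE is just a
trimmed copy of the trace quoted above.)

```python
import re

# Matches the first line of a gdb/kgdb frame, for example
#   "#8  0xffffffff826a6b93 in afs_StaleVCacheFlags (avc=...,"
# The address is optional because frame #0 is printed without one.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")


def parse_frames(where_output):
    """Return [(frame_number, function_name), ...] from 'where' output."""
    frames = []
    for line in where_output.splitlines():
        # Tolerate mail quoting ("> ") in front of each line.
        line = re.sub(r"^(?:>\s?)+", "", line)
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m.group(1)), m.group(2)))
    return frames


def faulting_frame(frames, trap_entry="calltrap"):
    """Return the frame listed right after the trap entry, or None."""
    for i, (_, func) in enumerate(frames):
        if func == trap_entry and i + 1 < len(frames):
            return frames[i + 1]
    return None


# Trimmed copy of the backtrace from this thread (assumption: saved to text).
SAMPLE = """\
#6  0xffffffff80eee948 in trap (frame=0xfffffe085e480f10)
    at /usr/src/sys/amd64/amd64/trap.c:422
#7  0xffffffff80ed03e0 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:231
#8  0xffffffff826a6b93 in afs_StaleVCacheFlags (avc=0xfffffe003e0161d0,
    flags=2, cflags=4097)
#9  0xffffffff826a685c in afs_FlushVCache (avc=0xfffffe003e0161d0,
    slept=0xfffffe085e4810c0)
"""

if __name__ == "__main__":
    number, func = faulting_frame(parse_frames(SAMPLE))
    print(f"frame {number}  # {func}")
```

For the sample above this prints "frame 8  # afs_StaleVCacheFlags",
which is the frame to select before running the p commands.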

-Ben

>     at /home/pscnoc/openafs/openafs-1.8.0/src/afs/afs_vcache.c:3256
> #9  0xffffffff826a685c in afs_FlushVCache (avc=0xfffffe003e0161d0, 
>     slept=0xfffffe085e4810c0)
>     at /home/pscnoc/openafs/openafs-1.8.0/src/afs/afs_vcache.c:251
> #10 0xffffffff82733818 in afs_vop_reclaim (ap=0xfffffe085e481118)
>     at /home/pscnoc/openafs/openafs-1.8.0/src/afs/FBSD/osi_vnodeops.c:1505
> #11 0xffffffff8105e219 in VOP_RECLAIM_APV (vop=<value optimized out>, 
>     a=0xfffffe085e481118) at vnode_if.c:2021
> #12 0xffffffff80b2b7fc in vgonel (vp=0xfffff80449c0b588) at vnode_if.h:830
> #13 0xffffffff80b2bd70 in vgone (vp=0xfffff80449c0b588)
>     at /usr/src/sys/kern/vfs_subr.c:3134
> #14 0xffffffff82730fc4 in osi_TryEvictVCache (avc=0xfffffe003e0161d0, 
>     slept=0xfffffe085e481264, defersleep=0)
>     at /home/pscnoc/openafs/openafs-1.8.0/src/afs/FBSD/osi_vcache.c:47
> #15 0xffffffff826a86fe in afs_ShakeLooseVCaches (anumber=5)
>     at /home/pscnoc/openafs/openafs-1.8.0/src/afs/afs_vcache.c:776
> #16 0xffffffff826a8a39 in afs_NewVCache_int (afid=0xfffffe085e481748, 
>     serverp=0x0, seq=0)
>     at /home/pscnoc/openafs/openafs-1.8.0/src/afs/afs_vcache.c:955
> #17 0xffffffff826a89e4 in afs_NewVCache (afid=0xfffffe085e481748, serverp=0x0)
>     at /home/pscnoc/openafs/openafs-1.8.0/src/afs/afs_vcache.c:1028
> #18 0xffffffff826aa11d in afs_GetVCache (afid=0xfffffe085e481748, 
>     areq=0xfffff8002f944800, cached=0x0, avc=0x0)
>     at /home/pscnoc/openafs/openafs-1.8.0/src/afs/afs_vcache.c:1707
> #19 0xffffffff826b6514 in afs_mkdir (adp=0xfffffe003e148500, 
>     aname=0xfffff805280be2a0 "fe", attrs=0xfffffe085e481860, 
>     avcp=0xfffffe085e4817f0, acred=0xfffff8002fb74c00)
>     at /home/pscnoc/openafs/openafs-1.8.0/src/afs/VNOPS/afs_vnop_dirops.c:231
> #20 0xffffffff827330b3 in afs_vop_mkdir (ap=0xfffffe085e4819f0)
>     at /home/pscnoc/openafs/openafs-1.8.0/src/afs/FBSD/osi_vnodeops.c:1329
> #21 0xffffffff8105db79 in VOP_MKDIR_APV (vop=<value optimized out>, 
>     a=0xfffffe085e4819f0) at vnode_if.c:1610
> #22 0xffffffff80b36a12 in kern_mkdirat (td=0xfffff80024b1b560, fd=-100, 
>     path=0x80221b100 <Address 0x80221b100 out of bounds>, 
>     segflg=UIO_USERSPACE, mode=<value optimized out>) at vnode_if.h:665
> #23 0xffffffff80eefd74 in amd64_syscall (td=0xfffff80024b1b560, traced=0)
>     at subr_syscall.c:135
> #24 0xffffffff80ed0c12 in fast_syscall_common ()
>     at /usr/src/sys/amd64/amd64/exception.S:464
> #25 0x000000080221b100 in ?? ()
> #26 0x00000000000001ed in ?? ()
> #27 0x8080808080808080 in ?? ()
> #28 0x00000008008d4218 in ?? ()
> #29 0xfefefefefefefeff in ?? ()
> #30 0x8080808080808080 in ?? ()
> #31 0x0000000000000088 in ?? ()
> #32 0x00000000000001ed in ?? ()
> #33 0x00007fffffffe7a0 in ?? ()
> #34 0x0000000802274040 in ?? ()
> #35 0x0000000802222000 in ?? ()
> #36 0x00000000000001ed in ?? ()
> #37 0x000000000060f800 in ?? ()
> #38 0x0000000802222000 in ?? ()
> #39 0x0000000802216500 in ?? ()
> #40 0x001b001300000000 in ?? ()
> #41 0x003b003b00000001 in ?? ()
> #42 0x003b003b00000001 in ?? ()
> #43 0x0000000000000002 in ?? ()
> #44 0x0000000800c42d9a in ?? ()
> #45 0x0000000000000043 in ?? ()
> #46 0x0000000000000206 in ?? ()
> #47 0x00007fffffffe6c8 in ?? ()
> #48 0x000000000000003b in ?? ()
> #49 0xffffffff81d1e948 in sleepq_chains ()
> #50 0x0000000000000003 in ?? ()
> #51 0xfffff80024b1b560 in ?? ()
> #52 0xffffffff81d1e948 in sleepq_chains ()
> #53 0xfffffe085e4810a0 in ?? ()
> #54 0xfffffe085e481048 in ?? ()
> #55 0xfffff8000a3a0560 in ?? ()
> #56 0xffffffff80a98dfa in sched_switch (td=0x60f800, newtd=0x1ed, 
>     flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1982
> Previous frame inner to this frame (corrupt stack?)
> Current language:  auto; currently minimal
> 