[OpenAFS] Panic on FreeBSD 10.3

Benjamin Kaduk kaduk@mit.edu
Tue, 5 Dec 2017 14:47:34 -0600


On Mon, Dec 04, 2017 at 11:24:40AM -0500, Michael H Lambert wrote:
> We've been seeing kernel panics like the following on FreeBSD:
> 
> ----------
> vputx: negative ref count
> 0xfffff800aef07588: tag afs, type VDIR
>     usecount 0, writecount 0, refcount 11 mountedhere 0
>     flags (VV_ROOT|VI_ACTIVE)
>  VI_LOCKed    v_object 0xfffff8006e4b3100 ref 0 pages 0 cleanbuf 0 dirtybuf 0
>     lock type afs: EXCL by thread 0xfffff800ae706960 (pid 43045, httpd, tid 100136)
>  with exclusive waiters pending
> vc 0xfffffe0001fe9000 vp 0xfffff800aef07588 tag afs, fid: 0.1.1.1, opens 0, writers 0
>   states readonly
> panic: vputx: negative ref cnt
> cpuid = 1
> KDB: stack backtrace:
> #0 0xffffffff8098ead0 at kdb_backtrace+0x60
> #1 0xffffffff809517a6 at vpanic+0x126
> #2 0xffffffff80951673 at panic+0x43
> #3 0xffffffff809f80d5 at vputx+0x2d5
> #4 0xfffffe0001ca6a0a at afs_PutVCache+0x8a
> #5 0xfffffe0001cf1a80 at afs_root+0xc0
> #6 0xffffffff809ed663 at lookup+0x823
> #7 0xffffffff809ecb44 at namei+0x4d4
> #8 0xffffffff80a0625d at vn_open_cred+0x24d
> #9 0xffffffff809ff53f at kern_openat+0x26f
> #10 0xffffffff80d5722f at amd64_syscall+0x40f
> #11 0xffffffff80d3c48b at Xfast_syscall+0xfb
> ----------
> 
> % uname -a
> FreeBSD www.psc.edu 10.3-RELEASE-p24 FreeBSD 10.3-RELEASE-p24 #0: Wed Nov 15 04:57:40 UTC 2017     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
> 
> The OpenAFS client was built from source (/usr/ports/net/openafs; openafs-1.6.20.1-src.tar.bz2) using system sources updated for the running kernel.
> 
> I don't know exactly what was going on at the time of the panic(s), but it's somewhat likely that access to one or more very large files was involved.  Any thoughts on debugging this problem based on the kernel stack trace?

There should be enough to go on, since the vnode with the negative
refcount is a/the AFS root vnode, we're in the afs_root() function,
and there are only three calls to afs_PutVCache() there to
consider.

It's interesting that it only triggers rarely, I suppose.  Anyway,
I'll take a closer look.

Thanks for the report,

Ben