OpenAFS Master Repository branch, master, updated. BP-openafs-stable-1_8_x-608-g20dc283
Gerrit Code Review
Fri, 29 May 2020 00:23:15 -0400
The following commit has been merged in the master branch:
Author: Andrew Deason <email@example.com>
Date: Sun Nov 24 22:36:17 2019 -0600
FBSD: Ignore VI_DOOMED vnodes
Currently on FreeBSD, osi_TryEvictVCache calls vgone() for our vnode
after checking if the given vcache is in use. vgone() then calls our
VOP_RECLAIM operation, which calls afs_vop_reclaim, which calls
afs_FlushVCache to finally actually flush the vcache.
The current approach has at least the following major issues:
- In afs_vop_reclaim, we return success even if afs_FlushVCache()
fails. This allows FreeBSD to reuse the vnode for another file, but
the vnode is still being referenced by our vcache, which is
referenced by the global VLRU and various other structures. This
causes all kinds of weird errors, since we try to use the underlying
vnode for different files.
- After the relevant checks in osi_TryEvictVCache are done, another
thread can acquire a new reference to our vcache (this can happen
while vgone() is running up until the vnode is locked). This new
reference will cause afs_FlushVCache to fail.
- Our afs_vop_reclaim callback is called while the vnode is locked,
and can acquire afs_xvcache. Other code locks the vnode while
afs_xvcache is already held (such as afs_PutVCache -> vrele). This
can lead to deadlocks if two threads try to run these codepaths for
the same vnode at the same time.
- afs_vop_reclaim optionally acquires afs_xvcache based on the return
value of CheckLock(&afs_xvcache). However, CheckLock just returns if
that lock is locked by anyone, not if the current thread holds the
lock. This can result in the rest of the function running without
afs_xvcache actually being held if we drop AFS_GLOCK at any point.
- osi_TryEvictVCache() tries to vn_lock() the target vnode, but we may
already have another vnode locked in the current thread. If the
vnode we're trying to evict is a descendant of a vnode we already
have locked, this can deadlock.
To fix these issues, make some changes to how our vcache management
works on FreeBSD:
- Do not allow anyone to hold a new reference on a VI_DOOMED vnode.
We do this by checking for VI_DOOMED in osi_vnhold, and returning an
error if VI_DOOMED is set.
- In afs_vop_reclaim, panic if afs_FlushVCache fails. With the new
VI_DOOMED check, afs_FlushVCache show now never fail; and if it
somehow does, panic'ing immediately is better than corrupting
various structures and panic'ing later on.
- Move around some of the relevant locking in afs_vop_reclaim to fix
the lock-related issues.
- In osi_TryEvictVCache, don't wait for the vnode lock (LK_NOWAIT);
treat the vnode as "in use" if we can't immediately obtain the lock.
Thanks to firstname.lastname@example.org and email@example.com for insight and help
investigating the relevant issues.
Tested-by: BuildBot <firstname.lastname@example.org>
Reviewed-by: Benjamin Kaduk <email@example.com>
src/afs/FBSD/osi_vcache.c | 42 +++++++++++++++++++++++++++++++-----------
src/afs/FBSD/osi_vm.c | 3 +--
src/afs/FBSD/osi_vnodeops.c | 28 ++++++++++++++++++++--------
3 files changed, 52 insertions(+), 21 deletions(-)
OpenAFS Master Repository