OpenAFS Master Repository branch, master, updated. BP-openafs-stable-1_8_x-608-g20dc283

Gerrit Code Review gerrit@openafs.org
Fri, 29 May 2020 00:23:15 -0400


The following commit has been merged in the master branch:
commit 20dc2832268eb81d40e798da0d424c98cf26062c
Author: Andrew Deason <adeason@dson.org>
Date:   Sun Nov 24 22:36:17 2019 -0600

    FBSD: Ignore VI_DOOMED vnodes
    
    Currently on FreeBSD, osi_TryEvictVCache calls vgone() for our vnode
    after checking if the given vcache is in use. vgone() then calls our
    VOP_RECLAIM operation, which calls afs_vop_reclaim, which calls
    afs_FlushVCache to finally actually flush the vcache.
    
    The current approach has at least the following major issues:
    
    - In afs_vop_reclaim, we return success even if afs_FlushVCache()
      fails. This allows FreeBSD to reuse the vnode for another file, but
      the vnode is still being referenced by our vcache, which is
      referenced by the global VLRU and various other structures. This
      causes all kinds of weird errors, since we try to use the underlying
      vnode for different files.
    
    - After the relevant checks in osi_TryEvictVCache are done, another
      thread can acquire a new reference to our vcache (this can happen
      while vgone() is running up until the vnode is locked). This new
      reference will cause afs_FlushVCache to fail.
    
    - Our afs_vop_reclaim callback is called while the vnode is locked,
      and can acquire afs_xvcache. Other code locks the vnode while
      afs_xvcache is already held (such as afs_PutVCache -> vrele). This
      can lead to deadlocks if two threads try to run these codepaths for
      the same vnode at the same time.
    
    - afs_vop_reclaim optionally acquires afs_xvcache based on the return
      value of CheckLock(&afs_xvcache). However, CheckLock just returns if
      that lock is locked by anyone, not if the current thread holds the
      lock. This can result in the rest of the function running without
      afs_xvcache actually being held if we drop AFS_GLOCK at any point.
    
    - osi_TryEvictVCache() tries to vn_lock() the target vnode, but we may
      already have another vnode locked in the current thread. If the
      vnode we're trying to evict is a descendant of a vnode we already
      have locked, this can deadlock.
    
    To fix these issues, make some changes to how our vcache management
    works on FreeBSD:
    
    - Do not allow anyone to hold a new reference on a VI_DOOMED vnode.
      We do this by checking for VI_DOOMED in osi_vnhold, and returning an
      error if VI_DOOMED is set.
    
    - In afs_vop_reclaim, panic if afs_FlushVCache fails. With the new
      VI_DOOMED check, afs_FlushVCache show now never fail; and if it
      somehow does, panic'ing immediately is better than corrupting
      various structures and panic'ing later on.
    
    - Move around some of the relevant locking in afs_vop_reclaim to fix
      the lock-related issues.
    
    - In osi_TryEvictVCache, don't wait for the vnode lock (LK_NOWAIT);
      treat the vnode as "in use" if we can't immediately obtain the lock.
    
    Thanks to tcreech@tcreech.com and kaduk@mit.edu for insight and help
    investigating the relevant issues.
    
    FIXES 135041
    
    Change-Id: I23e94ecebbddc8c68a8f4ea918d64efd0f9f9dfd
    Reviewed-on: https://gerrit.openafs.org/13972
    Tested-by: BuildBot <buildbot@rampaginggeek.com>
    Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

 src/afs/FBSD/osi_vcache.c   |   42 +++++++++++++++++++++++++++++++-----------
 src/afs/FBSD/osi_vm.c       |    3 +--
 src/afs/FBSD/osi_vnodeops.c |   28 ++++++++++++++++++++--------
 3 files changed, 52 insertions(+), 21 deletions(-)

-- 
OpenAFS Master Repository