OpenAFS Master Repository branch, openafs-stable-1_8_x, updated. openafs-stable-1_8_1pre2-7-g6dd052c

Fri, 27 Jul 2018 10:15:09 -0400

The following commit has been merged in the openafs-stable-1_8_x branch:
commit 6dd052cbab09c95e97c910421cfaf68713e906c5
Author: Andrew Deason <adeason@sinenomine.net>
Date:   Thu Apr 26 12:27:12 2018 -0500

    afs: Stop looking for dcaches on Get*DSlot errors
    
    In various places in the code, we'll be looking for a dslot, calling
    afs_GetValidDSlot (or afs_GetUnusedDSlot) in a loop. In a few places,
    we currently keep looking for the dslot when we get an error back,
    since afs_GetValidDSlot may return successfully for other slots, and
    we might find the dslot we're looking for.
    
    This behavior was introduced in a few commits, including:
    
    - commit 2679af76 (afs: Traverse discard/free dslot list if errors)
    - commit 00fd34a6 (afs: Handle easy GetValidDSlot errors)
    - commit 9a558660 (afs: Cope with afs_GetValidDSlot errors)
    
    This behavior means that if afs_GetValidDSlot/afs_GetUnusedDSlot
    returns an error for a particular dcache slot, but other slots are
    okay, then we may still find the dcache we're looking for.
    
    However, by far the most common reason that
    afs_GetValidDSlot/afs_GetUnusedDSlot fails is that our disk cache
    is completely unusable; it is very rare that only a few slots cannot
    be used, but others are fine (this would mean that the disk cache was
    corrupted in oddly specific ways, or there are small isolated errors
    in the underlying disk). So continuing the dcache search in these
    situations is not very useful.
    
    On Linux, this most commonly shows up as the underlying disk cache
    i/o calls returning -EINTR, which can happen if a SIGKILL signal is
    pending for the current process when we try to do the i/o. In this
    situation, all attempts to read in a dslot from disk will fail; trying
    other slots or waiting will not improve the situation. Depending on
    which specific code path encounters an afs_Get*DSlot error, we can
    then flood the log with "disk cache read error in CacheItems" messages
    emitted from afs_UFSGetDSlot, since we keep calling afs_Get*DSlot in
    our loop.
    
    The worst offender is usually afs_GetDSlotFromList via
    afs_AllocDCache, since we end up calling afs_GetUnusedDSlot for every
    single dslot in the free and discard lists. However, our other call
    sites that are looking for dcaches for a specific file can still
    generate quite a few of these messages, since we'll end up calling
    afs_GetValidDSlot for every slot in a dcache hash chain.
    
    So to avoid flooding the log in these situations, change most callers
    of afs_GetValidDSlot and afs_GetUnusedDSlot to stop on the first
    error, and act as if we never found the dcache we were looking for.
    
    This commit also adjusts one caller in afs_ProcessOpCreate, which was
    not handling errors from afs_GetValidDSlot at all, and changes
    FlushVolumeData to be able to return error codes.
    
    Reviewed-on: https://gerrit.openafs.org/13034
    Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
    Tested-by: BuildBot <buildbot@rampaginggeek.com>
    (cherry picked from commit 12f4fd2901fee8bf27c2cec97efd3d242c6ff025)
    
    Change-Id: I2a9865e510be39d1b5bcb9280419630036c00bef
    Reviewed-on: https://gerrit.openafs.org/13191
    Tested-by: BuildBot <buildbot@rampaginggeek.com>
    Reviewed-by: Andrew Deason <adeason@sinenomine.net>
    Reviewed-by: Joe Gorse <jhgorse@gmail.com>
    Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
    Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
    Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

 src/afs/afs_dcache.c       |   35 +++++++++++++++++++----------------
 src/afs/afs_disconnected.c |   24 +++++++++++++++---------
 src/afs/afs_pioctl.c       |   13 +++++++------
 src/afs/afs_segments.c     |    2 +-
 4 files changed, 42 insertions(+), 32 deletions(-)
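
The loop change described in the commit message can be illustrated with a small standalone sketch. This is not the OpenAFS code: every name prefixed with toy_ is hypothetical, the "disk cache" is an in-memory array, and the error counter stands in for the "disk cache read error in CacheItems" log messages. It only contrasts a search that keeps going after a failed slot read with one that stops on the first error, assuming a cache where every read fails.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NSLOTS 8

/* Hypothetical stand-in for a dcache entry; not the real struct dcache. */
struct toy_dcache {
    int file_id;
    int chunk;
};

struct toy_dcache toy_slots[TOY_NSLOTS];
int toy_cache_unusable;   /* nonzero: every slot read fails */
int toy_read_errors;      /* stands in for logged read errors */

/* Fill the toy cache and choose whether slot reads succeed. */
void toy_reset(int unusable)
{
    int i;
    for (i = 0; i < TOY_NSLOTS; i++) {
        toy_slots[i].file_id = i;
        toy_slots[i].chunk = 0;
    }
    toy_cache_unusable = unusable;
    toy_read_errors = 0;
}

/* Hypothetical analogue of a Get*DSlot call: NULL means a read error. */
struct toy_dcache *toy_get_valid_dslot(int index)
{
    assert(index >= 0 && index < TOY_NSLOTS);
    if (toy_cache_unusable) {
        toy_read_errors++;
        return NULL;
    }
    return &toy_slots[index];
}

/* Old behavior: skip a failed slot and keep searching the rest. */
struct toy_dcache *toy_find_keep_going(int file_id, int chunk)
{
    int i;
    for (i = 0; i < TOY_NSLOTS; i++) {
        struct toy_dcache *tdc = toy_get_valid_dslot(i);
        if (tdc == NULL)
            continue;
        if (tdc->file_id == file_id && tdc->chunk == chunk)
            return tdc;
    }
    return NULL;
}

/* New behavior: stop on the first error and report "not found". */
struct toy_dcache *toy_find_stop_on_error(int file_id, int chunk)
{
    int i;
    for (i = 0; i < TOY_NSLOTS; i++) {
        struct toy_dcache *tdc = toy_get_valid_dslot(i);
        if (tdc == NULL)
            break;
        if (tdc->file_id == file_id && tdc->chunk == chunk)
            return tdc;
    }
    return NULL;
}
```

With a healthy toy cache both searches find the entry. With an unusable one, both return NULL, but the first records one error per slot while the second records exactly one.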

-- 
OpenAFS Master Repository