OpenAFS Master Repository branch, openafs-stable-1_8_x, updated. openafs-stable-1_8_2-28-g8ca82f1

Gerrit Code Review gerrit@openafs.org
Fri, 25 Jan 2019 12:45:56 -0500


The following commit has been merged in the openafs-stable-1_8_x branch:
commit 8ca82f1252db2e3f17f6a9080f56d74035bdaa16
Author: Andrew Deason <adeason@sinenomine.net>
Date:   Thu Jan 17 00:12:06 2019 -0600

    afs: Do not ignore errors in afs_CacheFetchProc
    
    afs_CacheFetchProc currently has a section of code that looks like
    this pseudocode:
    
        if (!code) do {
            while (length > 0) {
                code = read_from_rx();
                if (code) {
                    break;
                }
                code = write_to_cache();
                if (code) {
                    break;
                }
            }
            code = 0;
        } while (moredata);
        return code;
    
    When we encounter an error when reading from rx or writing to the
    cache, we break out of the current loop to stop processing and return
    an error. But there are _two_ loops in this section of the code, so
    what we actually do is break out of the inner loop, set 'code' to 0,
    and then usually return (since 'moredata' is usually never set).
    
    This means that when we encounter an unexpected error either from the
    net or disk (or the memcache layer), we ignore the error and return
    success. This means that we'll store a subset of the relevant chunk's
    data to disk, and flag that chunk as complete and valid for the
    relevant DV. If the error occurred before we wrote anything to disk,
    this means we'll store an empty chunk and flag it as valid. The chunk
    will be flagged as valid forever, serving invalid data, until the
    cache chunk is evicted or manually kicked out. This can result in
    files and directories appearing blank or truncated to applications
    until the bad chunk is removed.
    
    Possibly the most common way to encounter this issue is when using a
    disk cache, and the underlying disk partition is full, resulting in an
    unexpected ENOSPC error. Theoretically this can be seen from an
    unexpected error from Rx, but we would have to see a short read from
    Rx without the Rx call being aborted. If the call was aborted, we'd
    get an error from the call to rx_EndCall() later on.
    
    To fix this, change all of these 'break's into 'goto done's, to be
    more explicit about where we are jumping to. Convert all of the
    'break's in this function in the same way, to make the code flow more
    consistent and easier to follow. Remove the 'if () do' on a single
    line, since it makes it a little harder to see from a casual glance
    that there are two nested loops here.
    
    This problem appears to have been introduced in commit 61ae8792 (Unite
    CacheFetchProcs and add abstraction calls), included in OpenAFS
    1.5.62.
    
    Reviewed-on: https://gerrit.openafs.org/13428
    Tested-by: BuildBot <buildbot@rampaginggeek.com>
    Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
    Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
    Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
    (cherry picked from commit ce327b568f4ff522aa008f235d97e0d9144eb92c)
    
    Change-Id: Id4ec8ffef38b4c86beffc6272bd283bce2c74ffe
    Reviewed-on: https://gerrit.openafs.org/13443
    Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
    Tested-by: BuildBot <buildbot@rampaginggeek.com>
    Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
    Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

 src/afs/afs_fetchstore.c |   13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)

-- 
OpenAFS Master Repository