OpenAFS Master Repository branch, master, updated. openafs-devel-1_5_76-4213-ga10696d

Gerrit Code Review gerrit@openafs.org
Mon, 14 Apr 2014 10:36:49 -0700 (PDT)


The following commit has been merged in the master branch:
commit a10696d9408b0ac21f8a011fce0a6a72b1c0fe0e
Author: Andrew Deason <adeason@sinenomine.net>
Date:   Thu Jan 30 13:50:11 2014 -0600

    Fix rx_EndCall error precedence
    
    Callers of rx_EndCall in various parts of the code handle errors a bit
    differently from each other. The correct way to use rx_EndCall is
    almost always some form of:
    
        code = rx_EndCall(call, code);
    
    This will cause the call to abort with 'code' if the call is not
    already aborted, and will return the abort code for the call (or 0 if
    the call ended successfully). It is thus impossible for 'code' to
    start out with a non-zero value in the code snippet above, and end up
    with a value of 0 after the code snippet.
    
    Most code follows this pattern, because this is how the
    rxgen-generated client RPC wrappers are written. So for any non-split
    Rx call, this is how the error precedence works.
    
    However, some code (mostly for Rx split calls), needs to handle
    calling rx_EndCall itself, and some code appears to think it is
    possible for rx_EndCall to return 0 when we already had a non-zero
    error. Such code tries to ensure that we don't ignore an error we
    already got by doing something like this:
    
        code2 = rx_EndCall(call, code);
        if (code2 && !code) {
            code = code2;
        }
    
    However, this is not correct. If a call gets killed with an abort code
    partway through executing an RPC, and the client tries to end the RPC
    with e.g. EndRXAFS_FetchData, the client will get an error code of
    -451 (RXGEN_CC_UNMARSHAL). The actual error code is in the abort code
    for the call, but with the above 'code2' snippet, we can easily return
    an error of -451 instead, which will usually get interpreted as some
    unknown network-related error.
    
    This can manifest as a problem in the unix client, where if a
    FetchData call fails due to, for example, an "idle dead" timeout, we
    should result with an error code of RX_CALL_TIMEOUT. But because of
    the above issue, we'll instead yield an error of -451, causing the
    server to be marked down with the following message:
    
        afs: Lost contact with file server ... (code -451) ...
    
    So, fix most rx_EndCall callers to follow the 'code = rx_EndCall(call,
    code);' pattern. Not all of the changes here are to "wrong" code, but
    try to make all of the rx_EndCall call sites look more consistent.
    There are a few exceptions to this pattern, which warrant some
    variations:
    
     - A few instances in src/WINNT/afsd/cm_dcache.c do seem to want to
       record the original error before we ran rx_EndCall, instead of
       seeing the rx abort code. We still return the rx_EndCall-returned
       value to the caller, though.
    
     - Any caller of RXAFS_FetchData* needs to read a 'length' raw from
       the rx split stream. If this fails, we need to abort the call, but
       we don't really have an error code to give to rx_EndCall. Failure
       to read a length indicates that the server is not following
       protocol properly, so give rx_EndCall RX_PROTOCOL_ERROR in these
       instances. The call should already be aborted by this point, so
       most of the time this code will be ignored; it will only make a
       difference if the server tries to end the call successfully without
       sending a length, which is indeed a protocol error.
    
     - Some Rx clients can encounter a local error they don't want to send
       to the server via an abort, so they just end the call successfully,
       and only use the rx abort code if they don't already have a local
       error. This is in a few places like src/butc/dump.c and
       src/volser/vsprocs.c.
    
     - Several places don't care what the error from rx_EndCall is, such
       as various call sites in server-side code.
    
    The behavior of the Windows client w.r.t rx_EndCall was changed a bit
    into its current behavior in commit
    a50fa631cad6919d15721ac2c234ebbdda2b4031 (ticket 125018), which just
    appears to be wrong. This was partially reverted by commit
    ae7ef5f5b963a5c8ce4110a7352e0010cb6cdbc1 (ticket 125351), but some of
    the other call sites were unchanged. The Unix client appears to have
    been doing this incorrectly for at least FetchData calls since OpenAFS
    1.0.
    
    To make it hopefully more clear that rx_EndCall cannot return 0 if
    given a non-zero error code, add an assert to rx_EndCall that asserts
    that fact.
    
    FIXES 127001
    
    Change-Id: I10bbfe82b55b509e1930abb6c568edb1efd9fd2f
    Reviewed-on: http://gerrit.openafs.org/10788
    Reviewed-by: D Brashear <shadow@your-file-system.com>
    Tested-by: BuildBot <buildbot@rampaginggeek.com>

 src/WINNT/afsd/cm_dcache.c      |   62 ++++++++++++++++++++-------------------
 src/WINNT/afsd/cm_direct.c      |   10 ++----
 src/afs/afs_bypasscache.c       |    3 +-
 src/afs/afs_fetchstore.c        |   37 +++++++----------------
 src/bozo/bos.c                  |    4 +-
 src/libadmin/bos/afs_bosAdmin.c |    4 +-
 src/libadmin/vos/vsprocs.c      |   11 ++----
 src/libafscp/afscp_file.c       |   16 +++++-----
 src/rx/rx.c                     |    4 ++
 src/ubik/recovery.c             |    6 +--
 src/volser/vsprocs.c            |   13 ++++----
 11 files changed, 75 insertions(+), 95 deletions(-)

-- 
OpenAFS Master Repository