OpenAFS Master Repository branch, openafs-stable-1_6_x, updated. openafs-stable-1_6_14_1-10-g49facf6

Gerrit Code Review gerrit@openafs.org
Wed, 7 Oct 2015 06:19:38 -0400


The following commit has been merged in the openafs-stable-1_6_x branch:
commit 49facf65daeda716dc61ef56b95b4bec99d0f2c1
Author: Andrew Deason <adeason@sinenomine.net>
Date:   Mon Oct 27 16:39:34 2014 -0500

    rx: Reset lastSendData when resetting call
    
    Currently we use call->lastSendData to detect a stalled call: if
    too much time has passed since the call last sent any data, the
    call is considered idle. However, we never initialize lastSendData
    to anything when creating a new call.
    
    This means that when rx_NewCall (or rxi_NewCall) returns, lastSendData
    can be nonzero. This can happen if we reuse a DALLY call, or if we
    pull a call off of rx_freeCallQueue. The stale value can be a time
    very far in the past, since lastSendData has not changed since the
    call was last used; it remains unchanged until a user of the new
    call writes something to the call stream.
    
    This can be a problem between the time when a caller creates a new
    call with rx_NewCall and when the caller actually writes something to
    the stream. Between those two times, if lastSendData happens to be set
    to a time in the past, we may call rxi_CheckCall on that call, and
    abort the call for being idle. The call will thus be aborted before it
    even sent any data on the wire.
    
    This is of particular concern for multi_Rx calls, since those can
    create a large number of call structures, possibly introducing a delay
    between calling rx_NewCall and writing anything to the stream (if one
    of the later rx_NewCall invocations blocks waiting for an open call
    channel, for instance, all of the previously allocated calls will stick
    around unused for potentially a long time).
    
    One such multi_Rx call is done by the cache manager, where it
    periodically uses multi_Rx to call RXAFS_GetCapabilities to probe
    fileservers for reachability. If this issue occurs during that
    operation you can see a large number of servers get marked down for
    code -9 (RX_CALL_IDLE), and then get marked as coming back up.
    
    To fix this, set lastSendData to 0 when resetting a call, along with
    most of the other fields in a call, to indicate that the call has
    never sent any data. As long as lastSendData is 0, the call will never
    get aborted with RX_CALL_IDLE. Since rxi_ResetCall is guaranteed to be
    called at some point whenever we reuse a call structure for any
    reason, this issue can no longer occur.
    
    Reviewed-on: http://gerrit.openafs.org/11557
    Tested-by: BuildBot <buildbot@rampaginggeek.com>
    Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
    Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
    (cherry picked from commit 8c78a44cf5197ceee6907e947074973138c442f0)
    
    Change-Id: I1016de366bbd6d3d3cf542b42d7689b60dbacafe
    Reviewed-on: http://gerrit.openafs.org/11594
    Tested-by: BuildBot <buildbot@rampaginggeek.com>
    Reviewed-by: Daria Phoebe Brashear <shadow@your-file-system.com>
    Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
    Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

 src/rx/rx.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
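
The diff itself is not included in this message. As an illustration of the
behaviour the commit message describes, here is a minimal sketch in C. It is
not the OpenAFS source: sketch_call, sketch_call_is_idle, sketch_reset_call
and sketch_demo are hypothetical names, and the real struct rx_call,
rxi_CheckCall and rxi_ResetCall carry far more state and logic. The sketch
only assumes, as the message states, that a lastSendData of 0 means "never
sent any data" and exempts the call from the idle check.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical, stripped-down stand-in for struct rx_call. */
struct sketch_call {
    time_t lastSendData;    /* when data was last sent; 0 means "never" */
};

/*
 * Simplified model of the idle test: returns 1 if the call would be
 * treated as idle at time 'now', 0 otherwise. A zero lastSendData or a
 * zero idleDeadTime disables the check.
 */
static int
sketch_call_is_idle(const struct sketch_call *call, time_t now,
                    time_t idleDeadTime)
{
    return call->lastSendData != 0
        && idleDeadTime != 0
        && call->lastSendData + idleDeadTime < now;
}

/* Simplified model of the fix: clear the timestamp when a call is reset. */
static void
sketch_reset_call(struct sketch_call *call)
{
    call->lastSendData = 0;
}

/* A reused call whose previous user last sent data 1000 seconds ago. */
static void
sketch_demo(void)
{
    struct sketch_call call = { 5000 };
    time_t now = 6000;
    time_t idleDeadTime = 60;

    /* Without a reset, the stale timestamp makes the new call look idle. */
    assert(sketch_call_is_idle(&call, now, idleDeadTime));

    /* After the reset, the call is exempt until it actually sends data. */
    sketch_reset_call(&call);
    assert(!sketch_call_is_idle(&call, now, idleDeadTime));
}
```

Calling sketch_demo() walks through both cases: the reused call is flagged
as idle before the reset and is left alone afterwards.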

-- 
OpenAFS Master Repository