OpenAFS Master Repository branch, openafs-stable-1_6_x, updated. openafs-stable-1_6_14_1-10-g49facf6
Gerrit Code Review
gerrit@openafs.org
Wed, 7 Oct 2015 06:19:38 -0400
The following commit has been merged in the openafs-stable-1_6_x branch:
commit 49facf65daeda716dc61ef56b95b4bec99d0f2c1
Author: Andrew Deason <adeason@sinenomine.net>
Date: Mon Oct 27 16:39:34 2014 -0500
rx: Reset lastSendData when resetting call
Currently we use call->lastSendData to attempt to detect a stalled
call, if it's been too long since the last time the call sent any
data. However, we never initialize lastSendData to anything when
creating a new call.
This means that when rx_NewCall (or rxi_NewCall) returns, lastSendData
can be nonzero. This can happen if we reuse a DALLY call, or if we
pull a call off of rx_freeCallQueue. The stale value can be a time very
far in the past, since the lastSendData time has not changed since the last
time the call was used; it will remain unchanged until a user of the
new call writes something to the call stream.
This can be a problem between the time when a caller creates a new
call with rx_NewCall and when the caller actually writes something to
the stream. Between those two times, if lastSendData happens to be set
to a time in the past, we may call rxi_CheckCall on that call, and
abort the call for being idle. The call will thus be aborted before it
has even sent any data on the wire.
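The failure mode above can be illustrated with a small, self-contained
model. This is only a sketch, not the code in src/rx/rx.c: the struct,
the function names, and the exact form of the idle test are assumptions
made for illustration. Only the lastSendData field name and the -9
(RX_CALL_IDLE) code come from this commit message.

```c
#include <time.h>

/* Hypothetical, simplified stand-in for an Rx call (not struct rx_call). */
struct model_call {
    time_t lastSendData;    /* time data was last sent; 0 means "never" */
};

/* Code -9 (RX_CALL_IDLE), as quoted in the commit message. */
#define MODEL_RX_CALL_IDLE (-9)

/*
 * Assumed shape of the idle test: a call counts as idle when it has a
 * nonzero lastSendData that is older than idleDeadTime seconds.
 * Returns MODEL_RX_CALL_IDLE if the call would be aborted, else 0.
 */
static int
model_check_idle(const struct model_call *call, time_t now,
                 time_t idleDeadTime)
{
    if (call->lastSendData != 0 && idleDeadTime != 0 &&
        call->lastSendData + idleDeadTime < now) {
        return MODEL_RX_CALL_IDLE;
    }
    return 0;
}

/*
 * Models handing out a previously used call structure without touching
 * lastSendData, which is the behavior the commit message describes.
 */
static void
model_reuse_without_reset(struct model_call *call)
{
    (void)call;             /* the stale timestamp is left in place */
}
```

In this model, a call whose lastSendData is left over from an earlier use
is reported as idle the first time it is checked, even though its new
owner has not yet had a chance to write anything to the stream.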
This is of particular concern for multi_Rx calls, since those can
create a large number of call structures, possibly introducing a delay
between calling rx_NewCall and writing anything to the stream (if one
of the later rx_NewCall invocations blocks waiting for an open call
channel, for instance, all of the previously allocated calls will stick
around unused for potentially a long time).
One such multi_Rx call is done by the cache manager, where it
periodically uses multi_Rx to call RXAFS_GetCapabilities to probe
fileservers for reachability. If this issue occurs during that
operation you can see a large number of servers get marked down for
code -9 (RX_CALL_IDLE), and then get marked as coming back up.
To fix this, set lastSendData to 0 when resetting a call, along with
most of the other fields in a call, to indicate that the call has
never sent any data. As long as lastSendData is 0, the call will never
get aborted with RX_CALL_IDLE, and this situation will be avoided.
This ensures that this issue cannot happen, since rxi_ResetCall is
guaranteed to be called at some point whenever we reuse a call
structure for any reason.
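A minimal sketch of the idea behind the fix follows. It is a simplified
model rather than the actual one-line change to src/rx/rx.c:
sketch_call, sketch_reset_call and sketch_is_idle are hypothetical
names, and the idle test is an assumed form that treats a zero
lastSendData as "never sent", as described above.

```c
#include <time.h>

/* Hypothetical, simplified stand-in for an Rx call (not struct rx_call). */
struct sketch_call {
    int error;              /* stands in for the other per-call fields */
    time_t lastSendData;    /* time data was last sent; 0 means "never" */
};

/*
 * Models resetting a call before reuse: lastSendData is cleared along
 * with the other per-call state, so no stale timestamp survives.
 */
static void
sketch_reset_call(struct sketch_call *call)
{
    call->error = 0;
    call->lastSendData = 0; /* the reset this commit describes */
}

/*
 * Assumed idle test: nonzero only when the call has sent data at some
 * point and that send is older than idleDeadTime seconds.
 */
static int
sketch_is_idle(const struct sketch_call *call, time_t now,
               time_t idleDeadTime)
{
    return call->lastSendData != 0 &&
           call->lastSendData + idleDeadTime < now;
}
```

Under these assumptions, a freshly reset call is never reported as idle,
however long the caller waits before its first write. The idle timer
effectively starts only once lastSendData is set by an actual send.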
Reviewed-on: http://gerrit.openafs.org/11557
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
(cherry picked from commit 8c78a44cf5197ceee6907e947074973138c442f0)
Change-Id: I1016de366bbd6d3d3cf542b42d7689b60dbacafe
Reviewed-on: http://gerrit.openafs.org/11594
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Daria Phoebe Brashear <shadow@your-file-system.com>
Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
src/rx/rx.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
--
OpenAFS Master Repository