OpenAFS Master Repository branch, openafs-stable-1_6_x, updated. openafs-stable-1_6_1pre2-70-ga7bcd00
Gerrit Code Review
gerrit@openafs.org
Sun, 26 Feb 2012 07:08:46 -0800 (PST)
The following commit has been merged in the openafs-stable-1_6_x branch:
commit 68dc637db6d99a48d7be0556916a8cc084843286
Author: Jeffrey Altman <jaltman@your-file-system.com>
Date: Fri Nov 25 09:28:18 2011 -0500
Windows: improved idle dead time handling
RX_CALL_IDLE has been treated the same as RX_CALL_DEAD, which is
a fatal error that results in the server being marked down. This
is not the appropriate behavior for an idle dead timeout error,
which should not result in servers being marked down.
Idle dead timeouts are locally generated and are an indication
that the server:
a. is severely overloaded and cannot process all
incoming requests in a timely fashion.
b. has a partition whose underlying disk (or iSCSI, etc.) is
failing and all I/O requests on that device are blocking.
c. has a large number of threads blocking on a single vnode
and cannot process requests for other vnodes as a result.
d. is malicious.
RX_CALL_IDLE is distinct from RX_CALL_DEAD in that idle dead timeout
handling should permit timely failover to replicas when they exist,
but in the non-replica case should not be triggered until the hard
dead timeout. If the request cannot be retried, it should fail with
an I/O error. The client should not retry a request to the same
server as a result of an idle dead timeout.
In addition, RX_CALL_IDLE indicates that the client has abandoned
the call but the server has not. Therefore, the client cannot determine
whether or not the RPC will eventually succeed and it must discard
any status information it has about the object of the RPC if the
RPC could have altered the object state upon success.
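The policy described above can be sketched roughly as follows. This is
an illustrative sketch only, not the code in cm_Analyze(): the enum,
struct, and function names are hypothetical, the error values are
stand-ins rather than the real Rx constants, and handling of a hard
dead error beyond marking the server down is not modeled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two Rx errors discussed above.
 * The real constants are defined by Rx and are not reproduced here. */
enum sketch_error { SKETCH_CALL_DEAD, SKETCH_CALL_IDLE };

/* What the client does in response to the error (hypothetical type). */
struct sketch_outcome {
    bool mark_server_down;    /* only for the hard dead error */
    bool retry_on_replica;    /* fail over to another site */
    bool fail_with_io_error;  /* request cannot be retried */
    bool discard_status;      /* cached status may be stale */
};

static struct sketch_outcome
sketch_classify(enum sketch_error err, bool replica_available,
                bool could_modify)
{
    struct sketch_outcome out = { false, false, false, false };

    if (err == SKETCH_CALL_DEAD) {
        /* Fatal: the server is marked down. Further handling of this
         * case is outside the scope of this sketch. */
        out.mark_server_down = true;
        return out;
    }

    /* Idle dead timeout: the server stays up and is never retried
     * for this request. */
    if (replica_available)
        out.retry_on_replica = true;
    else
        out.fail_with_io_error = true;

    /* The abandoned RPC may still complete on the server, so status
     * is discarded whenever the RPC could have changed the object. */
    out.discard_status = could_modify;
    return out;
}
```

In this sketch an idle timeout on a store-type request with no replica
available yields both fail_with_io_error and discard_status, while a
hard dead error only sets mark_server_down.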
This patchset splits the RX_CALL_DEAD processing in cm_Analyze() to
clarify that only RX_CALL_DEAD errors result in the server being marked
down. Since Rx idle dead timeout processing is per connection and
idle dead timeouts must differ depending upon whether or not replica
sites exist, cm_ConnBy*() are extended to select a connection based
upon whether or not replica sites exist. A separate connection object
is used for RPCs to replicated objects as compared to RPCs to non-replicated
objects (volumes or vldb).
For non-replica connections the idle dead timeout is set to the hard
dead timeout. For replica connections the idle dead timeout is set
to the configured idle dead timeout.
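As a minimal sketch of the timeout selection just described, assuming
a hypothetical settings struct and helper (neither name appears in the
patch, and the real code applies the value to an Rx connection rather
than returning it):

```c
/* Hypothetical container for the two configured timeouts, in seconds. */
struct sketch_timeouts {
    unsigned short hard_dead_secs;  /* hard dead timeout */
    unsigned short idle_dead_secs;  /* configured idle dead timeout */
};

/* Return the idle dead timeout to apply to a connection.
 * Non-replica connections wait for the full hard dead timeout;
 * replica connections use the configured idle dead timeout so that
 * failover can happen sooner. */
static unsigned short
sketch_idle_timeout_for(const struct sketch_timeouts *cfg, int replicated)
{
    return replicated ? cfg->idle_dead_secs : cfg->hard_dead_secs;
}
```

Because the value differs between the two cases, a connection used for
replicated objects cannot share its setting with one used for
non-replicated objects, which is why separate connection objects are
needed.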
Idle dead timeout events and whether or not a retry was triggered
are logged to the Windows Event Log.
cm_Analyze() is given a new 'storeOp' parameter which is non-zero
when the executed RPC could modify the data on the file server.
Reviewed-on: http://gerrit.openafs.org/6118
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit f768fb95f3eb3815d6225e074c43341ed2ad5347)
Change-Id: If7194292be0fc2350af9f26c397bd3a1e840abdc
Reviewed-on: http://gerrit.openafs.org/6830
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
 src/WINNT/afsd/afsd_eventlog.c       |   1 +
 src/WINNT/afsd/afsd_eventmessages.mc |   8 +
 src/WINNT/afsd/cm_callback.c         |   6 +-
 src/WINNT/afsd/cm_conn.c             | 270 +++++++++++++++++++++++++++-------
 src/WINNT/afsd/cm_conn.h             |  19 ++-
 src/WINNT/afsd/cm_dcache.c           |   8 +-
 src/WINNT/afsd/cm_ioctl.c            |   8 +-
 src/WINNT/afsd/cm_server.c           |   6 +-
 src/WINNT/afsd/cm_utils.c            |   3 +-
 src/WINNT/afsd/cm_vnodeops.c         |  27 ++--
 src/WINNT/afsd/cm_volume.c           |  18 ++-
 src/WINNT/afsd/cm_volume.h           |   2 +
 12 files changed, 284 insertions(+), 92 deletions(-)
--
OpenAFS Master Repository