OpenAFS Master Repository branch, openafs-stable-1_6_x, updated. openafs-stable-1_6_1pre2-70-ga7bcd00

Gerrit Code Review gerrit@openafs.org
Sun, 26 Feb 2012 07:08:46 -0800 (PST)


The following commit has been merged in the openafs-stable-1_6_x branch:
commit 68dc637db6d99a48d7be0556916a8cc084843286
Author: Jeffrey Altman <jaltman@your-file-system.com>
Date:   Fri Nov 25 09:28:18 2011 -0500

    Windows: improved idle dead time handling
    
    RX_CALL_IDLE has been treated the same as RX_CALL_DEAD, which is
    a fatal error that results in the server being marked down.  This
    is not the appropriate behavior for an idle dead timeout error,
    which should not result in servers being marked down.
    
    Idle dead timeouts are locally generated and are an indication
    that the server:
    
     a. is severely overloaded and cannot process all
        incoming requests in a timely fashion.
    
     b. has a partition whose underlying disk (or iSCSI, etc) is
        failing and all I/O requests on that device are blocking.
    
     c. has a large number of threads blocking on a single vnode
        and cannot process requests for other vnodes as a result.
    
     d. is malicious.
    
    RX_CALL_IDLE is distinct from RX_CALL_DEAD in that idle dead timeout
    handling should permit timely failover to replicas when they exist,
    but in the non-replica case it should not be triggered until the hard
    dead timeout.  If the request cannot be retried, it should fail with
    an I/O error.  The client should not retry a request to the same
    server as a result of an idle dead timeout.
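    The retry policy above can be sketched as a small decision function.
    This is a minimal illustration, not the code in cm_conn.c: the names
    idle_dead_action, IDLE_RETRY_OTHER_SITE and IDLE_FAIL_EIO, and the
    "tried" array, are hypothetical.  It only covers what happens once the
    idle timeout fires; delaying it until the hard dead timeout for
    non-replicated objects is a per-connection setting, not part of this
    decision.

```c
#include <assert.h>

/* Hypothetical outcome of handling an idle dead timeout. */
enum idle_action {
    IDLE_RETRY_OTHER_SITE, /* fail over to a different replica site */
    IDLE_FAIL_EIO          /* give up and report an I/O error */
};

/*
 * Hypothetical helper: decide what to do after an idle dead timeout on
 * site 'failed_site'.  'tried[i]' is non-zero if site i was already
 * attempted for this request.  The site that timed out is never retried.
 */
enum idle_action
idle_dead_action(const int *tried, int nsites, int failed_site)
{
    assert(tried != 0 && nsites > 0);
    assert(failed_site >= 0 && failed_site < nsites);

    if (nsites == 1)            /* non-replicated: nowhere to fail over */
        return IDLE_FAIL_EIO;

    for (int i = 0; i < nsites; i++) {
        /* Any other site not yet tried is a failover candidate. */
        if (i != failed_site && !tried[i])
            return IDLE_RETRY_OTHER_SITE;
    }
    return IDLE_FAIL_EIO;       /* every other replica was already tried */
}
```

    With three replica sites where only site 0 has been tried, a timeout on
    site 0 yields IDLE_RETRY_OTHER_SITE; once all sites are exhausted, or
    if only one site exists, the request fails with an I/O error.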
    
    In addition, RX_CALL_IDLE indicates that the client has abandoned
    the call but the server has not.  Therefore, the client cannot determine
    whether or not the RPC will eventually succeed, and it must discard
    any status information it has about the object of the RPC if the
    RPC could have altered the object state upon success.
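    The discard rule can be illustrated with a toy status cache.  This is a
    sketch under stated assumptions: struct cached_status and
    on_idle_abandon are hypothetical and stand in for the client's real
    per-object status bookkeeping, which is not shown in this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cached status for one file server object. */
struct cached_status {
    bool     valid;        /* true while the cached attributes are trusted */
    unsigned data_version; /* last data version seen from the server */
};

/*
 * Hypothetical hook run after the client abandons a call on an idle dead
 * timeout.  The server may still complete the RPC, so its outcome is
 * unknown.  If the RPC could have changed the object, the cached status
 * is no longer trustworthy and is invalidated; a read-only RPC cannot
 * have changed anything, so its cached status is kept.
 */
void
on_idle_abandon(struct cached_status *st, bool rpc_could_modify)
{
    assert(st != NULL);
    if (rpc_could_modify)
        st->valid = false;  /* force a fresh status fetch on next use */
}
```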
    
    This patchset splits the RX_CALL_DEAD processing in cm_Analyze() to
    clarify that only RX_CALL_DEAD errors result in the server being marked
    down.  Since Rx idle dead timeout processing is per connection and
    idle dead timeouts must differ depending upon whether or not replica
    sites exist, cm_ConnBy*() are extended to select a connection based
    upon whether or not replica sites exist.  Separate connection objects
    are used for RPCs to replicated objects and for RPCs to non-replicated
    objects (volumes or the VLDB).
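    The split described above, where only a dead call marks a server down
    and the connection is chosen by whether replicas exist, might look
    roughly like the following.  All names here (sketch_server,
    sketch_conn_select, sketch_analyze_error, SK_CALL_DEAD, SK_CALL_IDLE)
    are hypothetical; the error values are placeholders, not the real
    RX_CALL_DEAD and RX_CALL_IDLE codes from the Rx headers, and the real
    cm_Analyze() and cm_ConnBy*() functions take different arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder error codes standing in for the Rx call errors. */
enum sketch_err { SK_OK = 0, SK_CALL_DEAD = 1, SK_CALL_IDLE = 2 };

struct sketch_conn {
    bool for_replicas;               /* class of RPCs this conn serves */
};

struct sketch_server {
    bool down;                       /* marked down by fatal errors only */
    struct sketch_conn plain_conn;   /* non-replicated volumes, VLDB */
    struct sketch_conn replica_conn; /* replicated volumes */
};

/* Pick the connection object by whether the target has replica sites. */
struct sketch_conn *
sketch_conn_select(struct sketch_server *srv, bool has_replicas)
{
    assert(srv != NULL);
    return has_replicas ? &srv->replica_conn : &srv->plain_conn;
}

/* Only a dead call marks the server down; an idle timeout does not. */
void
sketch_analyze_error(struct sketch_server *srv, enum sketch_err err)
{
    assert(srv != NULL);
    switch (err) {
    case SK_CALL_DEAD:
        srv->down = true;            /* server unreachable */
        break;
    case SK_CALL_IDLE:               /* reachable but stalled: stay up */
    case SK_OK:
        break;
    }
}
```

    Keeping two connection objects per server lets each carry its own idle
    dead timeout, since in Rx that timeout is a per-connection property.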
    
    For non-replica connections the idle dead timeout is set to the hard
    dead timeout.  For replica connections the idle dead timeout is set
    to the configured idle dead timeout.
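    The timeout rule reduces to a single choice per connection.  The helper
    below, conn_idle_dead_secs, is a hypothetical sketch; the real client
    applies the value to its Rx connections, and the numbers in the usage
    note are arbitrary, not OpenAFS defaults.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: idle dead timeout (seconds) for a connection.
 * Replica connections use the configured idle dead timeout so a stalled
 * server triggers failover quickly; non-replica connections use the hard
 * dead timeout, since there is no other site to fail over to.
 */
unsigned
conn_idle_dead_secs(bool replica_conn, unsigned configured_idle_secs,
                    unsigned hard_dead_secs)
{
    assert(hard_dead_secs > 0);
    return replica_conn ? configured_idle_secs : hard_dead_secs;
}
```

    For example, with a configured idle timeout of 60 and a hard dead
    timeout of 120, a replica connection gets 60 and a non-replica
    connection gets 120.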
    
    Idle dead timeout events and whether or not a retry was triggered
    are logged to the Windows Event Log.
    
    cm_Analyze() is given a new 'storeOp' parameter, which is non-zero
    when the executed RPC could modify the data on the file server.
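    A caller has to decide, per RPC, what to pass as storeOp.  The mapping
    below is a hypothetical illustration: the sketch_op kinds and
    sketch_store_op are not OpenAFS identifiers, and the patch's actual
    per-call-site choices are not shown in this commit.  It treats
    read-only RPCs as 0 and mutating ones as non-zero.

```c
#include <assert.h>

/* Hypothetical operation kinds issued against a file server. */
enum sketch_op {
    OP_FETCH_STATUS,
    OP_FETCH_DATA,
    OP_STORE_DATA,
    OP_STORE_STATUS,
    OP_REMOVE_FILE
};

/* Non-zero when the RPC could modify data on the file server. */
int
sketch_store_op(enum sketch_op op)
{
    switch (op) {
    case OP_FETCH_STATUS:
    case OP_FETCH_DATA:
        return 0;               /* read-only: cached status stays valid */
    case OP_STORE_DATA:
    case OP_STORE_STATUS:
    case OP_REMOVE_FILE:
        return 1;               /* mutating: outcome may change the object */
    }
    return 1;                   /* unknown kind: conservatively assume it mutates */
}
```

    Defaulting unknown operations to "may modify" errs on the side of
    discarding cached status, which costs an extra fetch rather than
    risking stale data.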
    
    Reviewed-on: http://gerrit.openafs.org/6118
    Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
    Tested-by: BuildBot <buildbot@rampaginggeek.com>
    (cherry picked from commit f768fb95f3eb3815d6225e074c43341ed2ad5347)
    
    Change-Id: If7194292be0fc2350af9f26c397bd3a1e840abdc
    Reviewed-on: http://gerrit.openafs.org/6830
    Tested-by: BuildBot <buildbot@rampaginggeek.com>
    Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
    Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>

 src/WINNT/afsd/afsd_eventlog.c       |    1 +
 src/WINNT/afsd/afsd_eventmessages.mc |    8 +
 src/WINNT/afsd/cm_callback.c         |    6 +-
 src/WINNT/afsd/cm_conn.c             |  270 +++++++++++++++++++++++++++-------
 src/WINNT/afsd/cm_conn.h             |   19 ++-
 src/WINNT/afsd/cm_dcache.c           |    8 +-
 src/WINNT/afsd/cm_ioctl.c            |    8 +-
 src/WINNT/afsd/cm_server.c           |    6 +-
 src/WINNT/afsd/cm_utils.c            |    3 +-
 src/WINNT/afsd/cm_vnodeops.c         |   27 ++--
 src/WINNT/afsd/cm_volume.c           |   18 ++-
 src/WINNT/afsd/cm_volume.h           |    2 +
 12 files changed, 284 insertions(+), 92 deletions(-)

-- 
OpenAFS Master Repository