[OpenAFS-devel] idle dead timeout processing in clients
Jeffrey Altman
jaltman@your-file-system.com
Wed, 30 Nov 2011 12:53:03 -0500
Since before OpenAFS, Rx has included idle peer detection that was
activated only in the servers. This idle peer detection is referred to
in the code as the idle dead timeout. Idle peer detection is an
important mechanism for preventing denial of service attacks against the
AFS infrastructure.
AFS clients are just as susceptible to bad behavior as the file servers.
A file server can respond to the client with keep alive packets for an
extended period of time for a variety of reasons. The file server could:
a. be severely overloaded and unable to process all
incoming requests in a timely fashion. (requests waiting
for threads.)
b. have a partition whose underlying disk (or iSCSI, etc) is
failing and all I/O requests on that device are blocking.
c. have a large number of threads blocking on a single vnode
and cannot process requests for other vnodes as a result.
d. be retrieving the requested data from a hierarchical storage
management system.
e. be malicious.
From 2003 until the present there has been a gradual move towards
activating idle peer detection in clients, with it arriving on the
master branch (and in Windows 1.5.x clients) in the Spring of 2008 and
in the 1.6.0 release this past Summer.
The motivating factor for Unix clients was protection against case (b)
and improved fail over to a .readonly replica when cases (a) and (c)
occur. For Windows, the motivating factor was ensuring that all AFS
RPCs would terminate within the SMB timeout period (45 seconds) in order
to avoid the SMB client tearing down its SMB connection.
Unfortunately, client side idle peer detection is the root cause behind
Stephan Wiesand's bug report [RT#130327] in which a large number of
clients writing to a single volume begin to time out, mark servers down,
and fail to complete store operations.
Derrick, Simon and I spent the last week analyzing the problem. Here is
our analysis:
1. The use of the RX_CALL_DEAD error to indicate an idle peer
does not provide enough information to the cache manager for
it to respond in a sensible manner. RX_CALL_DEAD is an
indication that the peer is not responding and should be marked
"down" until the next server probe.
2. Idle peer detection is only safe to use when the object that is
being accessed is known to be replicated and is not stored in a
HSM.
2a. The mere existence of idle peer detection breaks HSM deployments
because the file server can be expected to take an extremely long
time to retrieve some data. Perhaps hours in some edge cases.
2b. Data changing RPCs require that callbacks be broken. The timeout
for a callback break is the hard dead timeout which is 2 minutes.
The timeout for an idle peer is 1 minute. Any clients that are
waiting for an RPC to complete against a vnode on which a callback
is pending can end up marking the server down prior to the
completion of the RPC.
2c. When multiple clients issue data changing RPCs against a single
vnode there are increasingly longer completion times. When the
queue of pending requests is long enough idle peer detection trips
causing the server to be marked down and the RPC to fail due to
lack of a replica to fail over to.
3. When a replica is available and it is known to not be backed
by an HSM, the use of idle peer detection is a win. Unfortunately,
the client has no knowledge of the backing store; nor should it.
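
The timing in points 2b and 2c can be illustrated with a small model. This is only a sketch under simplifying assumptions that are not in the analysis itself: the file server handles the RPCs against the vnode strictly one at a time, each takes the same fixed service time, and keep alives are sent to every waiting client. The function names are hypothetical; the 1 minute and 2 minute values are the timeouts quoted above.

```python
IDLE_DEAD_TIMEOUT = 60    # seconds; idle peer timeout quoted in point 2b
HARD_DEAD_TIMEOUT = 120   # seconds; hard dead timeout bounding a callback break


def completion_times(num_clients, service_time):
    """Completion time of each queued RPC when they are served one at a time.

    Assumption: a fixed service_time per RPC and strict serialization.
    """
    return [(i + 1) * service_time for i in range(num_clients)]


def tripped_clients(num_clients, service_time, idle_timeout=IDLE_DEAD_TIMEOUT):
    """Indices of clients whose RPC is still pending when the idle timeout expires."""
    times = completion_times(num_clients, service_time)
    return [i for i, t in enumerate(times) if t > idle_timeout]


# Point 2b: a callback break may legitimately take up to the hard dead
# timeout, which is longer than the idle timeout of a waiting client.
callback_break_can_trip = HARD_DEAD_TIMEOUT > IDLE_DEAD_TIMEOUT

# Point 2c: ten writers, ten seconds each. The last four are still waiting
# after 60 seconds and would mark the server down.
late = tripped_clients(10, 10.0)
```

In this model nothing is wrong with the server; the clients at the back of the queue trip simply because the queue is long, which matches the behavior in the bug report.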
And our conclusions:
1. Protecting against a failed disk, partition, etc. must be done
on the file server. Only the file server knows whether keep
alives are being sent while a pending I/O has failed to return.
2. Only the client knows whether there are replicas that it can
fail over to. A client can implement idle peer detection
but only for RPCs that are issued against replicated volumes
for which there is an available replica.
3. For all RPCs against volumes without replicas, idle peer detection
must be disabled.
4. The AFS protocol was not designed for use with HSMs. An RPC that
must be held in a keep alive state while the HSM retrieves the
necessary data blocks a limited file server resource (the worker
thread). Instead the file server should return a VBUSY indicating
that the object is being retrieved.
For new RPCs a better model can be implemented where the file
server issues a callback when the object is available. This
avoids having the clients poll the server. This is a problem
for the AFS3 stds group to address. In the meantime, it is
recommended that sites that deploy AFS backed by HSM not
store replicated volumes in the HSM.
5. Protecting against a malicious server is hard. There is no idle
peer timeout value that can be set that won't cause some legitimate
workload to fail. As a result, at this time we cannot implement
such protection.
6. Idle peer detection in the client must never result in the file
server being marked "down". That is the impetus behind
http://gerrit.openafs.org/#change,6128
which permits a new locally generated error RX_CALL_IDLE to be
reported to the cache manager when the idle peer detection has
triggered.
7. Idle peer timeouts increase the load on the file servers due to
an increased likelihood that the client will re-use a call channel
that the file server considers in use.
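
Conclusions 2, 3 and 6 amount to a small decision rule in the client, sketched below. This is an illustration rather than the cache manager code: the enum carries no real Rx error values, and the function names and the returned action strings are hypothetical. Only the names RX_CALL_DEAD and RX_CALL_IDLE come from the discussion above.

```python
from enum import Enum


class RxError(Enum):
    # Symbolic stand-ins only; these are not the numeric Rx error codes.
    RX_CALL_DEAD = "peer is not responding"
    RX_CALL_IDLE = "peer sends keep alives but makes no progress"


def idle_detection_allowed(volume_is_replicated, available_replicas):
    """Conclusions 2 and 3: arm idle peer detection only when a replica is usable."""
    return volume_is_replicated and available_replicas > 0


def handle_error(error, available_replicas):
    """Pick a (hypothetical) cache manager action for a locally generated error."""
    if error is RxError.RX_CALL_DEAD:
        # The peer really is unreachable: mark it down until the next probe.
        return "mark_server_down"
    if error is RxError.RX_CALL_IDLE:
        # Conclusion 6: an idle peer must never be marked down.
        if available_replicas > 0:
            return "fail_over_to_replica"
        return "fail_call_keep_server_up"
    raise ValueError(f"unexpected error: {error!r}")
```

If idle detection is only armed when `idle_detection_allowed` is true, the last branch should not be reached in practice; it is kept so that the sketch never falls through to marking the server down.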
For 1.6.1 we propose that:
1. Since client side idle peer detection is inherently broken,
it be disabled entirely on Unix clients.
2. Since Windows clients must support idle peer detection to address
the SMB timeout issue, idle peer detection will be activated only
for SMB initiated requests. A registry option will be provided
to permit a cell to be configured in no-HSM mode. For such a
cell, idle peer timeouts will be active only when a replica is
known to be available. This is possible for Windows
because of the existence of the registry based CellServDB.
Jeffrey Altman