[OpenAFS-devel] Re: idle dead timeout processing in clients

Simon Wilkinson sxw@inf.ed.ac.uk
Wed, 30 Nov 2011 21:13:31 +0000


On 30 Nov 2011, at 18:58, Andrew Deason wrote:

> On Wed, 30 Nov 2011 18:48:47 +0000
> Simon Wilkinson <sxw@inf.ed.ac.uk> wrote:
>
>> The idle dead code isn't in any shipping versions of 1.4. Current 1.4
>> clients won't get RX_CALL_TIMEOUT, or RX_CALL_DEAD.
>
> I'm not sure if we're talking about completely different things or what.
> The afs_BlackListOnce code exists in (shipping) 1.4 and, I mean, it
> certainly gets _called_. If I insert a sleep(10000) into the FetchStatus
> handler, the client will give an error (or failover to another site,
> etc); it won't just hang forever on the request.

Okay, so this is all a bit convoluted (isn't everything with RX!). There
are two ways in which an idle dead timeout can be caused...

The relevant code from rxi_CheckCall() is:

    /* see if we have a non-activity timeout */
    if (call->startWait && idleDeadTime
        && ((call->startWait + idleDeadTime) < now) &&
        (call->flags & RX_CALL_READER_WAIT)) {
        if (call->state == RX_STATE_ACTIVE) {
            cerror = RX_CALL_TIMEOUT;
            goto mtuout;
        }
    }
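    /* see if it's been too long since we last sent any data */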
    if (call->lastSendData && idleDeadTime && (conn->idleDeadErr != 0)
        && ((call->lastSendData + idleDeadTime) < now)) {
        if (call->state == RX_STATE_ACTIVE) {
            cerror = conn->idleDeadErr;
            goto mtuout;
        }
    }

The first block is in 1.4.x, and is enabled there - it returns
CALL_TIMEOUT, which is handled by BlackListOnce. The second block is
only enabled on 1.6 and master, and is configured to return CALL_DEAD.

The first block only fires on clients which have turned the call
around, and are now attempting to read from the fileserver. This is
actually really fragile - what RX_CALL_READER_WAIT actually means is
that the application thread has managed to push all of its packets
into the RX layer, and is now blocked on rx_Read(). In the current
implementation this just means that the number of transmitted packets
left unacknowledged is less than 2x the current window size. For
pretty much every AFS-3 RPC other than StoreData it's meaningless -
we'll enter READER_WAIT immediately. What it does mean is that this
block is very unlikely to fire for StoreData, as for most chunk sizes
we'll be writing more packets than can be held in the buffer.
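
To make the reader-side state concrete, here's a minimal
self-contained sketch of the pattern (pthreads stand in for RX's own
locking; the field and flag names mirror rx_call, but this is a toy,
not the RX source):

    #include <pthread.h>
    #include <time.h>

    #define RX_CALL_READER_WAIT 0x1

    struct toy_call {
        int             flags;
        time_t          startWait;  /* when the reader began waiting */
        pthread_mutex_t lock;
        pthread_cond_t  cv_rq;      /* signalled when packets arrive */
    };

    /* Called from the rx_Read() path when no data is available: this
     * is exactly the state rxi_CheckCall() tests for above -
     * READER_WAIT set, and startWait more than idleDeadTime ago. */
    static void
    reader_wait(struct toy_call *call)
    {
        pthread_mutex_lock(&call->lock);
        call->flags |= RX_CALL_READER_WAIT;
        call->startWait = time(NULL);
        while (call->flags & RX_CALL_READER_WAIT)  /* cleared on arrival */
            pthread_cond_wait(&call->cv_rq, &call->lock);
        call->startWait = 0;
        pthread_mutex_unlock(&call->lock);
    }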

So, with StoreData we'll hit the second block. If the other end isn't
reading packets out of RX (because it's blocked on I/O, for example),
we won't be able to send any packets, and we'll trigger the timeout.
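
In other words, lastSendData only advances when flow control lets us
put a packet on the wire. A toy model of that (the peer_window
accounting is a hypothetical stand-in for RX's real flow control):

    #include <time.h>

    struct toy_send_call {
        time_t lastSendData;  /* stamped on every data packet we send */
        int    peer_window;   /* packets the peer will currently accept */
    };

    static void
    try_send(struct toy_send_call *call)
    {
        if (call->peer_window <= 0)
            return;           /* peer isn't reading, so no acks open the
                               * window; lastSendData goes stale, and the
                               * second block above eventually fires */
        call->peer_window--;  /* acks from the peer would increment this */
        call->lastSendData = time(NULL);
    }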

It's this behaviour, coupled with the lack of error handling for
CALL_DEAD, and the fact that we don't try to flush a full cache except
when we're writing to it, that was the root cause of the original bug
report.

However, all of this has exposed some real problems with the idle dead
code as it currently stands. I believe that some of them are the root
cause of some long-standing bug reports.

1) If you have an RPC with a small number of arguments (say
CreateFile), the client will end up in READER_WAIT as soon as it has
transmitted the first packet. If that CreateFile requires a callback
break which takes longer than the idle dead timeout, then the client
will time out the call with CALL_TIMEOUT. In the meantime, the server
will complete the callback break, and create the file. afs_Analyze
will receive CALL_TIMEOUT and retry the operation; the server will see
that the file already exists, and return EEXIST. So, we have an
operation that has actually succeeded returning an error.
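
The sequence is easier to see as a toy model (purely illustrative -
not the cache manager source; the loop stands in for afs_Analyze()'s
retry decision, and the error values are made up for the example):

    #include <stdio.h>

    #define RX_CALL_TIMEOUT (-3)  /* illustrative error value */
    #define ERR_EEXIST      17

    static int file_exists = 0;

    /* Server side: the create really happens on the first call, even
     * though the client has already timed the call out. */
    static int
    server_CreateFile(void)
    {
        if (file_exists)
            return ERR_EEXIST;
        file_exists = 1;
        return 0;
    }

    int
    main(void)
    {
        int attempt = 0, code;
        do {
            code = server_CreateFile();
            if (attempt++ == 0)
                code = RX_CALL_TIMEOUT;  /* idle dead fires; the server's
                                          * success is never seen */
        } while (code == RX_CALL_TIMEOUT);  /* afs_Analyze()-style retry */
        printf("final code = %d (EEXIST, despite success)\n", code);
        return 0;
    }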

2) In cases where a fileserver is taking a long time to break
callbacks, the client can end up giving up due to idle dead timeouts,
even if the server would later be able to handle its request. In 1.4
we'll retry, and (possibly) succeed; in 1.6 we'll tend to hit the
second case first, and so fail. However, just retrying has
penalties...

3) Idle dead is a big cause of call busy problems. It breaks the
client's and the server's views of which call slots are empty. Take
the example of a client that has slots 2, 3 and 4 busy with
long-running store operations. Slot 1 hits an idle dead timeout, and
the client must retry. So, it starts a new call to the server, but the
only slot that's available is slot 1. It starts a call on that slot,
but that's then bounced back with CALL_BUSY by the server.
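
The slot clash in miniature (again a toy, not the RX source: RX
multiplexes RX_MAXCALLS call channels per connection, and the client
reuses the first channel it believes is free):

    #include <stdio.h>

    #define RX_MAXCALLS 4

    int
    main(void)
    {
        /* Client's view: slot 1 just hit idle dead, so the client has
         * freed it; slots 2-4 carry long-running stores. */
        int client_busy[RX_MAXCALLS] = { 0, 1, 1, 1 };
        /* Server's view: it never saw the timeout, so slot 1 still
         * looks active to it. */
        int server_busy[RX_MAXCALLS] = { 1, 1, 1, 1 };
        int ch;

        for (ch = 0; ch < RX_MAXCALLS; ch++)
            if (!client_busy[ch])
                break;  /* client picks slot 1 (channel 0) for the retry */

        if (server_busy[ch])
            printf("server bounces slot %d with CALL_BUSY\n", ch + 1);
        return 0;
    }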

Cheers,

Simon.