[OpenAFS-devel] Re: idle dead timeout processing in clients

Jeffrey Altman jaltman@your-file-system.com
Wed, 30 Nov 2011 16:06:12 -0500


On 11/30/2011 2:58 PM, Jeffrey Altman wrote:
> Andrew:
> 
> The block of code in rx.c that is in question is:
> 
>     if (call->lastSendData && idleDeadTime && (conn->idleDeadErr != 0)
>         && ((call->lastSendData + idleDeadTime) < now)) {
>         if (call->state == RX_STATE_ACTIVE) {
>             cerror = conn->idleDeadErr;
>             goto mtuout;
>         }
>     }
> 
> when conn->idleDeadErr is set to RX_CALL_DEAD in afs/afs_conn.c:
> 
>   rx_SetServerConnIdleDeadErr(tc->id, RX_CALL_DEAD);
> 
> Please take a look at 6128.
> Jeffrey Altman

This was added by d26f5e158cffa313d0f504e7ba3afc1743b5d1ef as part of
the MTU size probes that were developed at UIUC and that made things
much worse because no handling of RX_CALL_DEAD was added to the Unix CM
in afs_Analyze().  However, that is not the core problem with idle dead
time processing.

The concept is flawed.  When a file server is unable to process a call,
either because all of its threads are in use or because the RPC in
question is blocked while another RPC completes, keep-alives are sent to
the client.  If the client times out the call and retries due to
RX_CALL_TIMEOUT, all it is doing is placing itself at the back of the
queue on the file server and potentially taking up another file server
thread.  In other words, it makes a sad file server even more unhappy.

The implementation is broken in a number of ways:

1. It applies equally to operations against both replicated and
non-replicated objects.  Only operations against replicated objects
should trigger RX_CALL_TIMEOUT at all.

2. The idle timeout is shorter than the hard dead timeout, which is the
timeout the file server uses when breaking callbacks.  Therefore idle
timeout processing triggers while the file server is enforcing cache
coherency.  This will occur on a r/w volume, in which case the retry is
going to hit the same file server that we just timed out.  There is no
gain here, just additional overhead for the file server and delays
imposed on the client.

3. Idle dead time violates cache coherency.  I didn't include this in my
original post, but it does.

Three clients A, B and C have callback registrations for vnode
9999.24232.78382.

Client A issues an RPC "CreateFile Foo" on vnode 9999.24232.78382 to the
file server.  While processing the request, the file server breaks
callbacks to Client B, which responds immediately, and Client C, which
does not respond.

Client B issues an RPC "CreateFile Bar" on vnode 9999.24232.78382 to the
file server.  This RPC blocks while waiting for the vnode lock.

The file server waits the hard dead timeout (2 minutes) for Client C to
respond.  In the meantime, Client B waits the idle dead timeout (1
minute), determines there is no alternative site, and retries the RPC
which then blocks on the vnode lock at the file server.

The file server times out the callback break and begins to process the
first "CreateFile Bar" request.  It completes successfully but the
response with the current status info cannot be returned to the client
because the call is dead.

The file server then processes the second "CreateFile Bar" request,
which fails with EEXIST.

Client B is now left with an error it should not have gotten and no
callback.  The error is returned to the application which may or may not
be well behaved when presented with an unexpected error.

It is true that the cache manager can work around this by performing an
additional FetchStatus RPC, noticing the data version has changed out
from underneath it, refetching the directory contents, noticing the file
is in fact now there, and returning success to the application.
However, that is a lot of work to do in the name of noticing when a file
server has gotten stuck.

As you indicated, this block:

    /* see if we have a non-activity timeout */
    if (call->startWait && idleDeadTime
        && ((call->startWait + idleDeadTime) < now) &&
        (call->flags & RX_CALL_READER_WAIT)) {
        if (call->state == RX_STATE_ACTIVE) {
            cerror = RX_CALL_TIMEOUT;
            goto mtuout;
        }
    }

triggers for clients when they are waiting to read the result of an RPC.
As Simon will explain in more detail in a separate e-mail, this block
will fire once all data associated with the RPC has been transmitted and
the call has been turned around.  The RPC could be blocked on a vnode
lock or simply waiting for a thread to be scheduled at the file server.
It could be a FetchStatus or a small StoreData; perhaps a StoreData that
represents a truncation.

Client A issues a StoreData with a file size in it.  The call blocks
waiting for a thread and the client times out the call.

Client B issues a StoreData with data.

Client A retries the StoreData with the same old file size.

The file server truncates the file, stores B's data and then erases B's
data with the repeated truncation.

I can come up with additional scenarios.

The idle dead time processing is fundamentally flawed and needs to be
backed out.

Jeffrey Altman

