[OpenAFS] Re: 1.6 clients: rx version pings

Stephan Wiesand stephan.wiesand@desy.de
Sat, 3 Dec 2011 21:40:29 +0100


On Dec 2, 2011, at 23:23 , Derrick Brashear wrote:

> It's going to be in afs_conn.c, probably in afs_Conn, the rx NatPing enabling.

Thanks. I ended up applying this:

--- openafs-1.6.0/src/afs/afs_conn.c.orig       2011-08-16 04:32:24.000000000 +0200
+++ openafs-1.6.0/src/afs/afs_conn.c    2011-12-03 13:40:49.960300876 +0100
@@ -306,10 +306,11 @@
         * Only do this for the base connection, not per-user.
         * Will need to be revisited if/when CB gets security.
         */
+       /* sw
        if ((isec == 0) && (service != 52) && !(tu->states & UTokensBad) &&
            (tu->vid == UNDEFVID))
            rx_SetConnSecondsUntilNatPing(tc->id, 20);
-
+       */
        tc->forceConnectFS = 0; /* apparently we're appropriately connected now */
        if (csec)
            rxs_Release(csec);

It seems to do the job and does no harm (we don't do NAT). After
deploying the patched client on most 1.6 systems, things are a lot
quieter now. Before, the 10% of clients that we're now running on 1.6
kept a typical fileserver ~3% busy.
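
Rather than commenting the call out, the same condition could be gated by a tunable, which is roughly the "way to turn them off" mentioned further down in the thread. A minimal sketch, assuming a hypothetical variable nat_ping_seconds (nothing like it exists in 1.6.0) and stand-in values for UTokensBad and UNDEFVID so that the snippet compiles outside the kernel module:

```c
/* Stand-in values for this sketch only; the real definitions live in the
 * OpenAFS headers and are not reproduced here. */
#define UTokensBad 0x1
#define UNDEFVID   (-1)

/* Hypothetical tunable, not present in 1.6.0: seconds between NAT pings,
 * where 0 (or a negative value) means "never ping". */
static int nat_ping_seconds = 20;

/* Return the NAT ping interval a new connection should get, or 0 for none.
 * The condition mirrors the one in the patch above: only the base
 * (unauthenticated, non-per-user) connection qualifies, and service 52
 * is excluded. */
static int nat_ping_interval(int isec, int service, int user_states, int user_vid)
{
    if (nat_ping_seconds <= 0)
        return 0;               /* pings disabled by the tunable */
    if (isec == 0 && service != 52 &&
        !(user_states & UTokensBad) && user_vid == UNDEFVID)
        return nat_ping_seconds;
    return 0;
}
```

In afs_Conn the result would be passed to rx_SetConnSecondsUntilNatPing(tc->id, ...) only when it is non-zero. With nat_ping_seconds set to 0 this behaves like the patch above, while sites behind NAT could keep the 20 second default.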


- Stephan

>
> Derrick
>
> On Dec 2, 2011, at 4:53 PM, Stephan Wiesand <stephan.wiesand@desy.de> wrote:
>
>>
>> On Dec 2, 2011, at 18:23 , Andrew Deason wrote:
>>
>>> On Fri, 2 Dec 2011 17:52:14 +0100
>>> Stephan Wiesand <stephan.wiesand@desy.de> wrote:
>>>
>>>> we had seen this during EAKC already: 1.6 clients are supposed to ping
>>>> file servers once a second, yet they do so at much higher rates. As
>>>> the number of 1.6 clients is increasing here, this has become a real
>>>> problem.
>>>
>>> If you're talking about the rx nat keepalive ping (they appear as "rx
>>> version reply" packets on the wire), it's only supposed to be once every
>>> 20 seconds. I believe there were issues before where that would be done
>>> for _every_ connection to the fileserver, but I thought it was fixed...
>>> somewhere (possibly post-1.6.0?). I assume Derrick can answer that
>>> faster than I can find it.
>>
>> Yes, I believe that's what I'm talking about, and I recall it's even
>> supposed to be 1/20 Hz, and sorry for not being precise. This is what I
>> see on a former fileserver:
>>
>> 18:37:32.916181 IP client.afs3-callback > server.afs3-fileserver:  rx version (29)
>> 18:37:32.916214 IP client.afs3-callback > server.afs3-fileserver:  rx version (29)
>> 18:37:32.916242 IP client.afs3-callback > server.afs3-fileserver:  rx version (29)
>> 18:37:32.916289 IP client.afs3-callback > server.afs3-fileserver:  rx version (29)
>> 18:37:32.916325 IP client.afs3-callback > server.afs3-fileserver:  rx version (29)
>>
>> It rather seems like "every connection the client ever had"...
>>
>> rxdebug on the client lists no connection to the server.
>>
>> The total rate of those incoming packets is several kHz - from 21
>> (former) clients.
>>
>>>> Is there any way to prevent the client from doing this? Any way to at
>>>> least make it forget an old fileserver? Or at least reset the rate to
>>>> the 1 Hz it should be? Can this be disabled altogether? Suppose I
>>>> find the place in the code where these pings happen and just remove
>>>> them, what would be the consequences?
>>>
>>> If they are the nat keepalive pings, they're just for keeping port
>>> mappings open for nats and stateful firewalls and such. There should be
>>> a way to turn them off, but I don't believe there is right now.
>>
>> Thanks a lot. I'll try to find them in the source and get rid of them.
>> Still hoping for a hint making this search more efficient, though.
>>
>> Thanks again,
>>   Stephan
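
As a rough sanity check on the rates quoted above: at one ping per 20 seconds per connection, a packet rate in the kHz range from 21 clients implies a very large number of pinging connections per client. A minimal sketch of that arithmetic; both helper names are made up for illustration, and any concrete packet rate plugged in is a placeholder for "several kHz", not a measured value:

```c
/* Keepalive packets per second expected from `connections` connections
 * that each send one ping every `interval_seconds` seconds. */
static double expected_ping_rate(int connections, int interval_seconds)
{
    return (double)connections / interval_seconds;
}

/* Number of pinging connections per client needed to explain an observed
 * aggregate packet rate, assuming every connection pings once per interval. */
static double implied_connections_per_client(double observed_pps, int clients,
                                             int interval_seconds)
{
    return observed_pps / clients * interval_seconds;
}
```

With a placeholder rate of 2100 packets per second, implied_connections_per_client(2100.0, 21, 20) comes out at 2000 connections per client, whereas a single connection behaving as intended would contribute only expected_ping_rate(1, 20) = 0.05 packets per second. That is consistent with the "every connection the client ever had" impression.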

-- 
Stephan Wiesand
DESY -DV-
Platanenallee 6
15738 Zeuthen, Germany