[OpenAFS-devel] .35 sec rx delay bug?

chas williams - CONTRACTOR chas@cmf.nrl.navy.mil
Tue, 14 Nov 2006 22:57:01 -0500


In message <Pine.GSO.4.61-042.0611081225140.26418@johnstown.andrew.cmu.edu>,Derrick J Brashear writes:
>On Wed, 8 Nov 2006, Jim Rees wrote:
>> Yes, but that's what the code Chas quoted apparently tries to do.  Why isn't
>> it working?
>
>i have no idea.

the fix for this will need at least two changes.  in src/rx/rx.c, handling
of ack packets should check peer->natMTU (which i would gather to be
the interface mtu of the remote side) and peer->ifMTU (the mtu of the
interface on the local side).  if the remote mtu is less than the local
mtu, we should only send jumbograms up to the limit of the remote mtu.
note that this patch removes a spurious MIN(,tSize) and probably abuses
the original intent of rxi_AdjustDgramPackets().

Index: rx.c
===================================================================
--- rx.c	(revision 62)
+++ rx.c	(working copy)
@@ -3770,9 +3770,9 @@
 			  sizeof(afs_int32), &tSize);
 	    maxDgramPackets = (afs_uint32) ntohl(tSize);
 	    maxDgramPackets = MIN(maxDgramPackets, rxi_nDgramPackets);
-	    maxDgramPackets =
-		MIN(maxDgramPackets, (int)(peer->ifDgramPackets));
-	    maxDgramPackets = MIN(maxDgramPackets, tSize);
+	    maxDgramPackets = MIN(maxDgramPackets, peer->ifDgramPackets);
+	    if (peer->natMTU < peer->ifMTU)
+	        maxDgramPackets = MIN(maxDgramPackets, rxi_AdjustDgramPackets(1, peer->natMTU));
 	    if (maxDgramPackets > 1) {
 		peer->maxDgramPackets = maxDgramPackets;
 		call->MTU = RX_JUMBOBUFFERSIZE + RX_HEADER_SIZE;

however, this doesn't completely fix the problem.  peer->ifMTU is assumed
to be the mtu of the interface on the same subnet, or RX_REMOTE_PACKET_SIZE
(packets sized for an mtu of 1500) otherwise.  if your host has a single
9000 mtu interface and is sending to a non-local subnet, the guess for
ifMTU is wrong.  for a host with a single interface, the fix is simple.
for hosts with multiple interfaces with different mtus, i am not sure what
the right answer might be.  i suppose if you can't guess the right
interface you should assume the worst case: that the datagram will exit
on the interface with the largest mtu.  but suppose you had two non-local
multi-homed hosts, each with a 9000 and a 1500 mtu interface.  both would
appear to have 9000 mtus, but you could still get the original problem if
your packets exited the 9000 mtu interface but arrived on the 1500 mtu
interface on the remote side.  perhaps one should never send jumbograms
between non-local hosts?
anyway, here is a sample fix for the userspace side of rx.

Index: rx_user.c
===================================================================
--- rx_user.c	(revision 62)
+++ rx_user.c	(working copy)
@@ -613,11 +613,9 @@
 rxi_InitPeerParams(struct rx_peer *pp)
 {
     afs_uint32 ppaddr;
-    u_short rxmtu;
+    u_short rxmtu = 0, maxmtu = 0;
     int ix;
 
-
-
     LOCK_IF_INIT;
     if (!Inited) {
 	UNLOCK_IF_INIT;
@@ -644,6 +642,8 @@
 
     LOCK_IF;
     for (ix = 0; ix < rxi_numNetAddrs; ++ix) {
+        if (maxmtu < myNetMTUs[ix])
+		maxmtu = myNetMTUs[ix] - RX_IPUDP_SIZE;
 	if ((rxi_NetAddrs[ix] & myNetMasks[ix]) == (ppaddr & myNetMasks[ix])) {
 #ifdef IFF_POINTOPOINT
 	    if (myNetFlags[ix] & IFF_POINTOPOINT)
@@ -652,14 +652,14 @@
 	    rxmtu = myNetMTUs[ix] - RX_IPUDP_SIZE;
 	    if (rxmtu < RX_MIN_PACKET_SIZE)
 		rxmtu = RX_MIN_PACKET_SIZE;
-	    if (pp->ifMTU < rxmtu)
-		pp->ifMTU = MIN(rx_MyMaxSendSize, rxmtu);
 	}
     }
     UNLOCK_IF;
+    if (rxmtu)
+	pp->ifMTU = MIN(rx_MyMaxSendSize, rxmtu);
     if (!pp->ifMTU) {		/* not local */
 	pp->timeout.sec = 3;
-	pp->ifMTU = MIN(rx_MyMaxSendSize, RX_REMOTE_PACKET_SIZE);
+	pp->ifMTU = MIN(rx_MyMaxSendSize, maxmtu ? maxmtu : RX_REMOTE_PACKET_SIZE);
     }
 #else /* ADAPT_MTU */
     pp->rateFlag = 2;		/* start timing after two full packets */