[OpenAFS-devel] .35 sec rx delay bug?

chas williams - CONTRACTOR chas@cmf.nrl.navy.mil
Thu, 08 Nov 2007 11:13:23 -0500


In message <473281D5.1080100@secure-endpoints.com>, Jeffrey Altman writes:
>Was this code ever tested?

no idea.  jim had the problem, not me.  i run this code locally and i
haven't had any complaints.  i was able to duplicate the issue since
i have multi-homed servers with two different mtu sizes.

>Does it solve the problem it was intended to solve?

of course.  the solution is not ideal.  as i mentioned, you would need
path mtu discovery to implement a completely correct solution.  i would
personally vote that you should not send datagrams > 1500 off your local
subnet.  while jumbograms increase performance, it's not clear to me
that the benefit outweighs the cost when going outside the local lan.
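
(to illustrate the sort of check i have in mind -- just a sketch, not
actual rx code; the struct and table names below are made up, and it
assumes the usual rx typedefs like afs_uint32 and u_short: only trust
a big interface mtu when the peer shares a subnet with us, otherwise
fall back to a 1500-byte guess.)

    /* sketch only, not rx code: ifguess, local_if[] and n_local_ifs
     * are made-up names for illustration. */
    struct ifguess {
        afs_uint32 addr, mask;          /* interface address and netmask */
        u_short mtu;                    /* interface mtu in bytes */
    };
    static struct ifguess local_if[8];  /* hypothetically filled at init */
    static int n_local_ifs;

    static u_short
    guess_peer_mtu(afs_uint32 peer_addr)
    {
        int ix;
        u_short mtu = 1500;     /* conservative guess for off-subnet peers */

        for (ix = 0; ix < n_local_ifs; ix++) {
            if ((local_if[ix].addr & local_if[ix].mask) ==
                (peer_addr & local_if[ix].mask)) {
                mtu = local_if[ix].mtu; /* same subnet: use the real mtu */
                break;
            }
        }
        return mtu;
    }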

>Or was something else committed in its place?

i believe this was committed as-is as rx-mtu-fixes-20061218.  however,
i don't closely track what the openafs maintainers do.

>Thanks.
>
>Jeffrey Altman
>
>
>chas williams - CONTRACTOR wrote:
>> In message <Pine.GSO.4.61-042.0611081225140.26418@johnstown.andrew.cmu.edu>, Derrick J Brashear writes:
>>> On Wed, 8 Nov 2006, Jim Rees wrote:
>>>> Yes, but that's what the code Chas quoted apparently tries to do.  Why isn't
>>>> it working?
>>> i have no idea.
>> 
>> the fix for this will need at least two changes.  in src/rx/rx.c, handling
>> of ack packets should check the peer->natMTU (what i would gather to be
>> the interface mtu of the remote side) and peer->ifMTU (the mtu of the
>> interface on the local side).  so if the remote mtu is less than the local
>> mtu, we should only send jumbograms up to the limit of the remote mtu.
>> note that this patch removes a spurious MIN(,tSize) and probably abuses
>> the original intent of rxi_AdjustDgramPackets().
>> 
>> Index: rx.c
>> ===================================================================
>> --- rx.c	(revision 62)
>> +++ rx.c	(working copy)
>> @@ -3770,9 +3770,9 @@
>>  			  sizeof(afs_int32), &tSize);
>>  	    maxDgramPackets = (afs_uint32) ntohl(tSize);
>>  	    maxDgramPackets = MIN(maxDgramPackets, rxi_nDgramPackets);
>> -	    maxDgramPackets =
>> -		MIN(maxDgramPackets, (int)(peer->ifDgramPackets));
>> -	    maxDgramPackets = MIN(maxDgramPackets, tSize);
>> +	    maxDgramPackets = MIN(maxDgramPackets, peer->ifDgramPackets);
>> +	    if (peer->natMTU < peer->ifMTU)
>> +	        maxDgramPackets = MIN(maxDgramPackets, rxi_AdjustDgramPackets(1, peer->natMTU));
>>  	    if (maxDgramPackets > 1) {
>>  		peer->maxDgramPackets = maxDgramPackets;
>>  		call->MTU = RX_JUMBOBUFFERSIZE + RX_HEADER_SIZE;
>> 
>> however, this doesn't completely fix the problem.  peer->ifMTU is assumed
>> to be the mtu of the interface on the same subnet or RX_REMOTE_PACKET_SIZE
>> (packets sized for mtu = 1500).  if your host has a single 9000 mtu interface
>> and is sending to a non-local subnet, the guess for the ifMTU is wrong.
>> for a host with a single interface, this fix is simple.  for hosts with
>> multiple interfaces with multiple mtus, i am not sure what the right
>> answer might be.  i suppose if you can't guess the right interface you
>> should assume the worst case that the datagram will exit on the interface
>> with the largest mtu.  but suppose you had two non-local multi-homed hosts
>> each with a 9000 and 1500 mtu interface.  both would appear to have 9000
>> mtus but you could still get the original problem if your packets exited
>> the 9000 mtu interface but arrived on the 1500 mtu interface on the remote
>> side.  perhaps one should never send jumbograms between non-local hosts?
>> anyway, here is a sample fix for the userspace side of rx.
>> 
>> Index: rx_user.c
>> ===================================================================
>> --- rx_user.c	(revision 62)
>> +++ rx_user.c	(working copy)
>> @@ -613,11 +613,9 @@
>>  rxi_InitPeerParams(struct rx_peer *pp)
>>  {
>>      afs_uint32 ppaddr;
>> -    u_short rxmtu;
>> +    u_short rxmtu = 0, maxmtu = 0;
>>      int ix;
>>  
>> -
>> -
>>      LOCK_IF_INIT;
>>      if (!Inited) {
>>  	UNLOCK_IF_INIT;
>> @@ -644,6 +642,8 @@
>>  
>>      LOCK_IF;
>>      for (ix = 0; ix < rxi_numNetAddrs; ++ix) {
>> +        if (maxmtu < myNetMTUs[ix])
>> +		maxmtu = myNetMTUs[ix] - RX_IPUDP_SIZE;
>>  	if ((rxi_NetAddrs[ix] & myNetMasks[ix]) == (ppaddr & myNetMasks[ix])) {
>>  #ifdef IFF_POINTOPOINT
>>  	    if (myNetFlags[ix] & IFF_POINTOPOINT)
>> @@ -652,14 +652,14 @@
>>  	    rxmtu = myNetMTUs[ix] - RX_IPUDP_SIZE;
>>  	    if (rxmtu < RX_MIN_PACKET_SIZE)
>>  		rxmtu = RX_MIN_PACKET_SIZE;
>> -	    if (pp->ifMTU < rxmtu)
>> -		pp->ifMTU = MIN(rx_MyMaxSendSize, rxmtu);
>>  	}
>>      }
>>      UNLOCK_IF;
>> +    if (rxmtu)
>> +	pp->ifMTU = MIN(rx_MyMaxSendSize, rxmtu);
>>      if (!pp->ifMTU) {		/* not local */
>>  	pp->timeout.sec = 3;
>> -	pp->ifMTU = MIN(rx_MyMaxSendSize, RX_REMOTE_PACKET_SIZE);
>> +	pp->ifMTU = MIN(rx_MyMaxSendSize, maxmtu ? maxmtu : RX_REMOTE_PACKET_SIZE);
>>      }
>>  #else /* ADAPT_MTU */
>>      pp->rateFlag = 2;		/* start timing after two full packets */
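
(to make the effect of the rx_user.c change concrete -- my own numbers,
just for illustration: a host with a 1500 mtu eth0 and a 9000 mtu eth1
talking to a peer on neither local subnet used to fall into the
!pp->ifMTU branch and get

    pp->ifMTU = MIN(rx_MyMaxSendSize, RX_REMOTE_PACKET_SIZE);

while with the patch maxmtu ends up tracking the largest local
interface, so the same peer now gets

    pp->ifMTU = MIN(rx_MyMaxSendSize, 9000 - RX_IPUDP_SIZE);

i.e. the worst-case largest-mtu guess mentioned above.)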