[OpenAFS-devel] .35 sec rx delay bug?
chas williams - CONTRACTOR
chas@cmf.nrl.navy.mil
Thu, 08 Nov 2007 11:13:23 -0500
In message <473281D5.1080100@secure-endpoints.com>, Jeffrey Altman writes:
>Was this code ever tested?
no idea. jim had the problem, not me. i run this code locally and i
haven't had any complaints. i was able to duplicate the issue since
i have multi-homed servers with two different mtu sizes.
>Does it solve the problem it was intended to solve?
of course. the solution is not ideal. as i mentioned, you would need
path mtu discovery to implement a completely correct solution. i would
personally vote that you should not send datagrams larger than 1500 bytes
off your local subnet. while jumbograms increase performance, it's not
clear to me that the benefit outweighs the cost when going outside the
local lan.
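something like this would express that policy at the end of
rxi_InitPeerParams() (just a sketch, untested; rxi_IsLocalSubnet() is a
made-up name for the subnet match the function already does inline):

    /* sketch only: never advertise a jumbo-capable mtu toward a peer that
     * is not on one of our local subnets.  rxi_IsLocalSubnet() is
     * hypothetical shorthand for the existing subnet match loop; ppaddr is
     * the peer address already computed in rxi_InitPeerParams(). */
    if (!rxi_IsLocalSubnet(ppaddr))
        pp->ifMTU = MIN(pp->ifMTU, RX_REMOTE_PACKET_SIZE); /* sized for mtu 1500 */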
>Or was something else committed in its place?
i believe this was committed as-is as rx-mtu-fixes-20061218. however,
i don't closely track what the openafs maintainers do.
>Thanks.
>
>Jeffrey Altman
>
>
>chas williams - CONTRACTOR wrote:
>> In message <Pine.GSO.4.61-042.0611081225140.26418@johnstown.andrew.cmu.edu>,
>> Derrick J Brashear writes:
>>> On Wed, 8 Nov 2006, Jim Rees wrote:
>>>> Yes, but that's what the code Chas quoted apparently tries to do. Why isn't
>>>> it working?
>>> i have no idea.
>>
>> the fix for this will need at least two changes. in src/rx/rx.c, the
>> handling of ack packets should check peer->natMTU (what i would gather to
>> be the interface mtu on the remote side) and peer->ifMTU (the mtu of the
>> interface on the local side). so if the remote mtu is less than the local
>> mtu, we should only send jumbograms up to the limit of the remote mtu.
>> note that this patch removes a spurious MIN(..., tSize) and probably
>> abuses the original intent of rxi_AdjustDgramPackets(); a condensed
>> sketch of the intent follows the patch.
>>
>> Index: rx.c
>> ===================================================================
>> --- rx.c (revision 62)
>> +++ rx.c (working copy)
>> @@ -3770,9 +3770,9 @@
>> sizeof(afs_int32), &tSize);
>> maxDgramPackets = (afs_uint32) ntohl(tSize);
>> maxDgramPackets = MIN(maxDgramPackets, rxi_nDgramPackets);
>> - maxDgramPackets =
>> - MIN(maxDgramPackets, (int)(peer->ifDgramPackets));
>> - maxDgramPackets = MIN(maxDgramPackets, tSize);
>> + maxDgramPackets = MIN(maxDgramPackets, peer->ifDgramPackets);
>> + if (peer->natMTU < peer->ifMTU)
>> + maxDgramPackets = MIN(maxDgramPackets, rxi_AdjustDgramPackets(1, peer->natMTU));
>> if (maxDgramPackets > 1) {
>> peer->maxDgramPackets = maxDgramPackets;
>> call->MTU = RX_JUMBOBUFFERSIZE + RX_HEADER_SIZE;
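>>
>> condensed, the intent of the hunk above is roughly this (sketch only;
>> pick_dgram_packets() is not a real function, just the clamp pulled out
>> for readability):
>>
>>     /* sketch: choose packets per datagram from whichever side has the
>>      * smaller interface mtu.  rxi_AdjustDgramPackets(1, mtu) is used
>>      * here, as in the patch, to turn an mtu into a packet count. */
>>     static int
>>     pick_dgram_packets(struct rx_peer *peer, int requested)
>>     {
>>         int n = MIN(requested, rxi_nDgramPackets);
>>         n = MIN(n, peer->ifDgramPackets);
>>         if (peer->natMTU < peer->ifMTU)   /* remote side is the bottleneck */
>>             n = MIN(n, rxi_AdjustDgramPackets(1, peer->natMTU));
>>         return n;
>>     }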
>>
>> however, this doesn't completely fix the problem. peer->ifMTU is assumed
>> to be the mtu of the interface on the same subnet, or RX_REMOTE_PACKET_SIZE
>> (packets sized for an mtu of 1500). if your host has a single 9000 mtu
>> interface and is sending to a non-local subnet, the guess for ifMTU is
>> wrong. for a host with a single interface, the fix is simple. for hosts
>> with multiple interfaces with different mtus, i am not sure what the right
>> answer might be. i suppose if you can't guess the right interface you
>> should assume the worst case, that the datagram will exit on the interface
>> with the largest mtu. but suppose you had two non-local multi-homed hosts,
>> each with a 9000 and a 1500 mtu interface. both would appear to have 9000
>> mtus, but you could still get the original problem if your packets exited
>> the 9000 mtu interface and arrived on the 1500 mtu interface on the remote
>> side. perhaps one should never send jumbograms between non-local hosts?
>> anyway, here is a sample fix for the userspace side of rx; a condensed
>> sketch of the resulting logic follows the patch.
>>
>> Index: rx_user.c
>> ===================================================================
>> --- rx_user.c (revision 62)
>> +++ rx_user.c (working copy)
>> @@ -613,11 +613,9 @@
>> rxi_InitPeerParams(struct rx_peer *pp)
>> {
>> afs_uint32 ppaddr;
>> - u_short rxmtu;
>> + u_short rxmtu = 0, maxmtu = 0;
>> int ix;
>>
>> -
>> -
>> LOCK_IF_INIT;
>> if (!Inited) {
>> UNLOCK_IF_INIT;
>> @@ -644,6 +642,8 @@
>>
>> LOCK_IF;
>> for (ix = 0; ix < rxi_numNetAddrs; ++ix) {
>> + if (maxmtu < myNetMTUs[ix] - RX_IPUDP_SIZE)
>> + maxmtu = myNetMTUs[ix] - RX_IPUDP_SIZE;
>> if ((rxi_NetAddrs[ix] & myNetMasks[ix]) == (ppaddr & myNetMasks[ix])) {
>> #ifdef IFF_POINTOPOINT
>> if (myNetFlags[ix] & IFF_POINTOPOINT)
>> @@ -652,14 +652,14 @@
>> rxmtu = myNetMTUs[ix] - RX_IPUDP_SIZE;
>> if (rxmtu < RX_MIN_PACKET_SIZE)
>> rxmtu = RX_MIN_PACKET_SIZE;
>> - if (pp->ifMTU < rxmtu)
>> - pp->ifMTU = MIN(rx_MyMaxSendSize, rxmtu);
>> }
>> }
>> UNLOCK_IF;
>> + if (rxmtu)
>> + pp->ifMTU = MIN(rx_MyMaxSendSize, rxmtu);
>> if (!pp->ifMTU) { /* not local */
>> pp->timeout.sec = 3;
>> - pp->ifMTU = MIN(rx_MyMaxSendSize, RX_REMOTE_PACKET_SIZE);
>> + pp->ifMTU = MIN(rx_MyMaxSendSize, maxmtu ? maxmtu : RX_REMOTE_PACKET_SIZE);
>> }
>> #else /* ADAPT_MTU */
>> pp->rateFlag = 2; /* start timing after two full packets */
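>>
>> the resulting ifMTU choice, condensed (sketch only; the real code is the
>> loop in the patch above):
>>
>>     /* sketch: effective send mtu toward a peer after the change above.
>>      *   - peer on a local subnet: that interface's mtu
>>      *   - non-local peer: worst case, the largest local mtu
>>      *   - no interfaces known: fall back to RX_REMOTE_PACKET_SIZE
>>      * everything is still capped by rx_MyMaxSendSize. */
>>     if (rxmtu)                      /* peer matched one of our subnets */
>>         pp->ifMTU = MIN(rx_MyMaxSendSize, rxmtu);
>>     if (!pp->ifMTU) {               /* not local */
>>         pp->timeout.sec = 3;
>>         pp->ifMTU = MIN(rx_MyMaxSendSize,
>>                         maxmtu ? maxmtu : RX_REMOTE_PACKET_SIZE);
>>     }
>>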
>> _______________________________________________
>> OpenAFS-devel mailing list
>> OpenAFS-devel@openafs.org
>> https://lists.openafs.org/mailman/listinfo/openafs-devel