[OpenAFS-devel] RX_MAX_FRAGS (yet again)

Nickolai Zeldovich kolya@MIT.EDU
Mon, 01 Oct 2001 17:01:21 -0400


> ok, while your patches might fix the problem i am not sure that they are
> entirely correct.  without trying the patch, rxdebug shows the right 
> ifMTU for my linux host already:
> 
> Peer at host 134.207.10.66, port 7001
>         ifMTU 8524      natMTU 1444     maxMTU 5692

Ah..  The ifMTU you're seeing here is what the fileserver thinks its
local MTU is (and what it advertises to 134.207.10.66).  On the other
hand, 134.207.10.66 believes its local ifMTU to be 1444, as seen from
`rxdebug 134.207.10.66 -port 7001 -peers`.

> look at rxi_AdjustMaxMTU(), and how its used in rxi_ReceiveAckPacket():
> 
> 	tSize = (afs_uint32)ntohl(tSize);
> 	tSize = (afs_uint32)MIN(tSize, rx_MyMaxSendSize);
> 	tSize = rxi_AdjustMaxMTU(peer->natMTU, tSize);
> 
> natMTU is going to be MIN((int)pp->ifMTU, OLD_MAX_PACKET_SIZE) [..]
> so the largest xmit value ever used will be peer->natMTU * rxi_nSendFrags.

Agreed, except for the value of natMTU.  Looking just above that code
in rxi_ReceiveAckPacket (around rx.c:3470), natMTU gets set to the
smaller of the local ifMTU and the remote peer's idea of ifMTU, which
they just sent us in the ack packet.

Your fileserver already has a 8524 local ifMTU, so if the peer were
to advertise a 8524 ifMTU as well, natMTU should get set to 8524.
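
A rough sketch of that negotiation, for illustration only.  The helper
name is hypothetical and this is not the actual rx.c code; it assumes
natMTU is simply the smaller of the two ifMTU values:

```c
#include <assert.h>

/* Hypothetical helper (not from rx.c): the negotiated MTU is assumed to
 * be the smaller of our local ifMTU and the ifMTU the peer advertised
 * in its ack packet. */
unsigned int sketch_nat_mtu(unsigned int local_if_mtu,
                            unsigned int peer_if_mtu)
{
    assert(local_if_mtu > 0 && peer_if_mtu > 0);
    return local_if_mtu < peer_if_mtu ? local_if_mtu : peer_if_mtu;
}
```

With a local ifMTU of 8524, a peer advertising 8524 yields 8524, while
a peer advertising 1444 pins the result at 1444.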

> as proof, here is a solaris8 host (no ethernet interfaces plumbed)

I think Solaris is broken in some ways (in particular, ADAPT_MTU isn't
enabled in kernel code), so the cache manager doesn't look at the kernel
MTUs and always advertises an ifMTU of 1444.

> however, i would like to make sure i am not completely and utterly nuts
> (i was painting this weekend it could still be the fumes)  here are
> my assumptions:
> 
> . rx never sends a packet bigger than the MIN(our ifmtu, peer ifmtu)

I think it actually never sends a packet bigger than maxMTU, where
maxMTU is approximately RX_MAX_FRAGS*natMTU, and natMTU is
MIN(our ifMTU, peer ifMTU).
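
To make the "approximately" concrete, here is a minimal sketch.  The
names are hypothetical, and it deliberately ignores whatever header
accounting rxi_AdjustMaxMTU() does, so it will not reproduce rxdebug's
numbers exactly (for natMTU 1444 it gives 5776, not the 5692 shown
above).  The 16384 cap is the RX_MAX_PACKET_SIZE value quoted in this
thread:

```c
#include <assert.h>

/* RX_MAX_PACKET_SIZE as quoted in this thread. */
#define SKETCH_RX_MAX_PACKET_SIZE 16384u

/* Hypothetical approximation: maxMTU ~= frags * natMTU, capped at the
 * maximum packet size.  Per-fragment header overhead is ignored. */
unsigned int sketch_max_mtu(unsigned int nat_mtu, unsigned int frags)
{
    unsigned int max_mtu;

    assert(nat_mtu > 0 && frags > 0);
    max_mtu = nat_mtu * frags;
    if (max_mtu > SKETCH_RX_MAX_PACKET_SIZE)
        max_mtu = SKETCH_RX_MAX_PACKET_SIZE;
    return max_mtu;
}
```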

> . a jumbogram is nothing but a series of rx datagrams in a single packet
>   (whether its a 3.4a or 3.5 jumbogram)

Agreed, at least for 3.5 jumbograms.  I'm still a bit unclear on what
3.4a jumbograms are.

[ Also, I think the terminology that the AFS code uses is slightly
  different as well: a datagram is a UDP packet sent over the wire,
  potentially fragmented by the IP stack.  A jumbogram is a single
  datagram, containing multiple Rx packets. ]

> . RX_MAX_FRAGS controls the maximum of rx datagrams in a jumbogram (4)

I think this isn't the case.  RX_MAX_DGRAM_PACKETS caps the maximum
number of Rx packets that get put into a jumbogram.  For example, if
you have client A and fileserver B on the same ATM network, you should
theoretically end up with these values:

		A	B
    ifMTU	8524	8524
    natMTU	8524	8524
    maxMTU	16384	16384	(4*8524 but capped by RX_MAX_PACKET_SIZE)

Then, even though RX_MAX_FRAGS=4, you get 6 1414-byte Rx packets per
jumbogram because they fit into maxMTU.  (In fact, they even fit into
natMTU, so you'd get 6 packets/jumbogram with RX_MAX_FRAGS=1 too.)
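
The arithmetic behind that, as a simplified sketch.  The function name
is hypothetical, the packet cap is passed in as a parameter rather than
taken from the real RX_MAX_DGRAM_PACKETS definition, and header
overhead between packets is ignored:

```c
#include <assert.h>

/* Hypothetical helper: how many fixed-size Rx packets fit into a size
 * budget (natMTU or maxMTU), limited by a per-jumbogram packet cap.
 * Simplification: no per-packet header overhead is counted. */
unsigned int sketch_packets_per_jumbogram(unsigned int size_budget,
                                          unsigned int packet_size,
                                          unsigned int packet_cap)
{
    unsigned int fit;

    assert(packet_size > 0 && packet_cap > 0);
    fit = size_budget / packet_size;   /* integer division rounds down */
    return fit < packet_cap ? fit : packet_cap;
}
```

With 1414-byte packets, a budget of 8524 fits 6 packets on size alone
(6*1414 = 8484), so with a cap of 6 the larger 16384 budget still gives
6.  The number of IP fragments never enters the calculation.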

> . the biggest rx datagram is ~1412 bytes

This is effectively the case with 3.5 jumbograms enabled, yeah.
(s/datagram/packet/ to be internally consistent on terminology
on my part.)

> . rx_MyMaxSendSize is the max send size afs will ever use (8588)
> . rx_maxReceiveSize is the max recv size 
> . RX_MAX_PACKET_SIZE is the maximum packet size (16384)

I agree with all of these.

> btw, given the above, the following computation makes no sense to me unless
> the mtu is the natMTU, and not the interface mtu:
> 
> 	rxmtu = rxi_AdjustIfMTU(rxmtu);
> 	maxmtu = rxmtu * rxi_nRecvFrags + ((rxi_nRecvFrags-1) * UDP_HDR_SIZE);
> 	maxmtu = rxi_AdjustMaxMTU(rxmtu, maxmtu);
> 
> rxmtu is first normalized to a multiple of the rxdatagram size.  then,
> the maxmtu is computed using that mtu size.  this has to be wrong.  this
> would make the maxmtu much larger than the interface size which isnt desirable.

In this code, I believe we're trying to compute the largest UDP
packet we could ever receive (rx_maxReceiveSize).  The constraints
are the local interface MTUs, and the maximum number of fragments
we'd want our UDP packets to be fragmented into.  So it's basically
MAX(interfaceMTU * RX_MAX_FRAGS) over all interfaces, capped by
RX_MAX_PACKET_SIZE and rx_maxReceiveSizeUser.
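
Spelled out as a sketch of my reading, not the actual code: the
function and macro names below are hypothetical, the UDP_HDR_SIZE term
from your snippet is left out, and 16384 is the RX_MAX_PACKET_SIZE
value quoted above:

```c
#include <assert.h>
#include <stddef.h>

/* RX_MAX_PACKET_SIZE as quoted in this thread. */
#define SKETCH_RX_MAX_PACKET_SIZE 16384u

/* Hypothetical helper: largest UDP packet we would expect to receive,
 * taken as the maximum of (interface MTU * fragment limit) over all
 * local interfaces, then capped by the packet-size limit and by a
 * user-supplied limit. */
unsigned int sketch_max_receive_size(const unsigned int *if_mtus,
                                     size_t n_ifs,
                                     unsigned int max_frags,
                                     unsigned int user_cap)
{
    unsigned int best = 0;
    size_t i;

    assert(if_mtus != NULL && n_ifs > 0 && max_frags > 0);

    for (i = 0; i < n_ifs; i++) {
        unsigned int candidate = if_mtus[i] * max_frags;
        if (candidate > best)
            best = candidate;
    }
    if (best > SKETCH_RX_MAX_PACKET_SIZE)
        best = SKETCH_RX_MAX_PACKET_SIZE;
    if (best > user_cap)
        best = user_cap;
    return best;
}
```

So the result can be larger than any single interface MTU on purpose:
it describes what IP fragmentation could deliver to us, not what fits
in one frame.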

-- kolya