[OpenAFS-devel] RX_MAX_FRAGS (yet again)

chas williams chas@cmf.nrl.navy.mil
Mon, 01 Oct 2001 19:41:25 -0400


In message <200110012101.RAA27034@pepsi-one.mit.edu>,Nickolai Zeldovich writes:
>Ah..  The ifMTU you're seeing here is what the fileserver thinks its
>local MTU is (and what it advertises to 134.207.10.66).  On the other
>hand, 134.207.10.66 believes its local ifMTU to be 1444, as seen from
>`rxdebug 134.207.10.66 -port 7001 -peers`.

i guess i have to agree on that one.  my host does seem to advertise a
rather small mtu.  i suppose i will add the extra params to syscall
(which means rewriting the ia64 syscall stub as well *sigh*)

>Agreed, except for the value of natMTU.  Looking just above that code
>in rxi_ReceiveAckPacket (around rx.c:3470), natMTU gets set to the
>smaller of the local ifMTU and the remote peer's idea of ifMTU, which
>they just sent us in the ack packet.

right.  see that now.
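
for the record, a minimal sketch of the rule described in the quoted
text. this is not the actual rxi_ReceiveAckPacket code; nat_mtu and
its parameter names are hypothetical, made up for illustration.

```c
/* hypothetical helper, not taken from rx.c: the "natural" MTU for a
 * peer is the smaller of our own ifMTU and the ifMTU the peer
 * advertised in its ack packet. */
unsigned int nat_mtu(unsigned int local_if_mtu, unsigned int peer_if_mtu)
{
    return local_if_mtu < peer_if_mtu ? local_if_mtu : peer_if_mtu;
}
```

with a local ifMTU of 1500 and a peer advertising 1444, this picks
1444, whichever side calls it.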

>I think Solaris is broken in some ways (in particular, ADAPT_MTU isn't
>enabled in kernel code), so the cache manager doesn't look at the kernel
>MTUs and always advertises an ifMTU of 1444.

well that does seem a bit broken.  ever tried with ADAPT_MTU set?

>I think it actually never sends a packet bigger than maxMTU, where
>maxMTU is approximately RX_MAX_FRAGS*natMTU, and natMTU is
>MIN(our ifMTU, peer ifMTU).

this would seem to be a bad idea though.
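
to put numbers on the quoted estimate, here is a rough sketch. it
ignores header overhead (the quote only says "approximately"), and
max_mtu is a hypothetical name, not a function in rx. the fragment
count is passed in as a parameter rather than assumed.

```c
/* hypothetical helper: upper bound on the packet size rx would send,
 * taken as (number of fragments) * natMTU, with no allowance for
 * ip/udp/rx headers. */
unsigned int max_mtu(unsigned int nat_mtu, unsigned int max_frags)
{
    return max_frags * nat_mtu;
}
```

if RX_MAX_FRAGS were 4 (an assumption here) and natMTU were 1444,
this gives 5776 bytes, so a single rx packet at that size would be
split by the ip layer into several fragments on a 1500-byte link.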

>Agreed, at least for 3.5 jumbograms.  I'm still a bit unclear on what
>3.4a jumbograms are.

i believe 3.4a jumbograms had an rx_header at the beginning of each
datagram.

>In this code, I believe we're trying to compute the largest UDP
>packet we could ever receive (rx_maxReceiveSize).  The constraints
>are the local interface MTUs, and the maximum number of fragments
>we'd want our UDP packets to be fragmented into.  So it's basically
>MAX(interfaceMTU * RX_MAX_FRAGS) over all interfaces, capped by
>RX_MAX_PACKET_SIZE and rx_maxReceiveSizeUser.

i was thinking rx was trying to determine the biggest packet it
could send without relying on the underlying ip stack to fragment
the (ip) datagrams.  this has been the source of my confusion for
the most part.
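
for comparison, the computation as described in the quoted paragraph
could be sketched like this. it is only a sketch of that description,
not the real rx code: max_receive_size and all of its parameters are
hypothetical, and the two caps stand in for RX_MAX_PACKET_SIZE and
the user-set limit.

```c
#include <stddef.h>

/* hypothetical sketch: largest udp packet we would expect to receive,
 * i.e. the max over all local interfaces of (ifMTU * max_frags),
 * capped by a compile-time packet limit and a user-set limit. */
unsigned int max_receive_size(const unsigned int *if_mtus, size_t n_ifs,
                              unsigned int max_frags,
                              unsigned int max_packet_size,
                              unsigned int user_cap)
{
    unsigned int best = 0;
    size_t i;

    /* take the largest per-interface product */
    for (i = 0; i < n_ifs; i++) {
        unsigned int candidate = if_mtus[i] * max_frags;
        if (candidate > best)
            best = candidate;
    }

    /* apply both caps */
    if (best > max_packet_size)
        best = max_packet_size;
    if (best > user_cap)
        best = user_cap;
    return best;
}
```

note this bounds what we are willing to receive after ip reassembly,
which is a different question from the largest packet that can be
sent without ip fragmentation (that would be bounded by a single MTU).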