[OpenAFS-devel] Path MTU discovery

Troy Benjegerdes hozer@hozed.org
Sat, 15 Sep 2012 17:09:03 -0500


> Low MTU is where the MTU of the link is smaller than the RX packet size. This is the case that Derrick discovered at the conference at UIUC and wrote code to work around. Low MTU detection doesn't use the traditional path MTU discovery code, but instead uses padded RX ping packets. If we don't get a response to a ping packet of a certain size, then we resend the ping with a lower size. When we eventually get a response, that's the MTU of the link. This is the code that uses rx_SetMsgsizeRetryErr - if that's registered, and we aren't making progress because of MTU, then the call will be failed with that error, and the application can retry, and thus get a smaller packet size.
> 
> To my mind, keeping the two of these separate makes sense at present. There are a lot of questions around support for setting the DF flag, and getting the ICMP errors delivered to the RX stack, especially when that stack is in userspace. The low MTU detection should work everywhere. Last time I looked, low MTU had some issues - in particular, it was using hard ACKs to determine whether a call was making progress, when actually the presence of soft ACKs is sufficient (you don't care that the packet has reached the application, just that it has been successfully received by the network stack).
> 
> It would be good to keep discussing this. Like most of RX, this code is all a bit tangled, and I think discussing overall design intent is a great way to make sure that the patches do what we all expect them to!
> 
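
As a rough illustration of the padded-ping probing described in the quoted
text, here is a minimal sketch in C. It is not the OpenAFS RX code: the
names probe_fn, probe_low_mtu and fake_link_probe, the fixed step size and
the floor value are all assumptions made for this example. It steps the
ping size down until a probe is answered, so the result is only accurate
to within one step of the real link MTU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: send a ping padded to `size` bytes and return
 * nonzero if a response arrived, 0 if it timed out. */
typedef int (*probe_fn)(size_t size, void *ctx);

/*
 * Step the padded ping size down from `start` until a probe is answered.
 * Returns the first size that got a response, or 0 if nothing between
 * `start` and `floor` (inclusive) was answered.
 */
static size_t
probe_low_mtu(size_t start, size_t floor, size_t step,
              probe_fn probe, void *ctx)
{
    size_t size = start;

    assert(step > 0 && probe != NULL);
    if (start < floor)
        return 0;

    for (;;) {
        if (probe(size, ctx))
            return size;        /* a response came back: usable size */
        if (size == floor)
            break;              /* nothing left to try */
        /* Lower the size, clamping to `floor` so it is tried last. */
        size = (size - floor > step) ? size - step : floor;
    }
    return 0;
}

/* Test double standing in for the network: it answers any ping that
 * fits within the link MTU stored in *ctx. */
static int
fake_link_probe(size_t size, void *ctx)
{
    size_t link_mtu = *(const size_t *)ctx;

    return size <= link_mtu;
}
```

A real implementation would also need timeouts and retransmits per probe
size, since a single lost ping is not proof that the size is too large.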


I'd like us to also keep in mind how to make this code less tangled in
the future, as IPv6 effectively makes path MTU discovery mandatory
(routers never fragment), and it *should* be more reliable than with IPv4.

Along slightly similar lines, whatever happened to RxTCP, and how would
we deal with high-performance applications with RDMA (TCP offload, or
InfiniBand)? Given how long code in the OpenAFS tree lives, let's at
least think about how to support these in a less-tangled manner, even if
nobody has funding right now to do it.