[OpenAFS-devel] Re: Path MTU discovery

Andrew Deason adeason@sinenomine.net
Tue, 25 Sep 2012 16:17:14 -0500


On Tue, 25 Sep 2012 16:20:56 -0400
Derrick Brashear <shadow@gmail.com> wrote:

> > lastPacket/lastPing can't lower the mtu, though, as I understand it.
> > They can only raise it.
> 
> not per se. it's used to know what the last thing we actually were
> able to send was, and the downward tweak happens with the mtuout code.

I mean, nothing involving the sending/receiving or lack of receiving of
those packets specifically causes the tracked mtu to decrease. It's only
when we hit mtuout in CheckCall, which happens on general loss of
connectivity.

> > If I can try to draw out how this works / would work:
> >
> >  - "normally" every N seconds, we send out a padded DF ping a little
> >    larger than the known path MTU. If we get a response or an ICMP frag
> >    error, set the pmtu.
> >
> >  - After X seconds/packets of packet loss, we send out a padded DF ping
> >    smaller than the known path MTU. If we get a response or ICMP frag
> >    error, set the pmtu. If we don't get either after Y seconds, repeat
> >    with smaller packets.
> 
> codify "packet loss", because here's where it gets exciting.

I don't mean that that's simple, but we have to make that determination
anyway; I'm treating it as a black box since I don't think it affects
the rest of this design.
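
To make the two probe cases quoted above concrete, here is a rough
sketch in Python. It is not Rx code; the class, its fields, the step
size and the 576-byte floor are all hypothetical, and the "packet loss"
decision is just passed in as a `lossy` flag (the black box). It also
assumes the ICMP frag error carries a usable next-hop MTU.

```python
from dataclasses import dataclass


@dataclass
class PeerMtuProber:
    """Hypothetical per-peer probe state; not an actual Rx structure."""
    pmtu: int             # currently tracked path MTU
    step: int = 64        # assumed probe increment/decrement
    floor: int = 576      # assumed lower bound for shrink probes
    probe_size: int = 0   # size of the outstanding DF ping, 0 if none

    def start_probe(self, lossy: bool) -> int:
        # Normal case: probe a little above the known pmtu.
        # Lossy case: probe below it.
        if lossy:
            self.probe_size = max(self.floor, self.pmtu - self.step)
        else:
            self.probe_size = self.pmtu + self.step
        return self.probe_size

    def on_response(self) -> None:
        # The padded DF ping got through, so that size is usable.
        if self.probe_size:
            self.pmtu = self.probe_size
            self.probe_size = 0

    def on_icmp_frag_needed(self, next_hop_mtu: int) -> None:
        # Assumes the ICMP error reports a next-hop MTU we can trust.
        self.pmtu = next_hop_mtu
        self.probe_size = 0

    def on_timeout(self, lossy: bool) -> int:
        # Grow probe lost: leave pmtu alone and stop.
        if not lossy:
            self.probe_size = 0
            return 0
        # Shrink probe lost: repeat with a smaller packet.
        self.probe_size = max(self.floor, self.probe_size - self.step)
        return self.probe_size
```

Note that nothing here lowers pmtu on a timeout alone; it only changes
when a probe is answered or an ICMP frag error comes back.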

But off the top of my head... I would assume a definition similar to
idledead, but only while in sending mode. If no "progress" has
been made (meaning lack of soft ACKs) in X seconds for a call, but the
pings to avoid RX_CALL_DEAD are still getting through, you flag the peer
somehow.
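
As a sketch of that check (Python, purely illustrative; the function
name, its parameters and the two thresholds are hypothetical and do not
correspond to existing Rx fields):

```python
def peer_looks_mtu_limited(now, last_soft_ack, last_ping_reply,
                           sending, stall_secs, dead_secs):
    """Return True if the peer should be flagged for a shrink probe.

    All arguments are hypothetical: times are in seconds, stall_secs is
    the "X seconds" of no progress, and dead_secs is the window within
    which a ping reply means the call is not considered dead.
    """
    if not sending:
        # Only consider calls that are in sending mode.
        return False
    # No soft ACKs for stall_secs means no "progress".
    stalled = (now - last_soft_ack) >= stall_secs
    # Small pings still getting through means the peer is reachable.
    pings_alive = (now - last_ping_reply) < dead_secs
    return stalled and pings_alive
```

The point is just that both conditions have to hold: data is stuck, but
the small keepalive pings still make it.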

Or it could just be whenever a call dies due to a timeout or network issues;
there's probably not much point in trying to figure out the mtu during a
running call, since if the packets are getting dropped, there's nothing
we can do.

> > Currently the 'pings' are done as call events, which I think is
> > really adding to the complexity. If we could do this per-peer (as
> > has been suggested for the NATping stuff, too), I think it would
> > make this easier to follow and would reduce overhead.
> 
> well, the nat ping stuff has been restructured to be different, but
> yeah.

nat ping is still an event on a connection object, isn't it? It may only
be one conn per peer, but it's still attached to the conn.

> > Rx ping acks iirc need to be tied to a call, though; would it be
> > possible to use "version" packets for this again?
> 
> if we're sending them anyway, yes, but the goal there was to generate
> no extra traffic in the base case.

I have a hard time seeing either scenario being a problem:

 - Attempting to grow an unchanging MTU should be really infrequent,
   so this should not impact net performance (possibly just turning off
   entirely, if nothing changes after a certain amount of time)
 
 - Attempting to shrink an MTU could be more frequent, but should only
   be done when call data isn't processing anyway, so we are not
   impacting existing activity

> we need a way to discover nat ping is unneeded and not send it, and
> marking such a thing in the peer and then doing this seems reasonable.

I didn't think this was possible in theory, since stateful firewalls can
exist even when the addresses look correct. Or you could theoretically
be NATd multiple times involving a public address or something.

-- 
Andrew Deason
adeason@sinenomine.net