[OpenAFS-devel] Re: Path MTU discovery
Andrew Deason
adeason@sinenomine.net
Tue, 25 Sep 2012 16:17:14 -0500
On Tue, 25 Sep 2012 16:20:56 -0400
Derrick Brashear <shadow@gmail.com> wrote:
> > lastPacket/lastPing can't lower the mtu, though, as I understand it.
> > They can only raise it.
>
> not per se. it's used to know what the last thing we actually were
> able to send was, and the downward tweak happens with the mtuout code.
I mean, nothing involving the sending/receiving or lack of receiving of
those packets specifically causes the tracked mtu to decrease. It's only
when we hit mtuout in CheckCall, which happens on general loss of
connectivity.
> > If I can try to draw out how this works / would work:
> >
> > - "normally" every N seconds, we send out a padded DF ping a little
> > larger than the known path MTU. If we get a response or an ICMP frag
> > error, set the pmtu.
> >
> > - After X seconds/packets of packet loss, we send out a padded DF ping
> > smaller than the known path MTU. If we get a response or ICMP frag
> > error, set the pmtu. If we don't get either after Y seconds, repeat
> > with smaller packets.
>
> codify "packet loss", because here's where it gets exciting.
I don't mean that that's simple, but we have to make that determination
anyway; I'm treating it as a black box since I don't think it affects
the rest of this design.
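
To make the two rules quoted above concrete, here is a rough per-peer sketch in Python. It is not rx code: PeerMtu, MIN_MTU, STEP and the method names are all made up for illustration, and "lossy" is the black-box packet-loss determination passed in from outside.

```python
MIN_MTU = 576   # assumed floor for probe sizes
STEP = 64       # assumed probe step, in bytes


class PeerMtu:
    """Hypothetical per-peer MTU state; not an actual rx structure."""

    def __init__(self, pmtu=1500):
        self.pmtu = pmtu
        self.probe = 0  # size of the outstanding probe, 0 if none

    def next_probe(self, lossy):
        """Return the size of the next padded DF ping to send."""
        if lossy:
            # Shrink: start just under the known pmtu, and keep going
            # down if an earlier, smaller probe also went unanswered.
            base = self.probe if 0 < self.probe < self.pmtu else self.pmtu
            self.probe = max(MIN_MTU, base - STEP)
        else:
            # Grow: try a little above the known pmtu.
            self.probe = self.pmtu + STEP
        return self.probe

    def on_response(self, size):
        # A probe of this size made it through, so it is the new pmtu.
        self.pmtu = size
        self.probe = 0

    def on_icmp_frag(self, next_hop_mtu):
        # An ICMP frag-needed error reports the usable size directly.
        self.pmtu = max(MIN_MTU, next_hop_mtu)
        self.probe = 0
```

Calling next_probe(lossy=True) again after Y seconds without a reply is the "repeat with smaller packets" step.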
Off the top of my head, though, I would assume a definition similar to
idledead, but only while in sending mode. If no "progress" has been made
(meaning no soft ACKs) in X seconds for a call, but the pings to avoid
RX_CALL_DEAD are still getting through, you flag the peer somehow.
Or it could just be whenever a call dies due to timeout/network issues;
there's probably not much point in trying to figure out the mtu during a
running call, since if the packets are getting dropped, there's nothing
we can do.
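
A minimal sketch of the first definition, assuming the caller tracks two timestamps per call. The function name, its parameters and both thresholds are hypothetical; nothing here corresponds to an existing rx field.

```python
def peer_looks_stalled(now, last_soft_ack, last_ping_reply,
                       stall_secs, ping_secs):
    """Return True when data is stuck but keepalive pings still work.

    now, last_soft_ack, last_ping_reply: timestamps in seconds.
    stall_secs: the "X seconds" without progress (assumed tunable).
    ping_secs: how recent a ping reply must be to count as alive.
    """
    no_progress = (now - last_soft_ack) >= stall_secs
    pings_alive = (now - last_ping_reply) < ping_secs
    return no_progress and pings_alive
```

When the pings have also stopped, this returns False, since that case is ordinary loss of connectivity rather than a hint that large packets are being dropped.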
> > Currently the 'pings' are done as call events, which I think is
> > really adding to the complexity. If we could do this per-peer (as
> > has been suggested for the NATping stuff, too), I think it would
> > make this easier to follow and would reduce overhead.
>
> well, the nat ping stuff has been restructured to be different, but
> yeah.
nat ping is still an event on a connection object, isn't it? It may only
be one conn per peer, but it's still attached to the conn.
> > Rx ping acks iirc need to be tied to a call, though; would it be
> > possible to use "version" packets for this again?
>
> if we're sending them anyway, yes, but the goal there was to generate
> no extra traffic in the base case.
I have a hard time seeing either scenario being a problem:
- Attempting to grow an unchanging MTU should be really infrequent,
so this should not impact net performance (possibly just turning it off
entirely, if nothing changes after a certain amount of time)
- Attempting to shrink an MTU could be more frequent, but should only
be done when call data isn't processing anyway, so we are not
impacting existing activity
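
One way the two cases could be scheduled, as a sketch only. The exponential back-off for grow probes is an assumption made for the example and is not something proposed in this thread; the function names and parameters are likewise hypothetical.

```python
def next_grow_delay(base_secs, failures, max_failures):
    """Seconds to wait before the next grow probe, or None to stop.

    failures: consecutive grow probes that did not raise the MTU.
    Back off exponentially (an assumption), then turn probing off
    entirely once max_failures is reached.
    """
    if failures >= max_failures:
        return None
    return base_secs * (2 ** failures)


def may_send_shrink_probe(call_data_moving):
    """Shrink probes are only worth sending while call data is stalled."""
    return not call_data_moving
```

With base_secs=60 and max_failures=5, an MTU that never changes costs five probes in total, spread over roughly half an hour.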
> we need a way to discover nat ping is unneeded and not send it, and
> marking such a thing in the peer and then doing this seems reasonable.
I didn't think this was possible in theory, since stateful firewalls can
exist even when the addresses look correct. Or you could theoretically
be NATed multiple times involving a public address or something.
--
Andrew Deason
adeason@sinenomine.net