[OpenAFS-devel] fileserver -> client NAT ping

Andrew Deason adeason@sinenomine.net
Wed, 14 Aug 2013 13:09:23 -0500


During the release-team meeting today, we talked a little about
<http://gerrit.openafs.org/#change,9420>. I proposed that change to turn
on the "NAT ping" in the opposite direction as well, in addition to what
we do now: fileservers would also ping clients. The goal is to improve
communication in NAT environments when the clients themselves don't do
the NAT ping.

jaltman mentioned that this may not help, since many NATs won't keep a
port mapping alive if the traffic only flows in one direction. We're
going to try to get some testing on a real site, but I also had an
alternative idea. His point made me wonder whether we could get
bidirectional traffic out of such clients to keep the port mapping
alive. Of course, we can send an rx version query (the client's reply
gives us traffic in the other direction), but sending that to every
client connected to a fileserver generates nontrivial traffic and loads
every client.

So can we send it only to NATed clients? We can't detect NATed clients
with perfect accuracy, but I think we can make a guess with little
chance of false positives, since we have the alleged local IPs for the
client from TellMeAboutYourself (TMAY).

I was thinking of applying the following heuristic. If both of the
following are true:

 - Every address in the TMAY response for a client is an RFC1918
   private address (or a localhost-y address; we already ignore those)
 - The actual IP we're sending to is _not_ an RFC1918 private address

Then it seems likely that the host is behind a NAT, and we can ping it
with version requests. That obviously doesn't catch everything, since a
NAT may not be using (only) RFC1918 addresses. And false positives are
possible if a host reports a private address in TMAY but has actually
moved to a public address.
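To make the heuristic concrete, here is a minimal sketch in C. It assumes IPv4 addresses in host byte order and treats "localhost-y" as the 127.0.0.0/8 loopback range; it also requires at least one private address, so a TMAY list containing only loopback entries is not flagged (my choice, not something stated above). The function names are hypothetical and do not come from the OpenAFS source; the real fileserver host code would look different.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC1918 private ranges: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16.
 * Addresses are IPv4 in host byte order (assumption for this sketch). */
static int is_rfc1918(uint32_t addr)
{
    return (addr & 0xFF000000u) == 0x0A000000u ||  /* 10.0.0.0/8 */
           (addr & 0xFFF00000u) == 0xAC100000u ||  /* 172.16.0.0/12 */
           (addr & 0xFFFF0000u) == 0xC0A80000u;    /* 192.168.0.0/16 */
}

/* Loopback 127.0.0.0/8, assumed here to be the "localhost-y" addresses. */
static int is_loopback(uint32_t addr)
{
    return (addr & 0xFF000000u) == 0x7F000000u;
}

/* Hypothetical helper: returns 1 if every non-loopback TMAY address is
 * RFC1918 and the address we actually send to is not RFC1918. */
static int likely_behind_nat(const uint32_t *tmay_addrs, size_t n_addrs,
                             uint32_t peer_addr)
{
    size_t i, n_private = 0;

    /* We already talk to a private address directly: not a NAT guess. */
    if (is_rfc1918(peer_addr))
        return 0;

    for (i = 0; i < n_addrs; i++) {
        if (is_loopback(tmay_addrs[i]))
            continue;              /* ignored, like the fileserver does */
        if (!is_rfc1918(tmay_addrs[i]))
            return 0;              /* client claims a non-private address */
        n_private++;
    }

    /* An all-loopback list is not treated as evidence of a NAT. */
    return n_private > 0;
}

/* Build a host-byte-order IPv4 address from its four octets. */
#define IP4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                         ((uint32_t)(c) << 8) | (uint32_t)(d))
```

For example, a client whose TMAY lists 192.168.1.20 and 127.0.0.1 but whose packets arrive from 203.0.113.9 would be flagged for version pings, while a client that also lists 198.51.100.7 (a public address) would not.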

The false positive case seems rare, and the only consequence is some
extra packets. The false negative case seems more common, but at least
we would be helping some cases. This heuristic also seems most likely to
catch NATs on low-quality consumer routers with very short port mapping
timeouts, in cells accessed by random people across the internet, which
seems to be where this problem is most likely to show up.

Any thoughts?

-- 
Andrew Deason
adeason@sinenomine.net