[OpenAFS] Re: UDP timeouts

Fri, 6 May 2011 15:02:47 -0500

On Fri, 06 May 2011 13:10:33 +0200
Jaap Winius <jwinius@umrk.nl> wrote:

> Quoting Jeffrey Altman <jaltman@secure-endpoints.com>:
> 
> > 10 to 15 minutes is more than sufficient.
> 
> Since ip_conntrack_udp_timeout and ip_conntrack_udp_timeout_stream  
> were decreased from 28800 to 900 seconds, I've been seeing lots of  
> dropped packets again. Any explanations? I've now increased both  
> values to 3600.

I think 900 is actually too low, but 1200 or 1500 may be more
appropriate.

I don't think the interval we need to be wary of is the checkservers
interval on the clients, but rather the client check time on the server
('checktime'), which is currently always 15 minutes. At least speaking
for the unix client, we only issue checkserver GetTime/GetCaps probes
every 10 minutes if we have callbacks for that server (or in a few
other situations). After that, we won't ping the server anymore, but the
fileserver still has the client's associated host structure in memory,
and only pings the client every 15 minutes.

So, if the mapping expires and the client contacts the fileserver again,
the fileserver will get a connection on a new port. But during the
connection negotiation it sees that the client's UUID already exists in
memory for the old NAT port. So it probes the old address/port to see if
the "old" host is still alive and has the same UUID. That probe goes to
the expired NAT entry, so you see dropped packets.
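
As a rough illustration of that decision, here is a minimal Python
sketch. It is not the fileserver's actual code: HostEntry,
endpoint_to_probe and the addresses are hypothetical names made up for
this example, and it only models the "same UUID, different port" check
described above, not what happens after the probe.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (address, port) as seen by the fileserver


@dataclass
class HostEntry:
    """Hypothetical stand-in for the server's in-memory host structure."""
    uuid: str
    endpoint: Endpoint


def endpoint_to_probe(hosts: Dict[str, HostEntry],
                      uuid: str,
                      seen_from: Endpoint) -> Optional[Endpoint]:
    """Return the old endpoint the server would probe, or None.

    A probe is only needed when the UUID is already known but was last
    seen at a different address/port (e.g. the NAT mapping changed).
    """
    entry = hosts.get(uuid)
    if entry is None or entry.endpoint == seen_from:
        return None
    return entry.endpoint


# The client was known at port 40001; its NAT mapping expired and the
# new connection arrives from port 40002.
hosts = {"client-uuid": HostEntry("client-uuid", ("192.0.2.10", 40001))}
stale = endpoint_to_probe(hosts, "client-uuid", ("192.0.2.10", 40002))
```

Here stale is the old ("192.0.2.10", 40001) endpoint, which is exactly
the address/port whose NAT entry no longer exists.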

This server->client "ping" I say happens every 15 minutes, but
technically speaking what the server does is check all clients every 5
minutes, and ping those that haven't been heard from in the last 15
minutes. So depending on how the timing works out, it can be up to about
20 minutes between adjacent pings. And since we check hosts in serial,
it can actually be a bit more than that, depending on how many hosts you
have, how many drop off the net, how long they take to respond, etc.
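
To make the "up to about 20 minutes" arithmetic concrete, here is a
small Python sketch. It is a simplified model, not the server's logic:
it assumes scans happen exactly on 5-minute boundaries, that a host is
pinged at the first scan where it has been silent for at least 15
minutes, and it ignores the extra delay from checking hosts in serial.
All names in it are made up for illustration.

```python
CHECK_INTERVAL = 5 * 60    # server scans its host list every 5 minutes
IDLE_THRESHOLD = 15 * 60   # hosts silent for this long get pinged


def next_ping_time(last_heard: int) -> int:
    """Time (seconds) of the first scan at which the host has been
    silent for at least IDLE_THRESHOLD. Scans occur at multiples of
    CHECK_INTERVAL in this model."""
    earliest = last_heard + IDLE_THRESHOLD
    # Round up to the next scan boundary using integer arithmetic.
    return -(-earliest // CHECK_INTERVAL) * CHECK_INTERVAL


def worst_case_gap() -> int:
    """Longest silence before a ping, over every second within one
    scan interval at which the host could last have been heard."""
    return max(next_ping_time(t) - t for t in range(CHECK_INTERVAL))


def mapping_survives(nat_timeout: int) -> bool:
    """True if a NAT UDP timeout exceeds the modelled worst-case gap."""
    return nat_timeout > worst_case_gap()


for timeout in (900, 1200, 1500, 3600):
    print(timeout, mapping_survives(timeout))
```

In this model the worst case is 1199 seconds (a host last heard from
one second after a scan), so 900 is too short and 1200 only barely
clears it. Once serial checking and slow or dead hosts are added on
top, a value with some margin beyond 1200 looks safer.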

-- 
Andrew Deason
adeason@sinenomine.net