[AFS3-std] DNS SRV Resource Records for AFS

Mon, 05 Oct 2009 19:58:38 -0400

--On Monday, October 05, 2009 07:31:27 PM -0400 David Boyes 
<dboyes@sinenomine.net> wrote:

> One thought I had was that most TCP stack implementations these days
> provide some form of trivial DNS caching

I've found that most TCP stacks implement TCP, not DNS.  DNS lookups are 
usually handled in some piece of user-mode code, probably a library, which 
may vary from one application to the next.  And in fact, most interfaces 
I've seen for performing DNS queries do _not_ do extraneous caching.

I would agree that the interfaces provided by most platforms for 
hostname-to-IP-address lookups, such as gethostbyname() or getaddrinfo(), 
do provide some sort of caching.  Some systems even provide interfaces 
which combine all of the operations needed to establish a TCP connection, 
including hostname lookup, and these also often do some sort of caching. 
But those interfaces aren't what we're interested in here, as they are 
generally not useful for resolving other types of records, such as the SRV 
and AFSDB records the present document discusses.
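
To illustrate the distinction, here is a minimal Python sketch of what an
address-lookup interface hands back.  The function name address_lookup and
the port number are only illustrative assumptions.  The point is that each
result is a (family, type, proto, canonname, sockaddr) tuple: there is no
record type and no TTL in it, so a caller cannot use it to fetch SRV or
AFSDB records, nor learn how long an answer remains valid.

```python
import socket


def address_lookup(host, port):
    """Return the raw getaddrinfo() results for a UDP service.

    Each entry is (family, type, proto, canonname, sockaddr); nothing in
    it says which DNS record produced the address or what its TTL was.
    """
    return socket.getaddrinfo(host, port, type=socket.SOCK_DGRAM)


if __name__ == "__main__":
    # A numeric address is used so the example needs no network access.
    for entry in address_lookup("127.0.0.1", 7003):
        print(entry)
```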

>, so the impact of lots of DNS
> lookups may be less than we think

I don't think any of us are terribly concerned that something might be 
doing _too many_ lookups.  Much more likely is that implementations will 
somehow manage not to re-query when they should.

> Another thought would be to do a SHOULD/MUST division -- recommend that
> clients SHOULD re-resolve the name each time it looks up the VLDB or other
> component, but MUST resolve it and select it once at minimum.

Beware.  SHOULD, as defined in RFC 2119, is actually very strong, and
usually doesn't mean what people think it means.  In particular, it doesn't
mean "we think this is a good idea"; it means "you have to do this unless
there's a valid reason not to and you fully understand the implications".

I don't think we want to say clients SHOULD re-resolve SRV records on every
VLDB lookup; that's likely to be way more often than necessary and places an
unnecessary burden on nameservers, unless you assume the existence of a
cache in the right place.  It's better not to second-guess the DNS
infrastructure, and instead simply REQUIRE that clients obey the TTL
provided to them.
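
As a rough sketch of what "obey the TTL" could look like on the client
side, the Python below caches a lookup result for exactly as long as the
TTL that came with it, and re-queries once that time has passed.  The class
name TtlCache, the resolver callback, and the record tuple layout are all
hypothetical; this is not taken from any existing AFS client.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical record shape: (priority, weight, port, target).
SrvRecord = Tuple[int, int, int, str]
# Hypothetical resolver callback: name -> (records, ttl_in_seconds).
Resolver = Callable[[str], Tuple[List[SrvRecord], int]]


class TtlCache:
    """Keep lookup results no longer than the TTL the DNS supplied."""

    def __init__(self, resolve: Resolver,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._clock = clock
        # name -> (expiry time, records)
        self._entries: Dict[str, Tuple[float, List[SrvRecord]]] = {}

    def lookup(self, name: str) -> List[SrvRecord]:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[0]:
            return entry[1]  # still within the TTL: no new query
        # Missing or expired: ask again and remember the new expiry.
        records, ttl = self._resolve(name)
        self._entries[name] = (now + ttl, records)
        return records
```

With this shape, a VLDB lookup can call lookup() every time without
generating a DNS query every time, and the decision about how often to
re-query stays with whoever set the TTL.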

> That way old
> clients are still in compliance

I'm not terribly in favor of watering down new specifications so that 
preexisting implementations can claim to be in compliance.  New specs 
should be written to describe correct behavior, not redefine incorrect 
behavior as correct.

Note that this is different from designing new protocols or protocol 
versions such that new implementations will interoperate with old ones.

> best practice can be toward some kind of DNS caching on
> clients as well as servers.

I don't believe specifying best practices for operation of the DNS is in 
scope for this group.

> Ultimately, there's some interesting things that can be done with load
> balancing if TTLs are obeyed, thus the question.

There are, but mostly for services that can't do any better.  AFS already 
supports failover and simplistic load-balancing of database services, and 
the present document improves the situation by specifying support for SRV 
records, which allow for more complex load-balancing policies.  I don't 
believe DNS-based load-balancing tricks are necessary for AFS.
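
For reference, the load-balancing policy that SRV records express is the
priority/weight scheme of RFC 2782: try the lowest-numbered priority first,
and within one priority pick servers at random in proportion to their
weights.  A minimal Python sketch of that ordering follows; the function
name order_srv and the tuple layout are assumptions made for illustration.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical record shape: (priority, weight, port, target).
SrvRecord = Tuple[int, int, int, str]


def order_srv(records: List[SrvRecord],
              rng: Optional[random.Random] = None) -> List[SrvRecord]:
    """Return records in the order a client should try them."""
    if rng is None:
        rng = random.Random()
    ordered: List[SrvRecord] = []
    for priority in sorted({r[0] for r in records}):
        # Weight-0 records are placed first so they are chosen only rarely.
        group = sorted((r for r in records if r[0] == priority),
                       key=lambda r: r[1] != 0)
        while group:
            total = sum(r[1] for r in group)
            pick = rng.randint(0, total)
            running = 0
            for i, rec in enumerate(group):
                running += rec[1]
                if running >= pick:
                    ordered.append(group.pop(i))
                    break
    return ordered
```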

However, there is much to be said for being able to lower the TTL on an RR 
and then expect that clients will notice a change to that record within one 
TTL of when the data changes.  This ability is critical if one wishes to 
move, upgrade, renumber, or replace servers without spending long periods 
of time tracking down clients which haven't picked up the change.
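
The arithmetic behind that is simple enough to write down.  In the Python
sketch below (function names are hypothetical, times are in seconds), a
record's TTL is lowered ahead of a planned change.  Copies cached just
before the TTL was lowered can live for one old TTL, so that is how long to
wait before making the change; after the change, a client that obeys the
TTL can hold the stale data for at most one new TTL.

```python
def earliest_safe_change(lowered_at: float, old_ttl: int) -> float:
    """Earliest time at which no cached copy still carries the old TTL."""
    return lowered_at + old_ttl


def latest_stale_use(changed_at: float, ttl_at_change: int) -> float:
    """Latest time a TTL-obeying client may still hold the old data."""
    return changed_at + ttl_at_change


if __name__ == "__main__":
    # Illustrative numbers: a one-day TTL lowered to five minutes at t=0.
    change = earliest_safe_change(lowered_at=0, old_ttl=86400)
    print("make the change no earlier than t =", change)
    print("all obeying clients updated by t =", latest_stale_use(change, 300))
```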

-- Jeff