[AFS3-std] DNS SRV Resource Records for AFS

Jeffrey Hutzelman jhutz@cmu.edu
Mon, 05 Oct 2009 21:31:52 -0400


--On Monday, October 05, 2009 09:09:06 PM -0400 David Boyes 
<dboyes@sinenomine.net> wrote:

>> I would agree that the interfaces provided by most platforms for
>> hostname-to-IP-address lookups, such as gethostbyname() or getaddrinfo(),
>> do provide some sort of caching.  Some systems even provide interfaces
>> which combine all of the operations needed to establish a TCP connection,
>> including hostname lookup, and these also often do some sort of caching.
>> But those interfaces aren't what we're interested in here, as they are
>> generally not useful for resolving other types of records, such as the
>> SRV and AFSDB records the present document discusses.
>
> Mmf. If you're digging that far down, I think there are bigger problems.

Not at all.  That's exactly the level we're talking about here -- the 
present document specifies use of a type of record which is used to convey 
information other than simple name-to-address mappings, and that generally 
means using an interface that lets you directly look up DNS RRs.  Such 
interfaces generally don't (and shouldn't) provide caching, aside from 
whatever intermediate caching server a machine might be configured to use.

>> Much more likely is that implementations will
>> somehow manage not to re-query when they should.
>
> Thus some question about whether that needs to be an application function
> at all.

I'm not sure what you mean.  I do not think it is realistic, in AFS, to do 
an upcall and a DNS query every time one wishes to make an RPC.  I do think 
it's terribly wasteful.


> I'm aware of what SHOULD means.

I suspected as much, but wanted the context to be clear for everyone.


>> I don't think we want to say clients SHOULD re-resolve SRV records on
>> every VLDB lookup; that's likely to be way more often than necessary and
>> places unnecessary burden on nameservers, unless you assume the
>> existence of a cache in the right place.  It's better not to
>> second-guess the DNS infrastructure, and instead simply REQUIRE that
>> clients obey the TTL provided them.
>
> But didn't you say earlier that you weren't worried about too many
> queries....?

I'm not worried that, given a reasonable spec, implementations will ignore 
it and go out of their way to query the DNS hundreds of times per second. 
In fact, if we wrote a spec that _said_ to do that, I can't imagine any 
sane implementation actually complying.

As it turns out, I'm also not terribly concerned that existing AFS 
implementations will fail to do a new query when the TTL on a record 
expires, since as far as I know all of them already do this correctly.  I 
would be concerned about a new implementation getting this wrong, because 
it's fairly easy to forget to handle it.
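
As a rough illustration of the TTL handling described above, here is a 
minimal Python sketch of a client-side cache that re-queries only once the 
TTL on a record set has expired.  It is not taken from any AFS 
implementation.  The names SrvRecord and TtlCache, and the injected 
resolve and clock callables, are hypothetical and exist only for this 
example; a real client would plug in whatever interface it uses to look up 
DNS RRs directly.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class SrvRecord:
    """One SRV answer; ttl is the remaining lifetime in seconds."""
    target: str
    port: int
    priority: int
    weight: int
    ttl: int


# Hypothetical resolver signature: owner name in, SRV answers out.
Resolver = Callable[[str], List[SrvRecord]]


class TtlCache:
    """Caches SRV answers and re-queries only after their TTL expires."""

    def __init__(self, resolve: Resolver,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._clock = clock
        # name -> (expiry time on the injected clock, cached records)
        self._entries: Dict[str, Tuple[float, List[SrvRecord]]] = {}

    def lookup(self, name: str) -> List[SrvRecord]:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[0]:
            # Still within the TTL: answer from the cache, no DNS query.
            return entry[1]
        # Missing or expired: this is the step that is easy to forget.
        records = self._resolve(name)
        if records:
            # Expire the whole set when its shortest TTL runs out.
            expires = now + min(r.ttl for r in records)
            self._entries[name] = (expires, records)
        else:
            # Assumption: empty answers are not cached in this sketch.
            self._entries.pop(name, None)
        return records
```

With a cache like this in front of the resolver, the number of DNS queries 
is bounded by the TTL the zone administrator chose rather than by the rate 
of RPCs or VLDB lookups.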


> I guess I think that tolerating the pre-existing clients in a form that
> recognizes that there are differing interpretations, capturing those
> differences, and that new implementations will need to deal with that is
> important enough to give them a break. Controlled evolution can take over
> from there, but reality seems to dictate that we will have a spectrum of
> clients and servers and that we need to recognize that fact in the
> documentation we produce.

There is a difference between documentation, which is descriptive, and 
protocol specifications, which are prescriptive.  I think we need to design 
protocols in which new clients interoperate with pre-existing ones.  I 
don't think we need to design new protocols with the goal that pre-existing 
implementations already comply, and I don't think that new protocol 
specifications must also act as documentation of existing behavior.

We are not writing a set of laws with which every AFS client and server, 
past and future, must comply.  There is no penalty for not complying, other 
than failure to interoperate.  Thus, it is no problem to define new 
protocols which require new or different behavior of implementations which 
choose to comply with them; in fact, there can be significant benefit in 
doing so.

A spec that says "use SRV" doesn't make all the existing clients that don't 
do so illegal.  A spec that says "use SRV, and obey the TTLs" doesn't 
somehow break existing clients that use AFSDB and ignore the TTLs.  Those 
clients, if they exist, are already broken, but the new document won't make 
them any more broken.


>>> best practice can be toward some kind of DNS caching on
>>> clients as well as servers.
>>
>> I don't believe specifying best practices for operation of the DNS is in
>> scope for this group.
>
> Hmph. Saying "Having a DNS cache available is beneficial to the operation
> of this application" doesn't strike me as specifying DNS operations.

Saying "query the DNS 100 times per second, and best practice is for 
everyone to run a DNS cache so this doesn't melt the nameservers" is 
specifying DNS operations, and IMHO is also totally unreasonable.  We 
should not be making assumptions about how people have deployed DNS, and we 
should not be telling people "you better deploy DNS our way, or else AFS is 
going to suck".  That won't get people to deploy DNS our way; it will get 
them to run NFSv4 instead.

-- Jeff