[AFS3-std] DNS SRV Resource Records for AFS

Jeffrey Hutzelman jhutz@cmu.edu
Mon, 05 Oct 2009 21:12:10 -0400


--On Monday, October 05, 2009 04:53:48 PM -0700 Russ Allbery 
<rra@stanford.edu> wrote:

> Jeffrey and I were talking some more about this and there is another
> decision point: whether to re-randomize the server list by weight for
> every call or to only do that when the TTLs expire.  Currently, we only do
> that when the TTL expires, but the DNS SRV RFC prefers doing it for every
> call (although allows us to specify otherwise).  My inclination is to
> allow either for AFS, at least for the time being.  Per call is probably
> better in some sense, but I don't think it's sufficiently better to
> require implementations do it.

The ordered server list is an implementation detail, and while some 
discussion of it is appropriate, I don't think we need to specify an exact 
algorithm.  In fact, I think doing so will get pretty hairy, since it 
interacts with keeping track of down servers.

It's true that the spec gives us some leeway in specifying how to use the 
weight.  In particular, it specifies an algorithm to be used in ordering 
target hosts having the same priority, but is silent on the question of 
whether that ordering is to be recomputed for each transaction/whatever. 
It also makes the assumption that clients will contact each host _in 
order_, without prior information about which hosts are up, while in 
practice AFS clients often have considerable information about which 
servers are up.

I'd suggest it is probably appropriate for AFS clients to use the weighting 
algorithm described in RFC2782, but omit those servers which are known to 
be down (by whatever mechanism is used for that).  The random order should 
be reevaluated whenever the SRV data is refreshed, or if a server's up/down 
state changes.  Ideally, a client would randomly select a new server for 
each call, but performance considerations may dictate doing so less often.
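
To make that concrete, here is a minimal sketch of the RFC2782 selection step applied to a single priority class, with known-down servers filtered out first. The names (SrvTarget, pick_weighted, the "down" set of hostnames) are made up for illustration and are not taken from any AFS implementation; how a client learns that a server is down is left out entirely.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvTarget:
    host: str
    priority: int
    weight: int


def pick_weighted(targets, down=frozenset(), rng=random):
    """Pick one target among same-priority `targets`, skipping down hosts.

    Follows the RFC 2782 recipe: weight-0 entries go first, a number is
    drawn uniformly from 0..sum(weights) inclusive, and the first entry
    whose running sum reaches that number is selected.
    Returns None if every target is known to be down.
    """
    live = [t for t in targets if t.host not in down]
    if not live:
        return None
    # Stable sort: weight-0 entries (key False) sort before the rest.
    live.sort(key=lambda t: t.weight != 0)
    total = sum(t.weight for t in live)
    pick = rng.randint(0, total)  # inclusive on both ends
    running = 0
    for t in live:
        running += t.weight
        if running >= pick:
            return t
    return None  # not reached: the final running sum equals total


if __name__ == "__main__":
    servers = [
        SrvTarget("afsdb1.example.org", 10, 3),
        SrvTarget("afsdb2.example.org", 10, 1),
        SrvTarget("afsdb3.example.org", 10, 0),
    ]
    # Pretend afsdb1 is known to be down; it is never selected.
    print(pick_weighted(servers, down={"afsdb1.example.org"}))
```

Calling pick_weighted once per call gives the per-call behavior; calling it only when the SRV data is refreshed or a server changes state gives the cheaper variant.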

I think we should do the following:

- REQUIRE that priorities be obeyed; a server with a numerically lower
  priority value MUST be tried before any server with a higher value,
  unless the former is known to be down.

- REQUIRE that clients use the weighting algorithm described in RFC2782
  to select among servers of equal priority.  However, this algorithm
  may be applied in any of three ways:
  (a) compute a complete randomly-ordered list of servers, then use
      that list to determine a server preference order, such that
      a server appearing earlier in the list will always be tried
      before any server appearing later in the list, unless the former
      is known to be down.
  (b) randomly select a single server each time a call is to be made
  (c) randomly select a single server on a periodic basis, with all
      calls made to the most-recently selected server unless that
      server goes down, in which case a new server is selected.

- If method (a) is used, the client MAY omit known-down servers from
  the list.  If it does, then the client MUST employ some mechanism
  for discovering recovery of a down server, and MUST recompute the
  server list when the up/down state of a server changes.

- If method (a) is used and the client does not omit known-down servers
  from the list, then it SHOULD employ some mechanism for tracking
  which servers are down and discovering recovery of a down server,
  in order to avoid repeated calls to a down server.  But maybe we
  don't need to say this, since failing to do so just makes that
  client's performance sad.

- If method (b) or (c) is used, the client MUST omit known-down
  servers from the candidates, MUST employ some mechanism for
  discovering recovery of a down server, and MUST recompute the server
  list when the up/down state of a server changes.  This is just common
  sense; if you don't do this, then random selection after a failed call
  may just select the same server again.
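
As a rough illustration of how the rules above fit together, the sketch below builds the complete ordered list of method (a) and derives the single selection of methods (b) and (c) from it. The record layout (priority, weight, host) and the function names are assumptions made for this example only; it omits known-down servers, which method (a) is merely allowed to do.

```python
import random
from itertools import groupby


def order_servers(records, down=frozenset(), rng=random):
    """Return live records in preference order.

    `records` is an iterable of (priority, weight, host) tuples.
    Numerically lower priorities come first; within one priority the
    RFC 2782 weighted draw is repeated without replacement.
    """
    live = sorted((r for r in records if r[2] not in down),
                  key=lambda r: r[0])
    ordered = []
    for _, group in groupby(live, key=lambda r: r[0]):
        # Weight-0 entries first, as RFC 2782 asks.
        pool = sorted(group, key=lambda r: r[1] != 0)
        while pool:
            total = sum(weight for _, weight, _ in pool)
            pick = rng.randint(0, total)
            running = 0
            for i, (_, weight, _) in enumerate(pool):
                running += weight
                if running >= pick:
                    ordered.append(pool.pop(i))
                    break
    return ordered


def select_server(records, down=frozenset(), rng=random):
    """Methods (b)/(c): one weighted choice from the best live priority."""
    ordered = order_servers(records, down, rng)
    return ordered[0] if ordered else None


if __name__ == "__main__":
    recs = [
        (10, 3, "afsdb1.example.org"),
        (10, 1, "afsdb2.example.org"),
        (20, 0, "afsdb-backup.example.org"),
    ]
    print(order_servers(recs))                              # method (a)
    print(select_server(recs, down={"afsdb1.example.org"}))  # method (b)
```

For method (c) the client would simply remember the result of select_server and call it again on its timer or when the remembered server goes down.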


One way to implement (a) without omitting down servers in current OpenAFS 
is to compute server preferences based on priority and weight, in a fashion 
similar to that described in the current draft.  Then the CM tries servers 
in order, but keeps track of which servers are down and doesn't make real 
calls to them.
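
A toy version of that loop might look like the following. This is not OpenAFS code; the function name, the use of a plain set for down servers, and ConnectionError as the failure signal are all assumptions chosen to keep the sketch short.

```python
def call_first_available(ordered_hosts, down, make_call):
    """Walk a fixed preference list, skipping servers marked down.

    `ordered_hosts` is the precomputed preference order, `down` is a
    mutable set of hosts believed to be down, and `make_call(host)` is
    a caller-supplied function that raises ConnectionError on failure.
    Returns (host, result) for the first call that succeeds.
    """
    for host in ordered_hosts:
        if host in down:
            continue  # known down: no real call is made
        try:
            return host, make_call(host)
        except ConnectionError:
            down.add(host)  # remember the failure for later calls
    raise ConnectionError("no reachable server")
```

The list itself never changes here; only the down set does, so something else still has to notice when a server comes back and remove it from the set.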

Note that "recomputing the server list" when a server goes up or down 
doesn't have to include re-querying the SRV record, and in fact could 
simply mean keeping track of the current sum-of-weights and adjusting it 
whenever a server goes up or down.  The running sum described in RFC2782 
can be computed on the fly as an entry is selected, provided the total is 
known.
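
For what it's worth, here is a small sketch of that bookkeeping for one priority class: the total is adjusted as servers change state, and the running sum is computed while walking the entries at selection time. The class and method names are invented for this example.

```python
import random


class WeightedPool:
    """Servers of one priority, with an incrementally maintained total."""

    def __init__(self, weights):
        # `weights` maps host -> SRV weight. Weight-0 hosts are kept
        # at the front of the walk order, as RFC 2782 asks.
        self._weight = dict(weights)
        self._order = sorted(self._weight, key=lambda h: self._weight[h] != 0)
        self._up = {host: True for host in self._order}
        self.total = sum(self._weight.values())

    def set_state(self, host, up):
        """Record an up/down change and adjust the sum of weights."""
        if self._up[host] != up:
            self._up[host] = up
            delta = self._weight[host]
            self.total += delta if up else -delta

    def pick(self, rng=random):
        """Select one live host; the running sum is built on the fly."""
        pick = rng.randint(0, self.total)
        running = 0
        for host in self._order:
            if not self._up[host]:
                continue
            running += self._weight[host]
            if running >= pick:
                return host
        return None  # every host is currently down


if __name__ == "__main__":
    pool = WeightedPool({"afsdb1.example.org": 3, "afsdb2.example.org": 1})
    pool.set_state("afsdb1.example.org", False)  # no SRV re-query needed
    print(pool.total, pool.pick())
```

Nothing here touches the DNS; a refresh of the SRV data would just build a new pool.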


-- Jeff