[AFS3-std] DNS SRV Resource Records for AFS
Jeffrey Hutzelman
jhutz@cmu.edu
Mon, 05 Oct 2009 21:12:10 -0400
--On Monday, October 05, 2009 04:53:48 PM -0700 Russ Allbery
<rra@stanford.edu> wrote:
> Jeffrey and I were talking some more about this and there is another
> decision point: whether to re-randomize the server list by weight for
> every call or to only do that when the TTLs expire. Currently, we only do
> that when the TTL expires, but the DNS SRV RFC prefers doing it for every
> call (although allows us to specify otherwise). My inclination is to
> allow either for AFS, at least for the time being. Per call is probably
> better in some sense, but I don't think it's sufficiently better to
> require implementations do it.
The ordered server list is an implementation detail, and while some
discussion of it is appropriate, I don't think we need to specify an exact
algorithm. In fact, I think doing so will get pretty hairy, since it
interacts with keeping track of down servers.
It's true that the spec gives us some leeway in specifying how to use the
weight. In particular, it specifies an algorithm to be used in ordering
target hosts having the same priority, but is silent on the question of
whether that ordering is to be recomputed for each transaction/whatever.
It also makes the assumption that clients will contact each host _in
order_, without prior information about which hosts are up, while in
practice AFS clients often have considerable information about which
servers are up.
I'd suggest it is probably appropriate for AFS clients to use the weighting
algorithm described in RFC2782, but omit those servers which are known to
be down (by whatever mechanism is used for that). The random order should
be reevaluated whenever the SRV data is refreshed, or if a server's up/down
state changes. Ideally, a client would randomly select a new server for
each call, but performance considerations may dictate doing so less often.
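
To make that concrete, here is a minimal Python sketch of the RFC2782 ordering with known-down servers left out. The names (Srv, order_servers, the "down" set) are made up for illustration and are not taken from OpenAFS or from the draft; how a client learns that a server is down is assumed to happen elsewhere.

```python
import random
from collections import namedtuple

# Hypothetical record type; only the fields the ordering needs.
Srv = namedtuple("Srv", "priority weight target")

def order_servers(records, down=frozenset(), rng=random):
    """Return the records in the order a client would try them.

    Lower-numbered priorities come first; within one priority the
    RFC2782 running-sum selection is repeated until the pool is empty.
    Targets named in `down` are omitted entirely.
    """
    ordered = []
    for prio in sorted({r.priority for r in records}):
        # RFC2782 places weight-0 records at the start of the pool.
        pool = sorted(
            (r for r in records if r.priority == prio and r.target not in down),
            key=lambda r: r.weight != 0,
        )
        while pool:
            total = sum(r.weight for r in pool)
            pick = rng.randint(0, total)      # uniform in 0..total inclusive
            running = 0
            for i, r in enumerate(pool):
                running += r.weight
                if running >= pick:           # first record reaching the pick
                    ordered.append(pool.pop(i))
                    break
    return ordered
```

A client would call this again whenever the SRV data is refreshed or a server's up/down state changes, passing the current set of down targets.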
I think we should do the following:
  - REQUIRE that priorities be obeyed; a server with a numerically lower
    priority MUST be tried before any server with a numerically higher
    priority, unless the former is known to be down.
  - REQUIRE that clients use the weighting algorithm described in RFC2782
    to select among servers of equal priority.  However, this algorithm
    may be applied in any of three ways:
    (a) compute a complete randomly-ordered list of servers, then use
        that list to determine a server preference order, such that
        a server appearing earlier in the list will always be tried
        before any server appearing later in the list, unless the former
        is known to be down.
    (b) randomly select a single server each time a call is to be made.
    (c) randomly select a single server on a periodic basis, with all
        calls made to the most-recently selected server unless that
        server goes down, in which case a new server is selected.
  - If method (a) is used, the client MAY omit known-down servers from
    the list.  If it does, then the client MUST employ some mechanism
    for discovering recovery of a down server, and MUST recompute the
    server list when the up/down state of a server changes.
  - If method (a) is used and the client does not omit known-down servers
    from the list, then it SHOULD employ some mechanism for tracking
    which servers are down and discovering recovery of a down server,
    in order to avoid repeated calls to a down server.  But maybe we
    don't need to say this, since failing to do so just makes that
    client's performance sad.
  - If method (b) or (c) is used, the client MUST omit known-down
    servers from the list, MUST employ some mechanism for discovering
    recovery of a down server, and MUST recompute the server list when
    the up/down state of a server changes.  This is just common sense;
    if you don't do this, then random selection after a failed call may
    just select the same server again.
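
As an illustration of methods (b) and (c), here is a rough Python sketch. All of the names (pick_server, StickySelector, mark_down, mark_up, reselect) are hypothetical, records are assumed to be plain (priority, weight, target) tuples, and down-server detection and the periodic timer are assumed to be driven from outside.

```python
import random

def pick_server(records, down, rng=random):
    """Method (b): pick one target for a single call.

    Only the lowest-numbered priority that still has an up server is
    considered; within it, the RFC2782 running-sum selection is applied.
    Returns None if every server is known to be down.
    """
    up = [r for r in records if r[2] not in down]
    if not up:
        return None
    best = min(r[0] for r in up)
    # Weight-0 records go first, as RFC2782 asks.
    pool = sorted((r for r in up if r[0] == best), key=lambda r: r[1] != 0)
    pick = rng.randint(0, sum(r[1] for r in pool))
    running = 0
    for r in pool:
        running += r[1]
        if running >= pick:
            return r[2]

class StickySelector:
    """Method (c): reuse one selection until it is invalidated."""

    def __init__(self, records, rng=random):
        self.records = records
        self.rng = rng
        self.down = set()
        self.current = None

    def server_for_call(self):
        if self.current is None or self.current in self.down:
            self.current = pick_server(self.records, self.down, self.rng)
        return self.current

    def mark_down(self, target):
        self.down.add(target)          # next call reselects if this was current

    def mark_up(self, target):
        self.down.discard(target)
        self.current = None            # a preferred server may be back

    def reselect(self):
        self.current = None            # assumed to be called from a periodic timer
```

Because down servers are filtered out before the random choice, a retry after a failed call cannot land on the server that just failed.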
One way to implement (a) without omitting down servers in current OpenAFS
is to compute server preferences based on priority and weight, in a fashion
similar to that described in the current draft.  Then the CM (cache manager)
tries servers in order, but keeps track of which servers are down and
doesn't make real calls to them.
Note that "recomputing the server list" when a server goes up or down
doesn't have to include re-querying the SRV record, and in fact could
simply mean keeping track of the current sum-of-weights and adjusting it
whenever a server goes up or down. The running sum described in RFC2782
can be computed on the fly as an entry is selected, provided the total is
known.
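
A small Python sketch of that bookkeeping, for the servers at a single priority level. The class and method names (WeightedPool, set_state, select) are invented for this example, and the weights are assumed to come from an earlier SRV lookup.

```python
import random

class WeightedPool:
    """Servers of one priority, with a running total of 'up' weights."""

    def __init__(self, weights):
        # weights: dict mapping target -> SRV weight.
        # Keep weight-0 targets first, as RFC2782 asks.
        self.weights = dict(sorted(weights.items(), key=lambda kv: kv[1] != 0))
        self.up = set(self.weights)
        self.total = sum(self.weights.values())

    def set_state(self, target, is_up):
        """Adjust the total when a server changes state; no DNS re-query."""
        if is_up and target not in self.up:
            self.up.add(target)
            self.total += self.weights[target]
        elif not is_up and target in self.up:
            self.up.remove(target)
            self.total -= self.weights[target]

    def select(self, rng=random):
        """Pick one up server, computing the running sum on the fly."""
        if not self.up:
            return None
        pick = rng.randint(0, self.total)
        running = 0
        for target, weight in self.weights.items():
            if target not in self.up:
                continue
            running += weight
            if running >= pick:
                return target
```

The only state beyond the SRV data itself is the set of up servers and one integer, so a state change costs a single addition or subtraction.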
-- Jeff