[OpenAFS] Re: DB servers "quorum" and OpenAFS tools

Harald Barth haba@kth.se
Fri, 24 Jan 2014 08:48:39 +0100 (CET)

> The problem is that you want the client to scan "quickly" to find a server
> that is up, but because networks are not perfectly reliable and drop
> packets all the time, it cannot know that a server is not up until that
> server has failed to respond to multiple retransmissions of the request.
> Those retransmissions cannot be sent "quickly"; in fact, they _must_ be
> sent with exponentially-increasing backoff times.  Otherwise, when your
> network becomes congested, the retransmission of dropped packets will
> act as a runaway positive feedback loop, making the congestion worse and
> saturating the network.

You are completely right if one must talk to that server. But I think
that AFS/RX sometimes hangs too loooooong waiting for one server
instead of trying the next one, for example for questions that could
be answered by any DB server. I'm thinking of operations like group
membership and volume location.
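
A minimal sketch of the idea in Python, under stated assumptions: it is
not how Rx or the AFS clients actually behave, and the names
backoff_schedule, query_any and the send callback are hypothetical. It
tries every replica once at the current timeout before moving on to the
next, longer timeout, so a single dead server costs one short timeout
while the waits still grow exponentially.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def backoff_schedule(base: float, retries: int, factor: float = 2.0) -> List[float]:
    """Return exponentially growing timeouts: base, base*factor, ..."""
    return [base * factor ** i for i in range(retries)]


def query_any(
    servers: Sequence[str],
    send: Callable[[str, float], Optional[str]],
    base: float = 1.0,
    retries: int = 3,
) -> Optional[Tuple[str, str]]:
    """Ask replicated servers a question that any of them can answer.

    `send(server, timeout)` is a hypothetical callback that returns the
    reply, or None if the server did not answer within `timeout`.
    """
    for timeout in backoff_schedule(base, retries):
        # Go through all replicas at this timeout before backing off further.
        for server in servers:
            reply = send(server, timeout)
            if reply is not None:
                return server, reply
    # No replica answered within the whole schedule.
    return None
```

With three replicas and the first one down, the second replica is asked
after a single base timeout, instead of after the full retransmission
schedule for the first one has run out.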