[OpenAFS] Re: DB servers "quorum" and OpenAFS tools

Jeffrey Hutzelman jhutz@cmu.edu
Thu, 23 Jan 2014 14:33:58 -0500

On Thu, 2014-01-23 at 10:44 -0600, Andrew Deason wrote:

> > For example in an ideal world putting more or less DB servers in
> > the client 'CellServDB' should not matter, as long as one that
> > belongs to the cell is up; again if the logic were for all types
> > of client: "scan quickly the list of potential DB servers, find
> > one that is up and belongs to the cell and reckons is part of
> > the quorum, and if necessary get from it the address of the sync
> > site".

The problem is that you want the client to scan "quickly" to find a server
that is up, but because networks are not perfectly reliable and drop
packets all the time, it cannot know that a server is not up until that
server has failed to respond to multiple retransmissions of the request.
Those retransmissions cannot be sent "quickly"; in fact, they _must_ be
sent with exponentially-increasing backoff times.  Otherwise, when your
network becomes congested, the retransmission of dropped packets will
act as a runaway positive feedback loop, making the congestion worse and
saturating the network.
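
To make the cost concrete, here is a minimal sketch of that retry logic.
It is not the actual Rx code: the function names, the 1-second base
timeout, the doubling factor, and the retry count are all assumptions
chosen for illustration.

```python
def backoff_schedule(base=1.0, factor=2.0, cap=60.0, retries=5):
    """Return the timeout (in seconds) to use for each successive attempt.

    All defaults are hypothetical values, not Rx's real timers.
    """
    delays = []
    delay = base
    for _ in range(retries):
        delays.append(min(delay, cap))  # never wait longer than the cap
        delay *= factor                 # exponential growth between attempts
    return delays


def server_is_up(send_and_wait, delays):
    """Probe one server, retransmitting with increasing timeouts.

    send_and_wait(timeout) is a caller-supplied (hypothetical) function
    that sends one request and returns True if a reply arrives in time.
    The server is only declared down after every attempt has timed out.
    """
    for timeout in delays:
        if send_and_wait(timeout):
            return True
    return False


def time_to_give_up(delays):
    """Worst-case seconds spent before a dead server is declared down."""
    return sum(delays)
```

With these assumed defaults, time_to_give_up(backoff_schedule()) is
1 + 2 + 4 + 8 + 16 = 31 seconds for a single unresponsive server, so a
client walking a list with several dead entries cannot finish "quickly".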

-- Jeff