[OpenAFS] Re: DB servers "quorum" and OpenAFS tools

Simon Wilkinson simonxwilkinson@gmail.com
Fri, 24 Jan 2014 08:01:07 +0000


On 24 Jan 2014, at 07:48, Harald Barth <haba@kth.se> wrote:

> You are completely right if one must talk to that server. But I think
> that AFS/RX sometimes hangs to loooooong on waiting for one server
> instead of trying the next one. For example for questions that could
> be answered by any VLDB. I'm thinking of operation like group
> membership and volume location.

I have long thought that we should be using multi for VLDB lookups,
specifically to avoid the problems with down database servers. The
problem is that doing so may cause issues for sites that have multiple
dbservers for scalability, rather than redundancy. Instead of each
dbserver seeing a third (or a quarter, or ...) of requests, it will see
them all. Even if the client aborts the remaining calls when it receives
the first response, the likelihood is that the other servers will
already have received, and responded to, the request.
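
To put rough numbers on the load concern above, here is a minimal sketch.
It assumes lookups are otherwise spread evenly across the dbservers, which
real clients may not do; the function name is purely illustrative.

```python
def requests_seen_per_server(total_lookups, n_servers, use_multi):
    """Lookups each dbserver handles, assuming an even spread.

    With multi, every server receives every lookup, even if the client
    aborts the losing calls after the first response arrives.
    """
    if use_multi:
        return total_lookups
    return total_lookups / n_servers


# Three dbservers, 900 lookups: a third each without multi, all 900 with it.
print(requests_seen_per_server(900, 3, use_multi=False))
print(requests_seen_per_server(900, 3, use_multi=True))
```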

There are ways we could be more intelligent (for example, measuring the
normal RTT of an RPC to the current server, and only doing a multi if
that is exceeded). But we would have to be very careful that this
wouldn't amplify a congestive collapse.
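
The RTT idea above could look something like the following simulation.
This is only a sketch under stated assumptions, not anything in OpenAFS
or Rx: the names RttEstimator and hedged_lookup, the smoothing weight,
and the "twice the normal RTT" threshold are all invented for
illustration, and server latencies are passed in as plain numbers rather
than measured from real RPCs.

```python
class RttEstimator:
    """Smoothed RTT (exponentially weighted moving average) for one server."""

    def __init__(self, initial, alpha=0.125):
        self.srtt = initial   # seconds; must be seeded with a baseline
        self.alpha = alpha    # weight given to each new sample (assumed)

    def observe(self, sample):
        self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample
        return self.srtt


def hedged_lookup(latencies, preferred, estimator, factor=2.0):
    """Simulate one lookup.

    latencies maps server name -> response time in seconds
    (float("inf") for a server that is down).
    Returns (answering_server, elapsed_seconds, servers_contacted).
    """
    threshold = factor * estimator.srtt
    primary = latencies[preferred]
    others = [s for s in latencies if s != preferred]

    # Normal case: the preferred server answers within the threshold,
    # so no other dbserver ever sees the request.
    if primary <= threshold or not others:
        if primary != float("inf"):
            estimator.observe(primary)
        return preferred, primary, [preferred]

    # Slow case: at time `threshold` the client fans out to the rest.
    # Note this adds load exactly when servers may already be struggling;
    # the sketch does nothing to guard against that.
    best = min(others, key=lambda s: latencies[s])
    fallback_elapsed = threshold + latencies[best]
    contacted = [preferred] + others
    if primary <= fallback_elapsed:
        return preferred, primary, contacted
    return best, fallback_elapsed, contacted


est = RttEstimator(initial=0.010)
print(hedged_lookup({"db1": 0.012, "db2": 0.011, "db3": 0.009}, "db1", est))
print(hedged_lookup({"db1": float("inf"), "db2": 0.011, "db3": 0.009}, "db1", est))
```

In the first call only db1 is contacted; in the second, db1 is down and
the other two are tried once the threshold passes. A real version would
need some back-off so that a site-wide slowdown does not turn every
lookup into a multi.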

Cheers,

Simon