[OpenAFS] Re: DB servers "quorum" and OpenAFS tools

Andrew Deason adeason@sinenomine.net
Thu, 23 Jan 2014 10:44:52 -0600

On Thu, 23 Jan 2014 14:58:35 +0000
pg@afs.list.sabi.co.UK (Peter Grandi) wrote:

> The issue is that with 'server/CellServDB' update there is
> potentially a DB daemon (PT, VL) restart (even if the rekeying
> instructions hint that when the mtime of 'server/CellServDB'
> changes the DB daemons reread it) and in any case a sync site
> election.

The daemons do reread the local configuration if the CellServDB mtime
changes. But they don't reinitialize the voting algorithm data, rx
connections, etc. that would be required to incorporate a new dbserver
into the quorum. So for that you need to restart, yes.

> > You would need to keep the server-side CellServDB accurate on
> > the dbservers in order for them to work, but the client
> > CellServDB files can be missing dbservers. [ ... ]
> It would be nice to know more about the details here to make
> planning easier in future updates.

I'm not sure what additional details you want. You just always make sure
the client CellServDB doesn't refer to dbservers that don't exist. So,
when you add a new dbserver, don't add it to the client CellServDB until
it's up and running. And when you remove a dbserver, remove it from the
client CellServDB before decommissioning it.
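
To make that ordering concrete, here is a minimal sketch in Python. It
only models the rule as set operations; the function names and the
example addresses are made up for illustration and nothing here touches
real CellServDB files or OpenAFS tools.

```python
def client_list_is_safe(client_dbservers, running_dbservers):
    """True if the client CellServDB names only dbservers that are running."""
    return set(client_dbservers) <= set(running_dbservers)


def add_dbserver(client, running, new):
    """Add a dbserver: bring it up first, then advertise it to clients."""
    running = running | {new}   # step 1: new dbserver is up and running
    assert client_list_is_safe(client, running)
    client = client | {new}     # step 2: only now add it to client CellServDB
    assert client_list_is_safe(client, running)
    return client, running


def remove_dbserver(client, running, old):
    """Remove a dbserver: drop it from clients first, then decommission it."""
    client = client - {old}     # step 1: remove from client CellServDB
    assert client_list_is_safe(client, running)
    running = running - {old}   # step 2: only now shut the dbserver down
    assert client_list_is_safe(client, running)
    return client, running


# Hypothetical example addresses.
client = {"192.0.2.1", "192.0.2.2"}
running = {"192.0.2.1", "192.0.2.2"}
client, running = add_dbserver(client, running, "192.0.2.3")
client, running = remove_dbserver(client, running, "192.0.2.1")
```

Doing either pair of steps in the opposite order leaves a window in
which the client list points at a dbserver that isn't there, which is
exactly the situation to avoid.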

> For example in an ideal world putting more or less DB servers in
> the client 'CellServDB' should not matter, as long as one that
> belongs to the cell is up; again if the logic were for all types
> of client: "scan quickly the list of potential DB servers, find
> one that is up and belongs to the cell and reckons is part of
> the quorum, and if necessary get from it the address of the sync
> site".

We have had an idea pending: perform a VL_ProbeServer multi_rx call
on 'vos' startup to see which servers are up before doing anything.
The possible argument against this is that it adds a little bit of load
and a little bit of delay on every operation, even if all of the servers
are up. But maybe it's worth it.
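
As a rough illustration of the "probe everything in parallel first"
idea, here is a sketch in Python. This is not how 'vos' or multi_rx
works internally; the probe is a caller-supplied function (standing in
for something like VL_ProbeServer), and find_live_servers and its
parameters are names I made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def find_live_servers(servers, probe, timeout=2.0):
    """Probe all servers concurrently; return the ones that answered.

    `probe(server, timeout)` is a hypothetical callable that returns a
    truthy value if the server responded and may raise on failure.
    The result preserves the order of `servers`.
    """
    if not servers:
        return []

    def safe_probe(server):
        try:
            return bool(probe(server, timeout))
        except Exception:
            # Treat any probe error as "server is down".
            return False

    # One worker per server, so the total wait is roughly one probe
    # timeout rather than the sum of all of them.
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        results = list(pool.map(safe_probe, servers))
    return [s for s, ok in zip(servers, results) if ok]
```

The cost mentioned above is visible here: every invocation pays for one
round of probes, even when every server answers.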

Another possible optimization that can be made is that ubik-using
utilities could try the lowest-IP dbserver first when doing something
that requires db write access (or just randomly pick a site from the
lowest "half+1" of the quorum), which would speed up the process in a
majority of cases. The argument against that, of course, is that the
"lowest IP" heuristic may not always apply in future implementations of
ubik, and in general it can make the minority of cases worse (when the
lower IPs are unreachable).
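
For what it's worth, the ordering itself is simple to express. The
following Python sketch assumes IPv4 dbserver addresses given as
strings; write_candidates and write_try_order are hypothetical helper
names, not anything in the OpenAFS tree, and the code only computes an
order to try, it does not talk to ubik.

```python
import ipaddress
import random


def write_candidates(dbservers):
    """Return the lowest "half+1" of the dbservers, sorted by IP address."""
    # Sort numerically, so that 10.0.0.9 comes before 10.0.0.10.
    ordered = sorted(dbservers, key=ipaddress.IPv4Address)
    majority = len(ordered) // 2 + 1
    return ordered[:majority]


def write_try_order(dbservers, rng=random):
    """Order to try for a db write: a random permutation of the lowest
    half+1 addresses first, then the remaining addresses in IP order."""
    ordered = sorted(dbservers, key=ipaddress.IPv4Address)
    majority = len(ordered) // 2 + 1
    head = ordered[:majority]
    rng.shuffle(head)
    return head + ordered[majority:]
```

With three dbservers the candidate set is the two lowest addresses; with
five it is the three lowest. If those happen to be the unreachable ones,
this ordering is strictly worse than trying servers in an arbitrary
order, which is the downside described above.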

Andrew Deason