[OpenAFS] Re: DB servers "quorum" and OpenAFS tools

Jeffrey Hutzelman jhutz@cmu.edu
Fri, 17 Jan 2014 17:56:11 -0500

On Fri, 2014-01-17 at 14:21 -0600, Andrew Deason wrote:
> On Fri, 17 Jan 2014 18:50:13 +0000
> pg@afs.list.sabi.co.UK (Peter Grandi) wrote:
> > Planned to do this incrementally by adding a new DB server to the
> > 'CellServDB', then starting it up, then removing an old DB
> > server, and so on until all 3 have been replaced in turn with
> > new DB servers #4, #5, #6.
> > 
> > At some point during this slow incremental plan there were 4
> > entries in both 'CellServDB's and the new one had not been
> > started up yet, and would not be for a couple days.
> Oh also, I'm not sure why you're adding the new machines to the
> CellServDB before the new server is up. You could bring up e.g. dbserver
> #4, and only after you're sure it's up and available, then add it to the
> client CellServDB. Then remove dbserver #3 from the client CellServDB,
> and then turn off dbserver #3.

Yup; that's the sane thing to do.  New servers should be in service
before you publish them in AFSDB or SRV records or in clients'
CellServDB files, and old servers should not be removed from service
until after they have been unpublished and all the clients you care
about have picked up the change.
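
To make that ordering concrete, here is a minimal sketch that checks a
planned sequence of steps against the two rules above. The action names
("start", "publish", "unpublish", "stop") and the function name are
hypothetical labels made up for this illustration; they are not OpenAFS
commands, and the sketch cannot tell you whether all your clients have
actually picked up a change.

```python
def check_rollout(events, running=(), published=()):
    """Return a list of ordering problems in a planned dbserver rollout.

    events: sequence of (action, server) pairs, where action is one of
    the hypothetical labels "start", "publish", "unpublish", "stop".
    running / published: servers already in service / already listed in
    DNS or client CellServDB files before the plan begins.
    """
    running, published = set(running), set(published)
    problems = []
    for action, server in events:
        if action == "start":
            running.add(server)
        elif action == "publish":
            # Rule 1: a server must be in service before it is published.
            if server not in running:
                problems.append(f"{server} published before it is in service")
            published.add(server)
        elif action == "unpublish":
            published.discard(server)
        elif action == "stop":
            # Rule 2: a server must be unpublished before it is shut down.
            if server in published:
                problems.append(f"{server} stopped while still published")
            running.discard(server)
        else:
            raise ValueError(f"unknown action: {action}")
    return problems


old = {"db1", "db2", "db3"}
plan = [("start", "db4"), ("publish", "db4"),
        ("unpublish", "db3"), ("stop", "db3")]
print(check_rollout(plan, running=old, published=old))
```

A plan that publishes db4 before starting it, or stops db3 while it is
still listed, comes back with one message per violation.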

> You would need to keep the server-side CellServDB accurate on the
> dbservers in order for them to work, but the client CellServDB files can
> be missing dbservers. This won't work if a client needs the sync-site,
> and the sync-site is missing from the CellServDB, but in all other
> situations, that should work fine.

This is what gerrit #2287 is about.  It adds a switch that will allow
you to configure your dbservers so that they will not be elected
coordinator.  Unpublished servers should be run with this switch, or
configured as non-voting servers, so that they don't become sync site.
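
As a rough illustration of the condition to watch for, the sketch below
flags any server that is both unpublished and still able to be elected.
The dictionary layout and the "published" / "electable" field names are
assumptions invented for this example; nothing here reads real OpenAFS
configuration.

```python
def at_risk_servers(servers):
    """Return names of servers that clients cannot see but that could
    still become sync site.

    servers: dict mapping a server name to a dict with two hypothetical
    boolean fields:
      "published" - listed in DNS or client CellServDB files
      "electable" - allowed to be elected coordinator
    """
    return sorted(
        name
        for name, info in servers.items()
        if info["electable"] and not info["published"]
    )


cell = {
    "db1": {"published": True, "electable": True},
    "db2": {"published": True, "electable": True},
    "db4": {"published": False, "electable": True},   # new, not yet listed
    "db5": {"published": False, "electable": False},  # new, non-voting
}
print(at_risk_servers(cell))
```

An empty result means every server that could become sync site is one
that clients already know about.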

Unfortunately, progress on getting that merged has been stalled for a
while, in no small part because there are changes still needed and a
related patch required significant rework, and I haven't had time to
touch this stuff in a few months.  So in the meantime, the best you can
do is ensure the unpublished server will not become sync site by some
combination of careful selection of the IP addresses involved, careful
monitoring and management of the election process, and/or marking the
unpublished server as nonvoting.  Some care is required for nonvoting
servers, as in theory all dbservers must agree on who the voting servers
are.  Some mismatches are possible and even "safe", but figuring out
which those are and what the behavior will be requires a thorough
understanding of what checks are done and how the voting process works.
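
For the IP address part, a simplified sketch follows. It assumes the
commonly documented Ubik behavior that the voting server with the lowest
IP address is favored in elections, and it ignores reachability, timing,
and everything else the real election depends on, so treat the result as
"the server most likely to win when all is well" and no more. The
function name and the input format are made up for this example.

```python
import ipaddress


def favored_candidate(servers):
    """Return the IP of the voting server with the numerically lowest
    address, or None if there are no voting servers.

    servers: iterable of (ip_string, is_voting) pairs.
    Simplification: real elections also depend on which servers can
    reach each other and on recent vote history.
    """
    voting = [ipaddress.ip_address(ip) for ip, is_voting in servers if is_voting]
    # Compare as addresses, not strings: "10.0.0.9" sorts after
    # "10.0.0.10" as text but is the lower address.
    return str(min(voting)) if voting else None


servers = [
    ("10.0.0.10", True),
    ("10.0.0.9", True),
    ("10.0.0.2", False),  # non-voting, so never a candidate here
]
print(favored_candidate(servers))
```

If the address this returns belongs to a server you have not published,
that is the case to fix before clients start needing the sync site.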

-- Jeff