[OpenAFS] Re: DB servers "quorum" and OpenAFS tools

Thu, 23 Jan 2014 14:58:35 +0000

> [ ... ] adding the new machines to the CellServDB before the
> new server is up. You could bring up e.g. dbserver 4, and only
> after you're sure it's up and available, then add it to the
> client CellServDB. Then remove dbserver #3 from the client
> CellServDB, and then turn off dbserver #3.

For the client 'CellServDB' I simply did not expect any issues:
my expectation was that clients would quickly scan the list of
addresses, for example starting with the lowest-numbered one,
find a live member of the "quorum", and then, if necessary, get
from it the address of the sync site. That is close to what they
seem to do, only very slowly.

I would have liked to put all 6 (different) IP addresses (3 up,
3 down) in the client 'CellServDB' and in 'fs newcell', to
minimize the number of updates, but I could not, because a local
configuration management system puts the same list in both the
client and the server 'CellServDB'. Done manually on a test
client, it seemed to work fine, except for the 'vos' clients and
their very long search timeouts.

My real issue was 'server/CellServDB', because we could not
prepare all 3 new servers ahead of time, only one at a time.

The issue is that an update to 'server/CellServDB' potentially
requires a DB daemon (PT, VL) restart (even though the rekeying
instructions hint that the DB daemons reread it when its mtime
changes), and in any case triggers a sync site election.

Because each election causes a "blip" for clients, I would
rather change 'server/CellServDB' by adding extra entries ahead
of time, or by leaving in entries for disabled servers, to reduce
the number of elections triggered. Otherwise I can only update
one server per week...

Ideally, to reshape the cell from DB servers 1, 2, 3 to 4, 5, 6,
I would love to first put all 6 in 'server/CellServDB', with 4,
5, 6 not yet available, and only at the end remove 1, 2, 3. That
does not play well with the "quorum" if one of the 3 live
servers fails :-), so I went halfway.
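To make the quorum concern concrete, here is a minimal Python sketch. It assumes a simplified Ubik-style rule: a sync site needs more than half of the servers listed in 'server/CellServDB', and with exactly half up, the lowest-addressed listed server breaks the tie. That rule is a simplified model, not code taken from OpenAFS, and the function name has_quorum and the 10.0.0.x addresses are hypothetical.

```python
from ipaddress import IPv4Address


def has_quorum(listed: list[str], up: set[str]) -> bool:
    """Return True if the servers in `up` could elect a sync site.

    Assumed, simplified rule (not the OpenAFS source): more than half
    of the listed servers must be up; with exactly half up, the group
    wins only if it includes the lowest-addressed listed server.
    """
    live = [s for s in listed if s in up]
    if 2 * len(live) > len(listed):
        return True
    if 2 * len(live) == len(listed):
        # Compare as addresses, not strings ("10.0.0.10" > "10.0.0.9").
        lowest = min(listed, key=IPv4Address)
        return lowest in up
    return False


# Hypothetical addresses: old DB servers 1-3, new (not yet up) 4-6.
old = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
new = ["10.0.0.4", "10.0.0.5", "10.0.0.6"]

print(has_quorum(old + new, set(old)))                  # True: 3 of 6, tie won by 10.0.0.1
print(has_quorum(old + new, {"10.0.0.1", "10.0.0.3"}))  # False: 2 of 6
print(has_quorum(old, {"10.0.0.1", "10.0.0.3"}))        # True: 2 of 3
```

Under this model, listing all 6 with only the 3 old servers up still elects a sync site, but losing any one of those 3 loses quorum, whereas with only the 3 old servers listed, any single failure is tolerated.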

> You would need to keep the server-side CellServDB accurate on
> the dbservers in order for them to work, but the client
> CellServDB files can be missing dbservers. [ ... ]

It would be nice to know more about the details here, to make
planning future updates easier.

For example, in an ideal world putting more or fewer DB servers
in the client 'CellServDB' should not matter, as long as one
that belongs to the cell is up, again if the logic for all types
of client were: "quickly scan the list of potential DB servers,
find one that is up, belongs to the cell and reckons it is part
of the quorum, and if necessary get from it the address of the
sync site".
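The "scan quickly" logic could look roughly like the Python sketch below: probe every candidate at once, with one overall deadline, and take the sync site reported by the first server that answers as a quorum member, so that down servers cost at most one timeout in total rather than one each. This is a hypothetical illustration only, not a description of what the OpenAFS ubik client code or 'vos' actually does; find_sync_site, the probe callback, fake_probe and the simulated addresses are all made up, and a real probe would be an RPC to the DB server.

```python
import concurrent.futures as cf
import time


def find_sync_site(candidates, probe, timeout=2.0):
    """Probe all candidate DB servers in parallel (hypothetical helper).

    `probe(addr)` is assumed to return the sync-site address if `addr`
    is up and in the quorum, None if it is up but not in the quorum,
    and to raise if it is unreachable. Returns the first answer, or
    None if nobody answers within `timeout` seconds overall.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(candidates) or 1)
    futures = [pool.submit(probe, addr) for addr in candidates]
    try:
        for fut in cf.as_completed(futures, timeout=timeout):
            try:
                answer = fut.result()
            except Exception:
                continue  # unreachable server: ignore it and keep waiting
            if answer is not None:
                return answer
    except cf.TimeoutError:
        pass  # overall deadline passed with no usable answer
    finally:
        # Do not wait for probes still hanging on down servers.
        pool.shutdown(wait=False, cancel_futures=True)
    return None


# Simulated cell: two listed servers are down, one is up and reports
# 10.0.0.5 as the sync site (all addresses hypothetical).
SIMULATED = {
    "10.0.0.1": None,
    "10.0.0.2": None,
    "10.0.0.3": "10.0.0.5",
}


def fake_probe(addr):
    reply = SIMULATED.get(addr)
    if reply is None:
        time.sleep(1.0)  # pretend to hang until an RPC timeout
        raise TimeoutError(addr)
    time.sleep(0.05)  # pretend network round trip
    return reply


print(find_sync_site(list(SIMULATED), fake_probe))  # 10.0.0.5
```

Because the probes run concurrently, the answer from the live server arrives after about one round trip, regardless of how many down servers are listed ahead of it.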

Similarly, (within limits) deliberately listing down DB servers
in 'server/CellServDB' should not matter that much, because down
DB servers happen anyway when there are failures.