[OpenAFS] Procedure for adding new DBs

Jeffrey Hutzelman jhutz@cmu.edu
Wed, 08 Feb 2006 17:59:30 -0500

On Thursday, February 09, 2006 10:45:27 AM +1300 Matthew Cocker 
<matt@cs.auckland.ac.nz> wrote:

> Hi
> Our AFS cell has recently expanded into a university wide storage
> service. As part of this expansion I need to add some new DB servers
> that will have lower ip addresses than the existing DBs.
> Would the following work
> i) bring up new dbs but do not restart db processes on other DBs so
> existing coordinator maintains role (will this work?)

No.  All dbservers must have the same server list, and you must not start 
ptserver or vlserver processes on machines not listed in that list.  It 
doesn't matter what the clients think, but if the servers do not all agree 
you will have election problems, and depending on the way in which they 
disagree, bad things could happen.

If you are going to add a new dbserver, you should distribute the new 
server-side CellServDB to all of your existing dbservers first, and make 
sure the ptserver and vlserver processes have been restarted to pick up the 
change.  I recommend restarting one server at a time, and waiting for it to 
begin voting again, so as to avoid losing quorum, but this is not required.

Once you have updated all of your existing servers, you can start the new 
dbserver, and it will join the quorum, obtain an updated database, and 
begin providing service.  Note that this will not force the new server to 
become coordinator -- that only happens if the active coordinator fails 
(which is fine; you don't really care which server is coordinator).

If you have multiple new dbservers, repeat the process for each one.  You 
can add multiple new servers at once, but be aware that more than half of 
the servers listed in the configuration must be up in order to elect a 
coordinator, so if you have three existing servers and are adding three new 
ones, doing them all at once might not be such a great idea.
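
To make the arithmetic concrete, here is a small Python sketch of the more-than-half rule as stated above.  The function name is made up for illustration, and it deliberately ignores any finer details of how votes are actually counted.

```python
def can_elect_coordinator(listed: int, up: int) -> bool:
    """Simplified rule from the text: more than half of the servers
    listed in the configuration must be up to elect a coordinator."""
    return 2 * up > listed


# Three existing servers up, one new server listed but not yet started.
print(can_elect_coordinator(listed=4, up=3))  # True

# Three existing servers up, three new servers listed but not yet started.
print(can_elect_coordinator(listed=6, up=3))  # False
```

The second case is the three-plus-three scenario: with six servers listed and only the original three running, the running servers are exactly half, which is not enough.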

Note that the configuration of fileservers and clients is more or less 
irrelevant, except that any client whose configuration does not include the 
_current_ coordinator will be unable to make changes.  To avoid this, you 
can distribute configuration to clients listing all of the new servers 
before beginning any updates (client configurations don't affect elections, 
so you can add as many servers to client configuration at once as you 
want).  If you are removing servers, leave them in the client config until 
you've done all the changes, then distribute another client config change.
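
As a rough illustration of that ordering, the following Python sketch computes the two client-side server lists for a migration: one to distribute before any server changes, and one to distribute after everything is done.  The function name and hostnames are hypothetical.

```python
def client_lists_for_migration(current, adding, removing):
    """Return (before, after) client dbserver lists.

    before: distribute ahead of any server-side change; it keeps the
            servers being removed and already names the new ones.
    after:  distribute once all server-side changes are complete.
    """
    before = sorted(set(current) | set(adding))
    after = sorted(set(before) - set(removing))
    return before, after


before, after = client_lists_for_migration(
    current=["db1.example.org", "db2.example.org", "db3.example.org"],
    adding=["db4.example.org"],
    removing=["db1.example.org"],
)
print(before)  # all four servers
print(after)   # db2, db3, db4
```

Whichever server is coordinator at any point during the migration appears in the "before" list, so clients using it can still make changes.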

> Some questions,
> i) once I alter all the servers via bos addhost command do I need to
> restart the fs/db processes to get the servers to use the new settings?

You need to restart the ptserver and vlserver, and also the kaserver and 
buserver, if you are using those, in order to pick up the change.  Note 
that it is necessary for the _running_ configuration on all dbservers to 
agree for elections to work correctly.  However, it is safe to have a 
server that has only been added/removed on some machines, provided that 
server is not actually up.  For elections to succeed, the servers which are 
actually up must still be more than half of the servers listed, so do not 
partially add too many servers at once.  For 
example, if you have three servers and are adding a fourth, it is safe to 
add the fourth server to the server-side CellServDB and restart the 
existing servers one at a time, provided you do not bring up the new server 
until you've updated all the existing ones.
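
The "not actually up" condition can be expressed as a simple check.  This is only a sketch in Python under the assumption that you track, by hand or otherwise, which server list each running dbserver loaded at its last restart; the function and the short host names are invented for the example.

```python
def servers_unsafe_to_run(running_lists: dict[str, set[str]], up: set[str]) -> set[str]:
    """Return servers in `up` that some up server does not list.

    running_lists maps each up dbserver to the server list its running
    processes loaded at their last restart.
    """
    return {
        host
        for host in up
        for peer in up
        if host not in running_lists[peer]
    }


old = {"db1", "db2", "db3"}
new = old | {"db4"}

# db1 and db2 have been restarted with the new list; db3 has not yet.
lists = {"db1": new, "db2": new, "db3": old}
print(servers_unsafe_to_run(lists, up={"db1", "db2", "db3"}))  # set()

# Starting db4 now would be premature: db3 does not know about it.
lists["db4"] = new
print(sorted(servers_unsafe_to_run(lists, up={"db1", "db2", "db3", "db4"})))  # ['db4']
```

An empty result means every running server is known to every other running server, which is the state you want before bringing up the new one.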

Once you have added all of your new dbservers, you will want to get around 
to restarting the fileservers, so they know about the changes.  Exactly 
when you do this is not critical, as long as you don't inadvertently retire 
all of the dbservers that a fileserver knows about.  Note that the 
fileserver registers itself in the vldb on startup, and this process will 
fail if the current coordinator is not listed in that fileserver's 
CellServDB.  Once startup is complete, the fileserver does not perform any 
operations which require talking to the coordinator.

> ii) can the server processes use DNS to get database servers like the
> client can?

They probably can, if the server-side CellServDB is empty, but this is not a 
good idea.  The election process depends on all dbservers agreeing on the 
set of available servers.  Using the DNS as the source for this data makes 
it too easy to have inconsistencies, and too difficult to control exactly 
when each server picks up a change.

> iii) Is fs newcell sufficient to get linux client to use the new servers

Yes.  Note that fs newcell _replaces_ the client's idea of the set of 
dbservers for that cell, so you will need to list all of the existing 
servers in addition to the one you are adding.  A restart is not required.
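
Because the list is replaced rather than extended, it can help to build the command from the full set of servers.  Below is a minimal Python sketch that assembles the argument list for fs newcell using its -name and -servers options; the helper name, cell name, and hostnames are placeholders, and the sketch only prints the command rather than running it.

```python
def fs_newcell_argv(cell, existing, new):
    """Build an fs newcell command that names every dbserver.

    fs newcell replaces the client's server list for the cell, so the
    existing servers are listed along with the new ones.
    """
    servers = list(existing) + [s for s in new if s not in existing]
    return ["fs", "newcell", "-name", cell, "-servers", *servers]


argv = fs_newcell_argv(
    "example.org",
    existing=["db1.example.org", "db2.example.org", "db3.example.org"],
    new=["db4.example.org"],
)
print(" ".join(argv))
```

To apply it, the resulting list could be passed to something like subprocess.run on the client, with sufficient privileges.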

-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
   Sr. Research Systems Programmer
   School of Computer Science - Research Computing Facility
   Carnegie Mellon University - Pittsburgh, PA