[OpenAFS] Replicating the AFS Administrative Databases?
Jeffrey Hutzelman
jhutz@cmu.edu
Tue, 27 Sep 2005 15:11:49 -0400
On Monday, September 19, 2005 09:45:44 PM -0400 Robert Banz <banz@umbc.edu>
wrote:
>
> Ok, here's the clarification:
>
> A machine can be a database server or a fileserver, or both.
True, but the documentation that was quoted makes the assumption that all
dbservers are also fileservers. So, when they say "if you only have one
fileserver it must be a dbserver", what they really mean is "you must have
at least one dbserver; it doesn't matter whether it's also a fileserver".
> You have to have at least one machine providing database service.
True.
> It is preferable that you have multiple machines -- either 3 or 5 --
> 3 is usually sufficient.
True. Larger numbers of dbservers are possible, but there's no advantage
to having them unless you have a very unusual server.
> It's important that they be an odd number of
> machines, as in the case of a server failure you need a quorum of servers
> still talking to each other to sort out whose database is writable.
False. It is possible to form a quorum with an even number of servers; you
simply need to have more than half the total number of votes. Note that
the server with the lowest IP address gets an extra half vote when voting
for itself; as a result, you can get a quorum even with exactly half of an
even number of servers, as long as the lowest-numbered server is one of
them.
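To make the arithmetic concrete, here is a rough Python sketch of the rule
as I described it above. It is a simplified model, not the actual Ubik
code: the function name has_quorum and the addresses in the usage note are
made up for illustration, and it assumes every server is identified by a
single IP address of the same address family.

```python
import ipaddress


def has_quorum(all_servers, reachable):
    """Return True if the reachable servers can form a quorum.

    Simplified model of the rule in this message (not Ubik itself):
    every server has one vote, the server with the lowest IP address
    gets an extra half vote, and a quorum needs more than half of the
    total number of servers.
    """
    addrs = [ipaddress.ip_address(s) for s in all_servers]
    # Ignore any reachable address that is not a configured dbserver.
    up = {ipaddress.ip_address(s) for s in reachable} & set(addrs)
    lowest = min(addrs)
    # Count in half-votes so no fractions are needed: each server is
    # worth 2, and the lowest-address server is worth 1 more.
    half_votes = 2 * len(up) + (1 if lowest in up else 0)
    return half_votes > len(addrs)
```

With four hypothetical servers 192.0.2.1 through 192.0.2.4,
has_quorum(cell, ["192.0.2.1", "192.0.2.3"]) is True because the lowest
address is present, while has_quorum(cell, ["192.0.2.2", "192.0.2.3"]) is
False: exactly half the servers, without the tie-breaking half vote.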
Note that the IBM documentation goes on to recommend against having exactly
two database servers. This advice is left over from the days of AFS 3.3
and earlier, when a second database server bought you basically no benefit.
The issue is that with exactly two servers, the lower-numbered server can
form a quorum on its own, and the higher-numbered server cannot. Thus, the
lower-numbered server is both necessary and sufficient for formation of a
quorum, and the second server is superfluous.
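The two-server case can be checked by brute force. The sketch below
enumerates every non-empty subset of a cell's dbservers and keeps those
that would hold a quorum under the same simplified voting model (one vote
per server, an extra half vote for the lowest address, more than half the
server count required). The function name quorum_sets and the addresses
are hypothetical, and this is only an illustration, not how Ubik is
implemented.

```python
import ipaddress
from itertools import combinations


def quorum_sets(servers):
    """List every subset of servers that could form a quorum.

    Simplified model, not Ubik itself: one vote per server, plus an
    extra half vote for the lowest-address server.
    """
    addrs = sorted(ipaddress.ip_address(s) for s in servers)
    lowest = addrs[0]
    result = []
    for size in range(1, len(addrs) + 1):
        for subset in combinations(addrs, size):
            # Half-vote units: 2 per server, +1 for the lowest address.
            half_votes = 2 * len(subset) + (1 if lowest in subset else 0)
            if half_votes > len(addrs):
                result.append([str(a) for a in subset])
    return result
```

For a cell of 192.0.2.10 and 192.0.2.20, the only subsets returned are
the lower server alone and both servers together. The higher server never
appears without the lower one, which is the "necessary and sufficient"
situation described above.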
Now, in those days, a dbserver that was not a member of a quorum could not
even provide read-only access to the database. So, without a quorum, you
had no AFS service at all. Eventually the conclusion was reached that, for
the AFS databases, stale data was better than none at all, and both the
vlserver and ptserver were modified to provide read-only service when not
part of a quorum. The result is that today, that second server keeps your
cell basically up and running while you repair the first one.
It's worth noting that even in the old days, a second server was still
valuable as a backup copy of the data. Losing the PRDB is a major pain in
the ass, and was even worse in those days, before we had tools like pt_util.
> You may choose, as I said, to run the database service on machines that
> are also fileservers. I would recommend, however, that you run separate
> database server machines.
Agree, unless the cell is _very_ small. If you only have a couple of
fileservers, you probably don't need separate dbservers.
-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
Sr. Research Systems Programmer
School of Computer Science - Research Computing Facility
Carnegie Mellon University - Pittsburgh, PA