[OpenAFS] AFS lag

Ken Hornstein kenh@cmf.nrl.navy.mil
Wed, 18 Mar 2009 21:56:49 -0400


>I'm no ubik engineer, but as far as I understand it, the protocol was not
>designed for even numbers of participating servers. For best results, three
>or five servers seem to be optimum.

There is a lot of misinformation about Ubik out there; the voting
protocol is actually not complicated, it's just not documented well.
Looking at the source code is even more confusing.

So, let me clear up some misconceptions:

- It's not really that an odd number is optimum; it's just that you're wasting
  a server with an even number.

  Why?  Well, Ubik voting requires a majority of the servers to win an
  election; if there are 4 servers and only two are available, then that's
  not a majority.  So with 4 servers, you can lose only one server and still
  maintain quorum (same as with three).  You need 5 servers to be able to
  lose two of them.

  Now, there is an extra wrinkle here ... the "best" server (lowest
  numbered) gets an extra vote.  So in a 4 server configuration,
  you can actually lose two and maintain quorum ... as long as one
  of the two isn't the "best" server.  But with five servers, you
  can lose ANY two.  But the protocol works fine with two, or three, or
  four, or five.  There is NO magic here.
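
A minimal sketch of that arithmetic, for anyone who wants to check the
cases by hand.  This is not the Ubik source.  The function name and its
parameters are made up for illustration, and it ASSUMES the "best"
server's extra weight can be modeled as half a vote (a tie-breaker),
which is one way to reproduce the outcomes described above.

```python
def has_quorum(total_servers, servers_up, best_is_up):
    """Return True if the servers that are up hold a strict majority.

    Votes are counted in half-vote units so everything stays an integer.
    ASSUMPTION: the "best" server's extra weight is modeled as half a vote.
    """
    if not 0 <= servers_up <= total_servers:
        raise ValueError("servers_up must be between 0 and total_servers")
    if best_is_up and servers_up == 0:
        raise ValueError("the best server cannot be up if no server is up")

    half_votes = 2 * servers_up          # one full vote per running server
    if best_is_up:
        half_votes += 1                  # assumed tie-breaker for the best server

    # Strict majority: votes > total_servers / 2, written in half-vote units.
    return half_votes > total_servers


if __name__ == "__main__":
    # 4 servers, 2 lost: the result depends on whether the best one survived.
    print(has_quorum(4, 2, best_is_up=True))
    print(has_quorum(4, 2, best_is_up=False))
    # 5 servers, any 2 lost.
    print(has_quorum(5, 3, best_is_up=True))
    print(has_quorum(5, 3, best_is_up=False))
```

Under that assumption an even-sized cell tolerates an exact half split
only when the best server is in the surviving half, while an odd-sized
cell never needs the tie-breaker.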

>What I definitely witnessed is that servers in a cell configured with two
>servers take more than a minute to elect a sync site after server restarts.
>Three servers are supposed to make it in an instant.

This is one of those mostly-not-true statements that has a bit of truth in
it.  The exact details:

- When brought up, a database server will not vote YES for anyone for 75
  seconds.  This is inviolate.  It doesn't matter if there are two,
  three, or 100 database servers.  If you bring up all your servers
  cold, at the same time, it will take at least 75 seconds for a
  quorum election.

- If you have two database servers and you only restart the "best" server
  (note: in a two database server cell, only the "best" server can
  ever be elected as master), a new election will take 75 seconds.
  Why?  Because you have to wait for the best server to be able to
  vote for itself; without that vote, there is not a majority.
 
- If you have three (or more) database servers and you only restart the
  current master, a successful election will happen almost instantly.
  Why?  Because all of the servers that are still up will still vote
  YES for the master; the master's own YES vote is not necessary.  But
  note this only applies if all of the other servers are still running.
  If, for example, you rebooted the master and it took longer than
  75 seconds to restart, then what will likely happen is that
  a new master will be elected.
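
The three cases above can be sketched as a toy calculation.  This only
models two things from the text: the 75 second period during which a
freshly started server will not vote YES, and the majority count.  It
ignores everything else in the protocol, so treat the result as a lower
bound.  The names VOTE_DELAY and wait_for_quorum are invented here, and
the half-vote tie-breaker for the "best" server (index 0) is an
ASSUMPTION, not something taken from the Ubik source.

```python
VOTE_DELAY = 75  # seconds a freshly started server withholds YES votes


def wait_for_quorum(uptimes):
    """Seconds until enough servers are willing to vote YES, or None.

    uptimes[i] is how long server i has been running, in seconds, or None
    if that server is down.  Index 0 is the "best" server.
    ASSUMPTION: the best server's extra weight is modeled as half a vote.
    """
    total = len(uptimes)
    # (seconds until this server may vote YES, server index), soonest first
    ready = sorted(
        (max(0, VOTE_DELAY - up), i)
        for i, up in enumerate(uptimes)
        if up is not None
    )
    half_votes = 0
    for wait, i in ready:
        half_votes += 3 if i == 0 else 2   # best server counts 1.5 votes
        if half_votes > total:             # strict majority reached
            return wait
    return None                            # quorum is not reachable


if __name__ == "__main__":
    print(wait_for_quorum([0, 0, 0]))        # all servers started cold
    print(wait_for_quorum([0, 1000]))        # two servers, best restarted
    print(wait_for_quorum([0, 1000, 1000]))  # three servers, one restarted
```

A result of 0 means the servers that stayed up can already supply a
majority, which matches the "almost instantly" case.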

Getting back to the original poster's question ... by far the most common
problem I have seen with Ubik is bad time synchronization.  All of your
database servers must be synched up time-wise (the protocol depends on
timestamps).  It doesn't need to be femtosecond accuracy; the protocol
defines MAXSKEW as 10 seconds.
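
As a rough illustration of what "synched up" means here, the sketch
below flags any pair of servers whose clocks differ by more than
MAXSKEW.  The host names and offsets are made up, the offsets are
assumed to come from whatever time check you already run (for example
each server's offset from a common reference), and treating exactly 10
seconds as acceptable is an ASSUMPTION.

```python
from itertools import combinations

MAXSKEW = 10  # seconds, the value quoted above


def skewed_pairs(clock_offsets, max_skew=MAXSKEW):
    """Return (host_a, host_b, skew) for every pair that exceeds max_skew.

    clock_offsets maps a host name to its clock offset in seconds from a
    common reference.  Host names here are hypothetical.
    """
    bad = []
    for a, b in combinations(sorted(clock_offsets), 2):
        skew = abs(clock_offsets[a] - clock_offsets[b])
        if skew > max_skew:   # ASSUMPTION: exactly max_skew is still fine
            bad.append((a, b, skew))
    return bad


if __name__ == "__main__":
    offsets = {"db1": 0.2, "db2": -0.1, "db3": 14.0}  # made-up numbers
    for a, b, skew in skewed_pairs(offsets):
        print(f"{a} and {b} differ by {skew:.1f} seconds")
```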

If your database servers are accessible via the Internet, we could take
a look at them via udebug.  Really, there are only a few things that can
go wrong; of all of the pieces of AFS, I think Ubik is one of the most
bulletproof.

--Ken