[OpenAFS] DB servers "quorum" and OpenAFS tools

Peter Grandi pg@afs.list.sabi.co.UK
Fri, 17 Jan 2014 18:50:13 +0000

A situation described below prompts a rather important general
question:

  What rules do the OpenAFS tools use to contact one of
  the DB servers?

Because they seem different from those used by the cache
clients, and I have not found where they are documented.
There are (some) notes on how the DB servers handle "down"
siblings, and on how the cache clients do.

  I have a single-host test OpenAFS cell, and I
  have added a second IP address to '/etc/openafs/CellServDB'
  with an existing DNS entry (just to be sure) but not assigned
  to any machine: sometimes 'vos vldb' hangs for a while (105
  seconds), doing 8 attempts to connect to the "down" DB server;
  sometimes it connects to the "up" server and returns
  instantly. The wait times after each of the 8 attempts are:
  3.6s, 6.8s, 13.2s, 21.4s, 4.6s, 25.4s, 26.2s, 3.8s.
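
  As a quick check on those figures, the eight waits do add up to
  the 105 seconds quoted, and they do not grow monotonically, so
  this does not look like a plain doubling backoff. A minimal
  Python sketch of the arithmetic (the variable names are only
  illustrative):

```python
# Wait times in seconds after each of the 8 connection attempts,
# copied from the figures quoted above.
waits = [3.6, 6.8, 13.2, 21.4, 4.6, 25.4, 26.2, 3.8]

total = sum(waits)
# True only if every wait is at least as long as the previous one.
monotonic = all(a <= b for a, b in zip(waits, waits[1:]))

print(f"{len(waits)} attempts, {total:.1f} s in total")
print(f"waits grow monotonically: {monotonic}")
```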

The worry I have is that the OpenAFS tools handle "down" DB
servers less resiliently than the DB servers and the cache
clients, as the situation described below seems to suggest,
and this can have dismaying consequences.

Context: "typical" cell with 3 DB servers, a few fileservers,
and a few clients, with one of them doing backups in various
ways, typically 'vos dump -clone'; each of > 100 (usually
largish, dozens to hundreds of GiB) AFS-volumes incrementally
dumped every day, so a 'vos dump -clone' every 10-15 minutes.
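
The 10-15 minute figure is just the day divided by the number of
AFS-volumes. A rough sketch of that arithmetic, assuming the dumps
are spread evenly over 24 hours; the volume counts below are
hypothetical examples, the only hard fact being "> 100":

```python
MINUTES_PER_DAY = 24 * 60

def minutes_between_dumps(volumes_per_day):
    """Average gap between dumps if they are spread evenly over a day."""
    return MINUTES_PER_DAY / volumes_per_day

# Hypothetical volume counts, chosen only to bracket "> 100".
for count in (100, 120, 144):
    gap = minutes_between_dumps(count)
    print(f"{count} volumes/day: one dump every {gap:.1f} minutes")
```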

Upgrading the cell servers from 1.4 to 1.6 (Debian), with all 3
DB servers #1, #2, #3 being replaced by servers with new OS and
importantly new IP addresses.

Planned to do this incrementally by adding a new DB server to the
'CellServDB', then starting it up, then removing an old DB
server, and so on until all 3 have been replaced in turn with
new DB servers #4, #5, #6.
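
The plan can be sketched as a sequence of 'CellServDB' states. This
is only an illustration in Python: the pairing of each old server
with a particular new one is an assumption, and the function name
is made up for the sketch.

```python
def rolling_replacement(old, new):
    """Yield the DB-server list after every add or remove step."""
    members = list(old)
    yield tuple(members)
    for retiring, incoming in zip(old, new):
        members.append(incoming)   # add to 'CellServDB', then start it up
        yield tuple(members)
        members.remove(retiring)   # then remove one old DB server
        yield tuple(members)

states = list(rolling_replacement(["#1", "#2", "#3"], ["#4", "#5", "#6"]))
for state in states:
    print(len(state), "entries:", " ".join(state))
```

The list holds 4 entries after each "add" step and goes back to 3
after each "remove" step, ending with #4, #5, #6.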

At some point during this slow incremental plan there were 4
entries in both 'CellServDB's and the new one had not been
started up yet, and would not be for a couple of days.

The OpenAFS client caches seemed to cope well as expected, as in
a cell with a "quorum" of 3 "up" DB servers, and 1 "down". I
think the only consequence I noticed was sometimes 'aklog'
taking around 15 seconds.

However *some* backups started to hang and some AFS-volumes
became inaccessible to all clients. The fairly obvious cause was
that the cloning transaction, instead of being very quick, would
not end, and cloning locks the AFS-volume.

An 'strace' of the relevant 'vos' instances would show repeated
(for a very long time) attempts to contact the 1 "down" DB
server. Some of the instances of 'vos dump -clone' seemed to
contact one of the 3 "up" DB servers and had no issues.

The backups server regrettably has a 1.4.7 client cache package
(soon to be upgraded to 1.6.x). Perhaps newer packages have some
different logic, but it seemed as if 'vos' would choose at
random an entry from '/etc/openafs/CellServDB' and then stick
with it even if it did not respond to a connection attempt.
There was also a curious attempt to open "$HOME/.AFSSERVER" (which
did not exist), and 'vos' also tries to open "/.AFSSERVER".
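
To make the guess concrete, here is a small Python model of the
behaviour I suspect ("pick one entry at random and stick with it")
next to a simple alternative that moves on to the next entry. This
is not OpenAFS code and says nothing certain about what 'vos'
really does; the server names, the function names and the limit of
8 attempts (borrowed from the test cell observation) are all
assumptions made for the sketch.

```python
import random

def pick_and_stick(servers, up, rng, attempts=8):
    """Suspected behaviour: choose one entry, keep retrying only that one."""
    target = rng.choice(servers)
    for _ in range(attempts):
        if target in up:
            return target
    return None  # never reached an "up" server; the caller sees a hang

def pick_with_failover(servers, up, rng):
    """Alternative: try every entry once, in random order."""
    for target in rng.sample(servers, len(servers)):
        if target in up:
            return target
    return None

# Hypothetical cell: 4 entries listed, "db4" not started yet.
servers = ["db1", "db2", "db3", "db4"]
up = {"db1", "db2", "db3"}

rng = random.Random(2014)
trials = 10_000
stuck = sum(pick_and_stick(servers, up, rng) is None for _ in range(trials))
failed = sum(pick_with_failover(servers, up, rng) is None for _ in range(trials))

print(f"pick-and-stick got stuck in {stuck / trials:.1%} of runs")
print(f"failover failed in {failed / trials:.1%} of runs")
```

With 1 "down" entry out of 4, the first model gets stuck in roughly
a quarter of the runs while the second never does, which at least
matches the pattern of *some* 'vos dump -clone' instances hanging
and others having no issues.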