OpenAFS Master Repository branch, openafs-stable-1_6_x, updated. openafs-stable-1_6_18_3-11-g3f29b25
Gerrit Code Review
gerrit@openafs.org
Thu, 15 Sep 2016 05:42:18 -0400
The following commit has been merged in the openafs-stable-1_6_x branch:
commit 3f29b253bcda761305b5567b3936b38f797e7848
Author: Andrew Deason <adeason@dson.org>
Date: Sun May 1 11:24:30 2016 -0500
ubik: Don't RECFOUNDDB if can't contact most sites
Currently, the ubik recovery code will always set UBIK_RECFOUNDDB
during recovery, after asking all other sites for their dbversions.
This happens regardless of how many sites we were actually able to
successfully contact, even if we couldn't contact any of them.
This can cause problems when we are unable to contact a majority of
sites with DISK_GetVersion. Since, if we haven't contacted a majority
of sites, we cannot say with confidence that we know what the best db
version available is (which is what UBIK_RECFOUNDDB represents; that
we've found which database is the one we should be using). This can
also result in UBIK_RECHAVEDB in a similar situation, indicating that
we have the best db version locally, even though we never actually
asked anyone else what their db version was.
For example, say site A is the sync site going through recovery, and
DISK_GetVersion fails for the only other sites B and C. Site A will
then set UBIK_RECFOUNDDB, and will claim that site A has the best db
version available (UBIK_RECHAVEDB). This allows site A to process ubik
write transactions (causing the db to be labelled with a new epoch),
or possibly to send the db to the other sites via DISK_SendFile, if
they quickly become available during recovery. Ubik write transactions
can succeed in this situation, because our ContactQuorum_* calls will
succeed if we never try to contact a remote site ('rcode' defaults to
0).
This situation should be rather rare, because normally a majority of
sites must be reachable by site A for site A to be voted the sync site
in the first place. However, it is possible for site A to lose
connectivity to all other sites immediately after sync site election.
It is also possible for site A to proceed far enough in the recovery
process to set UBIK_RECHAVEDB before it loses its sync site status.
As a result of all of this, if a site with an old database comes
online and there are network connectivity problems between the other
sites and a ubik write request comes in, it's possible for the "old"
database to overwrite the "new" database. This makes it look as if the
database has "rolled back" to an earlier version.
This should be possible with any ubik database, though how to actually
trigger this bug can change due to different ubik servers setting
different network timeouts. It is probably the most likely with the
VLDB, because the VLDB is typically the most frequently written
database.
If a VLDB reverts to an earlier version, it can result in existing
volumes to appear to not exist in the VLDB, and can result in new
volumes re-using volume IDs from existing volumes. This can result in
rather confusing errors.
To fix this, ensure that we have contacted a majority of sites with
DISK_GetVersion before indicating that we have located the best db
version. If we've contacted a majority of sites, then we are
guaranteed (under ubik assumptions) that we've found the best version,
since previous writes to the database should be guaranteed to hit a
majority of sites (otherwise they wouldn't be successful).
If we cannot reach a majority of sites, we just don't set
UBIK_RECFOUNDDB, and the recovery process restarts. Presumably on the
next iteration we'll be able to contact them, or we'll lose sync site
status if we can't reach the other sites for long enough.
Reviewed-on: https://gerrit.openafs.org/12281
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit d3dbdade7e8eaf6da37dd6f1f53d9f1384626071)
Change-Id: I4f4e7255efd3e16e3acfec8f90bf2019cab1fb63
Reviewed-on: https://gerrit.openafs.org/12339
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
src/ubik/recovery.c | 45 +++++++++++++++++++++++++++++----------------
1 files changed, 29 insertions(+), 16 deletions(-)
--
OpenAFS Master Repository