OpenAFS Master Repository branch, openafs-stable-1_6_x, updated. openafs-stable-1_6_22_2-38-g05ca762
Gerrit Code Review
gerrit@openafs.org
Fri, 4 May 2018 04:40:15 -0400
The following commit has been merged in the openafs-stable-1_6_x branch:
commit 05ca7626da0f212e8f02705c45440dd2d8efadc0
Author: Marcio Barbosa <mbarbosa@sinenomine.net>
Date: Thu Feb 22 17:53:23 2018 -0500
ubik: don't set database epoch to 0 if not needed
If our attempt to receive a fresh database from a peer fails, we
overwrite the version.epoch field of our current local copy of the
database with an invalid value, "0". The idea behind this approach is
to make sure that this database will not be seen as a legitimate copy if
the transfer is not completed properly. Although it is questionable
whether this approach is still necessary (since the current version
writes the data into a temporary file), it is undisputed that the
database version does not have to be invalidated if the transfer fails
at an early stage, where no data has been written and we could safely
continue to reuse the local copy for read-only queries. Early failures
may happen if:
1. The peer sending the database to us is not the peer we believe to be
the sync site;
2. The sender is not authorized to call DISK_SendFile.
In both cases, the database epoch is invalidated, which may have the
following consequences:
1. Reads may not be allowed
Once the on-disk epoch is invalidated, if the server in question is
rebooted, the invalid on-disk epoch will be used to initialize the
in-memory epoch. At this point, reads may not be allowed, since
urecovery_AllBetter checks if the in-memory epoch is greater than 1.
Reads should not be blocked forever: the sync-site will send a new
database to this remote, which corrects the invalid version.
2. Data can be lost
If the site with the invalid epoch is the one with the most recent
database, the database can be rolled back to an earlier version during a
new quorum establishment. Consider the following scenario where we have
three sites:
Site A (up - database up to date) (sync-site)
Site B (up - database up to date)
Site C (down - old database)
The epoch of B is invalidated due to the problem fixed by this patch.
Then, A is turned off and C is turned on. In this scenario, the new
sync-site will distribute the old database held by C, since the epoch of
C is greater than the invalidated epoch (0) of B.
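The two consequences above can be illustrated with a small, simplified
model. This is only a sketch and not the ubik code: the struct layouts,
the function names (version_cmp, best_site, epoch_allows_reads) and the
rule "the highest version among the sites that are up wins" are
assumptions made for illustration. Only the "epoch greater than 1"
comparison is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a database version (hypothetical layout). */
struct db_version {
    int32_t epoch;
    int32_t counter;
};

/* Hypothetical description of one site in the cell. */
struct site {
    const char *name;
    bool up;
    struct db_version version;
};

/* Models only the epoch comparison described for urecovery_AllBetter. */
static bool epoch_allows_reads(const struct db_version *v)
{
    assert(v != NULL);
    return v->epoch > 1;
}

/* Order versions by epoch first, then by counter. */
static int version_cmp(const struct db_version *a, const struct db_version *b)
{
    if (a->epoch != b->epoch)
        return a->epoch < b->epoch ? -1 : 1;
    if (a->counter != b->counter)
        return a->counter < b->counter ? -1 : 1;
    return 0;
}

/*
 * Assumed selection rule: among the sites that are up, the database
 * with the highest version is the one that gets distributed.
 * Returns NULL if no site is up.
 */
static const struct site *best_site(const struct site *sites, size_t n)
{
    const struct site *best = NULL;

    assert(sites != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!sites[i].up)
            continue;
        if (best == NULL || version_cmp(&sites[i].version, &best->version) > 0)
            best = &sites[i];
    }
    return best;
}
```

With made-up values such as A = {down, epoch 1519339000, counter 7},
B = {up, epoch 0, counter 7} and C = {up, epoch 1500000000, counter 3},
best_site picks C even though B holds the newer data, and
epoch_allows_reads rejects B's version.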
To fix the problem in question, do not set the database epoch to 0
if the local database was not modified.
Acknowledgements:
Hartmut Reuter <hartmut.reuter@gmx.de>
- found the problem;
- suggested a possible solution;
Benjamin Kaduk <kaduk@mit.edu>
- submitted the first version;
Andrew Deason <adeason@sinenomine.net>
- suggested changes;
Reviewed-on: https://gerrit.openafs.org/12924
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
(cherry picked from commit bd6a2484011dad6298c4ce97dd0cd68e0834baa5)
Reviewed-on: https://gerrit.openafs.org/12937
Reviewed-by: Hartmut Reuter <reuter@rzg.mpg.de>
Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit b41065f2b8877580a7e1858b8e2857973ddf6503)
Change-Id: I0923dddd2bf32f97230f3addb2fc376c0b2fa85c
Reviewed-on: https://gerrit.openafs.org/13027
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
src/ubik/remote.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)
--
OpenAFS Master Repository