OpenAFS Master Repository branch, openafs-stable-1_8_x, updated. openafs-stable-1_8_0pre4-17-g35636bd

Gerrit Code Review gerrit@openafs.org
Fri, 9 Feb 2018 09:31:09 -0500


The following commit has been merged in the openafs-stable-1_8_x branch:
commit 35636bd9e32015bc10e09ccbb34a71a1459cdc4b
Author: Marcio Barbosa <mbarbosa@sinenomine.net>
Date:   Mon Aug 21 14:21:54 2017 -0400

    ubik: avoid DISK_Begin on sites that didn't vote for sync
    
    As already described in 7c708506, SDISK_Begin fails on remotes if
    lastYesState is not set. To fix this problem, 7c708506 does not allow
    write transactions until we know that lastYesState is set on at least
    a quorum of sites (ubik_syncSiteAdvertised == 1). In other words,
    write transactions are allowed once enough sites have received a
    beacon packet informing them that a sync-site was elected. This means
    that ubik_syncSiteAdvertised can be true while lastYesState is still
    not set on a few sites.
    
    Consider the following scenario in a cell with frequent write
    transactions:
    
    Site A => Sync-site (up)
    Site B => Remote 1 (up)
    Site C => Remote 2 (down - unreachable)
    
    Since A and B are up, we have quorum. After the second wave of beacons,
    ubik_syncSiteAdvertised will be true and write transactions will be
    allowed. At some point, C becomes reachable again. Site A sends a
    copy of its database to C, but C has not voted for A yet
    (lastYesState == 0). A new write transaction is started and, since
    lastYesState is not set on C, DISK_Begin fails on this remote site and
    C is marked as down. Since C is reachable, A will mark this remote
    site as up again. The sync-site will send its database to C once
    more, but C still has not voted for A. Another write transaction is
    started and, since lastYesState is not set on C, DISK_Begin fails
    again on this remote site and C is marked as down. In a cell with
    frequent write transactions, this cycle will repeat forever. As a
    result, the sync-site will be constantly sending its database to C
    and the quorum will be operating with fewer sites, increasing the
    chances of re-elections.
    
    To fix this problem, do not call DISK_Begin on remotes that have not
    voted for the sync-site yet.
    
    Reviewed-on: https://gerrit.openafs.org/12715
    Tested-by: BuildBot <buildbot@rampaginggeek.com>
    Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
    Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
    (cherry picked from commit 68ec78950a6e39dc1bf15012d4b889728086d0b7)
    
    Change-Id: I3764c23125f0bc675762449cd29b282ba403f871
    Reviewed-on: https://gerrit.openafs.org/12896
    Tested-by: BuildBot <buildbot@rampaginggeek.com>
    Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

 src/ubik/ubik.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)
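
The diff itself is not included in this notification. As a rough illustration of the rule stated in the commit message (skip DISK_Begin on remotes that have not voted for the sync-site yet), the following is a minimal, self-contained sketch. It is not the actual change to src/ubik/ubik.c: the struct, its field names, and both helper functions are hypothetical and exist only to show the filtering idea.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified view of a remote site as tracked by the
 * sync-site. None of these names are taken from the ubik sources.
 */
struct remote_site {
    const char *name;
    bool up;              /* site is currently considered reachable */
    bool db_current;      /* site holds the current database version */
    bool voted_for_sync;  /* sync-site has received a yes vote from it */
};

/*
 * Decide whether the sync-site should send a "begin transaction" call
 * to this remote. A remote that has not voted yet is skipped instead
 * of being contacted, so a call that is expected to fail does not get
 * the site marked as down.
 */
static bool
should_send_begin(const struct remote_site *site)
{
    return site->up && site->db_current && site->voted_for_sync;
}

/* Count how many remotes would be contacted for a new write transaction. */
static size_t
count_begin_targets(const struct remote_site *sites, size_t nsites)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < nsites; i++) {
        if (should_send_begin(&sites[i]))
            count++;
    }
    return count;
}
```

In the scenario from the commit message, site C would be up and would hold a fresh copy of the database but would have voted_for_sync unset, so it would be skipped until its vote arrives rather than being marked down on every write transaction.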

-- 
OpenAFS Master Repository