OpenAFS Master Repository branch, openafs-stable-1_6_x, updated. openafs-stable-1_6_22_2-13-g37b70b9

Gerrit Code Review gerrit@openafs.org
Fri, 23 Feb 2018 07:23:39 -0500


The following commit has been merged in the openafs-stable-1_6_x branch:
commit 37b70b9c62b2799f7095fa83ab84485eb991cf39
Author: Marcio Barbosa <mbarbosa@sinenomine.net>
Date:   Mon Dec 11 19:18:43 2017 -0300

    ubik: update epoch as soon as sync-site is elected
    
    The ubik_epochTime represents the time at which the coordinator first
    received its coordinator mandate. However, this global is currently not
    updated at the moment when a new sync-site is elected. Instead,
    ubik_epochTime is only updated at the very end of the first write
    transaction, when a new database label is written (in udisk_commit).
    This causes at least two distinct issues:
    
    For one, this means that we change ubik_epochTime while a remote
    transaction is in progress. If VOTE_Beacon is called after
    ubik_epochTime is updated, but before the remote transaction ends, the
    remote sites will detect that the transaction id in ubik_currentTrans is
    wrong (via urecovery_CheckTid(), since the epoch doesn't match), and
    they will abort the transaction. This means the transaction will fail,
    and it may cause a loss of quorum until another election is completed.
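
    As a rough illustration of why a mid-transaction epoch change is
    fatal, the remote-side comparison can be modelled as below. This is a
    simplified sketch, not the actual urecovery_CheckTid() code: the
    struct and function names are hypothetical, and the counter
    comparison is an assumption about what else the check considers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Simplified stand-in for a ubik transaction id (hypothetical names). */
struct sketch_tid {
    int32_t epoch;    /* epoch of the sync site when the tid was built */
    int32_t counter;  /* per-epoch transaction counter */
};

/*
 * Model of the remote-side staleness check: the transaction held in
 * "current" is treated as stale when the tid carried by the beacon has a
 * different epoch, or (an assumption of this sketch) a newer counter.
 */
static bool
sketch_should_abort(const struct sketch_tid *beacon,
                    const struct sketch_tid *current)
{
    assert(beacon != NULL && current != NULL);
    return beacon->epoch != current->epoch
        || beacon->counter > current->counter;
}
```

    In this model, a transaction started under the old epoch is kept
    while beacons still carry that epoch, and is aborted as soon as a
    beacon carries the epoch written at commit time.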
    
    Another issue is that ubik_epochTime can be 0 at the beginning of a
    write transaction, if this is the first election that this site has won.
    Since ubik_epochTime is used to construct transaction ids, this means
    that we can have different transactions that originate from different
    sites at different times, but they have the same epoch in their tid.
    For example, say a write transaction starts with epoch 0, but the
    originating site is killed/interrupted before finishing. That write
    transaction will linger on remote sites in ubik_currentTrans with an
    epoch of 0 (since the originating site will never call
    DISK_ReleaseLocks, or DISK_Abort, etc). Normally the sync site will kill
    such a lingering transaction via urecovery_CheckTid, but since the epoch
    is 0, and the election winner's epoch is also 0, the transaction looks
    valid and may never be killed. If that transaction is holding a lock on
    the database, this means that the database will forever remain locked,
    effectively preventing any access to the db on that site.
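
    The collision described above can be sketched as follows. The names
    are hypothetical and the tid layout (epoch plus counter) is a
    simplification; the point is only that two sites which both still
    have an epoch of 0 produce indistinguishable tids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Simplified stand-in for a ubik transaction id (hypothetical names). */
struct sketch_tid {
    int32_t epoch;
    int32_t counter;
};

/* Build a tid from the site's current epoch and its next counter value. */
static struct sketch_tid
sketch_new_tid(int32_t site_epoch, int32_t *counter)
{
    struct sketch_tid tid;

    assert(counter != NULL);
    tid.epoch = site_epoch;
    tid.counter = ++(*counter);
    return tid;
}

/* Two tids are indistinguishable when both fields match. */
static bool
sketch_tid_equal(struct sketch_tid a, struct sketch_tid b)
{
    return a.epoch == b.epoch && a.counter == b.counter;
}
```

    With a per-election epoch (for example, the time each election was
    won), the same two sites yield tids that differ in their epoch field,
    so a lingering transaction from the earlier site no longer looks
    valid.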
    
    To fix both of these issues, update ubik_epochTime with the current
    time as soon as we win the election. This ensures that the epoch is not
    updated in the middle of a transaction, and it ensures that all
    transactions are created with a unique epoch: the epoch of the election
    that we won.
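
    A minimal sketch of the intended behaviour, under stated assumptions:
    the struct, the function name, and the 'now' parameter are
    hypothetical, and the sketch assumes the epoch is refreshed only on
    the transition into the sync-site role, not on every beacon round
    that confirms an existing mandate. It is not the actual change made
    to ubeacon_Interact().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of the per-site election state. */
struct sketch_site {
    bool am_sync_site;   /* do we currently hold the mandate? */
    int32_t epoch_time;  /* time at which the mandate was first received */
};

/*
 * Record the outcome of a vote count. 'now' is the current time in
 * seconds. The epoch is stamped only when the site newly becomes the
 * sync site, so it stays fixed for as long as the mandate is held.
 */
static void
sketch_election_result(struct sketch_site *site, bool won, int32_t now)
{
    assert(site != NULL);
    if (won && !site->am_sync_site) {
        site->epoch_time = now;
    }
    site->am_sync_site = won;
}
```

    Because the epoch is set before any write transaction can start,
    every tid created during a given mandate carries that mandate's
    epoch, and no write transaction is ever created with an epoch of 0.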
    
    Note that with this commit, we do not ever set ubik_epochTime to the
    magic value of '2' during database init. The special '2' epoch only
    needs to be set in the database itself, and it is never an actual epoch
    that represents a real quorum that went through the election process.
    The database will be labelled with a 'real' epoch after the first write,
    as usual.
    
    [kaduk@mit.edu: comment the locking strategy in ubeacon_Interact()]
    
    Reviewed-on: https://gerrit.openafs.org/12609
    Tested-by: BuildBot <buildbot@rampaginggeek.com>
    Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
    Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
    (cherry picked from commit da704137f4bf766250ca87dbdc5a85c2024cb0a6)
    
    Change-Id: I82e9ec41eb1a2316ecd2b76ef5c89432b2a3c059
    Reviewed-on: https://gerrit.openafs.org/12806
    Tested-by: BuildBot <buildbot@rampaginggeek.com>
    Reviewed-by: Andrew Deason <adeason@sinenomine.net>
    Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
    Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
    Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
    Reviewed-by: Hartmut Reuter <reuter@rzg.mpg.de>
    Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

 src/ubik/beacon.c   |   16 +++++++++++++---
 src/ubik/disk.c     |    3 +--
 src/ubik/recovery.c |    3 +--
 3 files changed, 15 insertions(+), 7 deletions(-)

-- 
OpenAFS Master Repository