[OpenAFS-devel] Re: The ubik transaction ID rollover problem

Jeffrey Hutzelman jhutz@cmu.edu
Fri, 03 Sep 2010 16:32:56 -0400


--On Friday, September 03, 2010 01:37:43 PM -0500 Andrew Deason 
<adeason@sinenomine.net> wrote:

> On Mon, 30 Aug 2010 13:12:41 -0500
> Andrew Deason <adeason@sinenomine.net> wrote:
>
>> So, based on all of this, I propose to do three things:
>>
>>  1. Set writeTidCounter = tidCounter on beginning write transactions
>>     instead of incrementing writeTidCounter by 2.
>
> After much discussion and such on jabber... this is gerrit 2647, if
> people want to look.

After the Jabber discussion, I had an offline discussion with another 
developer who, like myself, has spent more time studying this code than 
most.  I believe we came to the conclusion that setting writeTidCounter as 
you describe has always been the right thing to do, and that this change is 
neither harmful nor backward-incompatible, given the current behavior of 
urecovery_CheckTid.

The rest of my comments here are colored by that offline discussion, but 
are mostly my opinion and interpretation and may not reflect what the other 
developer thinks about the topic, or may even be wrong. :-)

>>  2. Change urecovery_CheckTid to strictly check if the given
>>     transaction id is _equal_ to the current transaction id, not just
>>     if it is equal or happened before the current transaction.
>
> While this may be worthwhile to do everywhere except SVOTE_Beacon, I
> think this requires more thinking about.
>
> It _might_ be this way to avoid the case where a beacon is sent, and
> then a write transaction is created, and the beacon might have an older
> (smaller) transaction id, causing the transaction to abort. As far as I
> can tell that would only need to be a special case for SVOTE_Beacon
> processing, but... it'll take more thought.

There are actually three cases here:

1) SVOTE_Beacon
   This is the most difficult case, and will require further study.
   The current test may be subject to a race that can cause it to
   abort a transaction when it shouldn't, but I'm not positive.  It
   may also be necessary to prevent corruption in certain cases when
   there is poor connectivity, or it may not; again, I'm not sure.
   I do believe a strict check would be worse, as it could result
   in a different race, in whose existance I am more confident.

2) SDISK_Begin
   The comparison should not be done at all here; SDISK_Begin should
   _always_ abort any in-progress transaction.  In fact, this was
   already the case, because SDISK_Begin would abort any active
   transaction if urecovery_CheckTid had not killed it.

   There was some duplication of code here which was eliminated in
   #2628 (already merged), though in retrospect I think that code
   might be better if abortalways were checked _first_, rather than
   after checking the epoch and counter.

3) Everywhere else (that is, other SDISK_*)
   These should perform an exact check; it is never OK for a DISK
   operation intended for one transaction to be applied to another.
   This should not be a problem, as ubik should not make these RPC's
   to a server on which the DISK_Begin() call did not succeed.
   Still, it would probably be good to fix this.

>>  3. To accomodate ubik sites without the change in "1.", change
>>     SVOTE_Beacon to only check the epoch of the given transaction id,
>>     and ignore the tid counter. Otherwise, we will abort valid
>>     transactions due to other sites giving us a bogus transaction id
>>     in VOTE_Beacon messages. This relaxes the supposed safety check
>>     somewhat, but if other sites are sending bogus trans id counters,
>>     we can't really check them.
>
> After discussing and thinking about this, this may not be a good idea.
> The problem is with the sender of the VOTE_Beacon messages, so... we
> should just fix the VOTE_Beacon side, and not tighten the VOTE_Beacon
> trans id check.

What you had proposed didn't tighten the check; it relaxed it.  I'm fairly 
sure that if the check is needed at all, then checking only the epoch is 
not sufficient.  I'm less sure at the moment about whether the check is 
needed.  However, my original conservatism holds - until we're sure that it 
not only isn't needed but is actively harmful, we shouldn't remove it.


Finally, I'm concerned about the way ubeacon_Interact() decides what to 
send as the transaction ID.  Currently it sends writeTidCounter if the 
DBWRITING flag is set, and tidCounter + 1 otherwise.  The problem here is 
that I believe there may be a race which could case a value to be sent 
which is "right" but doesn't match the remote server's idea of the correct 
transaction ID because of a race between ubeacon_Interact and when the 
DISK_Begin or DISK_{Commit,ReleaseLocks,Abort} happens on that particular 
server.  I'm somewhat surprised we're not already seeing a problem, so it 
may be that the race doesn't actually exist; I'll have to work through how 
the locking works to decide.

-- Jeff