[OpenAFS-devel] Re: The ubik transaction ID rollover problem

Fri, 3 Sep 2010 16:00:00 -0500

On Fri, 03 Sep 2010 16:32:56 -0400
Jeffrey Hutzelman <jhutz@cmu.edu> wrote:

> There are actually three cases here:
> 
> 1) SVOTE_Beacon
>    This is the most difficult case, and will require further study.
>    The current test may be subject to a race that can cause it to
>    abort a transaction when it shouldn't, but I'm not positive.  It
>    may also be necessary to prevent corruption in certain cases when
>    there is poor connectivity, or it may not; again, I'm not sure.
>    I do believe a strict check would be worse, as it could result
>    in a different race, in whose existance I am more confident.

Yes, agreed.

> > After discussing and thinking about this, this may not be a good idea.
> > The problem is with the sender of the VOTE_Beacon messages, so... we
> > should just fix the VOTE_Beacon side, and not tighten the VOTE_Beacon
> > trans id check.
> 
> What you had proposed didn't tighten the check; it relaxed it.  I'm
> fairly sure that if the check is needed at all, then checking only the
> epoch is not sufficient.  I'm less sure at the moment about whether
> the check is needed.  However, my original conservatism holds - until
> we're sure that it not only isn't needed but is actively harmful, we
> shouldn't remove it.

Yes, perhaps I could have said that more clearly as "we should just fix
the VOTE_Beacon side, and leave the SVOTE_Beacon trans id check alone".

> Finally, I'm concerned about the way ubeacon_Interact() decides what
> to send as the transaction ID.  Currently it sends writeTidCounter if
> the DBWRITING flag is set, and tidCounter + 1 otherwise.  The problem
> here is that I believe there may be a race which could case a value to
> be sent which is "right" but doesn't match the remote server's idea of
> the correct transaction ID because of a race between ubeacon_Interact
> and when the DISK_Begin or DISK_{Commit,ReleaseLocks,Abort} happens on
> that particular server.  I'm somewhat surprised we're not already
> seeing a problem, so it may be that the race doesn't actually exist;
> I'll have to work through how the locking works to decide.

I think we don't currently see this problem because if DBWRITING is set,
we send a trans id counter that cannot be "wrong". Since we base it off
of the writeTidCounter, which is always a very low positive number, it
will always be below any active write transaction, and
urecovery_CheckTid will not mark it as "wrong".

If DBWRITING is not set, we send tidCounter+1, as you mention. If there
is still no write transaction when it arrives, the trans id is not
checked. If a write transaction has started in the meantime, it will
have a higher transaction id than the one sent since it began after we
sent the beacon. (Otherwise the sync site would have detected DBWRITING
and would have sent writeTidCounter). So, again, urecovery_CheckTid will
not mark the trans as "wrong".

This last paragraph is what makes me wonder if that's specifically the
race that the inequality in urecovery_CheckTid is for (that is, why it's
">" not "!=")

-- 
Andrew Deason
adeason@sinenomine.net