[OpenAFS-devel] ubik write-locks, commits, reads

Derrick Brashear shadow@gmail.com
Mon, 12 Apr 2010 18:47:21 -0400


On Mon, Apr 12, 2010 at 4:31 PM, Andrew Deason <adeason@sinenomine.net> wrote:
> Hello,
>
> Currently, if a ubik site becomes unavailable during a write
> transaction, all reads are blocked for quite some time. (Need to wait
> for timeouts, etc.) This can grind a cell to a halt (temporarily) with
> the loss of a single ubik site.
>
> To alleviate this, we've been thinking about allowing reads to occur on
> a ubik database even when there's a conflicting write lock. When the db
> is write-locked, we know that the data is consistent, since no actual
> changes occur until the commit. So, a site would allow reads until a
> write transaction was committed, and reads would be blocked while we
> commit the changes to the local db.
>
> The problem with doing this is that it allows different data to be
> visible from the ubik db at the same time to clients (some sites could
> have committed data, while others are waiting for a commit and are
> serving old data). Is that horrible? Will that break everything?

It already happens for any transaction in ReadAny mode, which was a
concession to letting an out-of-quorum replica continue serving. No
loud screaming has occurred in the nearly 20 years that's been true,
so I'd say it's not a big deal.
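
A rough sketch of the first idea, as a toy Python model rather than
anything taken from ubik itself. The Site class and its method names are
made up for illustration. Reads are served from the committed copy while
a write is pending, and block only while the local commit is applied:

```python
import threading

class Site:
    """Toy model of one database site (hypothetical, not ubik's structures)."""

    def __init__(self, data):
        self._committed = dict(data)           # what readers see
        self._pending = None                   # buffered changes of the open write
        self._commit_lock = threading.Lock()   # held only while applying a commit

    def begin_write(self):
        # Taking the "write lock" only opens a buffer; committed data is untouched.
        self._pending = {}

    def write(self, key, value):
        self._pending[key] = value

    def read(self, key):
        # Readers wait for an in-progress local commit, not for the whole
        # lifetime of the write transaction.
        with self._commit_lock:
            return self._committed.get(key)

    def commit(self):
        with self._commit_lock:
            self._committed.update(self._pending)
            self._pending = None

sync_site = Site({"user1": "old"})
remote_site = Site({"user1": "old"})
for s in (sync_site, remote_site):
    s.begin_write()
    s.write("user1", "new")

during_lock = remote_site.read("user1")   # read succeeds despite the write lock
sync_site.commit()                        # commit has reached only one site so far
split_view = (sync_site.read("user1"), remote_site.read("user1"))
remote_site.commit()
final_view = (sync_site.read("user1"), remote_site.read("user1"))
```

Between the two commits the sites disagree, which is the same kind of
window a ReadAny transaction can already observe.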

> If that is out of the question, another idea I had was to return some
> error code to clients if the current site notices that the db is
> write-locked from another site (before a commit arrives). This would be
> something such that clients would retry other sites, until they get one
> with fresh information.

Essentially VBUSY, but for other services.
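
A minimal sketch of the client side of that, assuming a hypothetical
SiteBusy error standing in for whatever code the server would return.
FakeSite and read_with_failover are illustrative names, not existing
client code:

```python
class SiteBusy(Exception):
    """Hypothetical 'write pending, try elsewhere' error; not a real ubik code."""

class FakeSite:
    """Stand-in for a db server that refuses reads while a write is pending."""

    def __init__(self, name, value, write_pending=False):
        self.name = name
        self.value = value
        self.write_pending = write_pending

    def read(self):
        if self.write_pending:
            raise SiteBusy(self.name)
        return self.value

def read_with_failover(sites):
    """Try each site in order; return (name, value) from the first non-busy one."""
    for site in sites:
        try:
            return site.name, site.read()
        except SiteBusy:
            continue   # this site may be stale, move on to the next
    raise SiteBusy("all sites busy")

sites = [
    FakeSite("db2", "old", write_pending=True),   # still waiting for the commit
    FakeSite("db1-sync", "new"),                  # sync site, already committed
]
result = read_with_failover(sites)
```

Every busy reply costs the client another round trip, and the retries
tend to land on the sync site, which is the extra load mentioned in the
quoted text.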

>
> That approach would prevent sites from serving old information, and
> would still allow db information to be available (clients should at
> least eventually hit the sync site, which would always have fresh info).
> This has the downside of additional load on the dbservers, though.
> Possibly the sync site in particular, when another site fails. At least
> with pthreaded ubik, we could also do something like "return an error
> after time X if the write hasn't been committed/aborted yet" as a
> load/responsiveness tradeoff.

I'd do the former. I can dig up a reference if it'd help.
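
For comparison, the "return an error after time X" variant from the
quoted text could look roughly like this. It is again a toy model with
made-up names (TimedSite, ReadTimedOut), using a threading.Event to
stand for "no write outstanding":

```python
import threading

class ReadTimedOut(Exception):
    """Hypothetical error returned when a pending write outlasts the wait."""

class TimedSite:
    def __init__(self, value):
        self.value = value
        self._idle = threading.Event()   # set when no write is outstanding
        self._idle.set()

    def begin_write(self):
        self._idle.clear()

    def end_write(self, new_value=None):
        # A new value means commit; None means abort.
        if new_value is not None:
            self.value = new_value
        self._idle.set()

    def read(self, timeout):
        # Wait up to `timeout` seconds for the write to finish, then give up
        # so the client can go and try another site.
        if not self._idle.wait(timeout):
            raise ReadTimedOut()
        return self.value

site = TimedSite("old")
site.begin_write()
try:
    site.read(timeout=0.05)
    timed_out = False
except ReadTimedOut:
    timed_out = True
site.end_write("new")
after_commit = site.read(timeout=0.05)
```

The timeout is the load/responsiveness knob: a short wait pushes clients
to other sites sooner, a long one keeps them parked on this site.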