[AFS3-std] Re: [OpenAFS-devel] convergence of RxOSD, Extended
Call Backs, Byte Range Locking, etc.
Jeffrey Hutzelman
jhutz@cmu.edu
Thu, 23 Jul 2009 18:06:49 -0400
--On Thursday, July 23, 2009 05:18:09 PM -0400 "Matt W. Benjamin"
<matt@linuxbox.com> wrote:
> It's in no way in dispute that, on the wire, "A FID+DV names a string of
> bits." In order to analyze cache coherence, though (in a system which
> considers caching), it is necessary to describe and reason about the
> views of a file that clients have cached. In a cache, necessarily, it is
> not sufficient to consider only the coherence protocol messages--what
> we're reasoning about, in addition, is a distributed system with state.
Of course it is. But it's important to distinguish between the state of
the abstract distributed system, which is as much a part of the protocol as
the format of RPC messages is, and what is going on inside any particular
implementation.
As Tom points out, the current protocol makes it possible to have a client
implementation which is fully coherent, and even to have distributed
applications which depend on this coherency, provided all clients are
playing along (for example, you need to be rather more careful with
coordinating locking and cache coherency than I think the current OpenAFS
client is).
"AFS doesn't support strong consistency" is a very different statement from
"the current AFS client doesn't implement stronc consistency". This is why
I am particular concerned with proposals to do away with things like the
guarantee that _before a StoreData completes_, and particularly, before any
otehr RPC's on that vnode can run, every client with a relevant callback
either has been notified or has been marked down, such that it will be told
to discard any pending state before being allowed to do anything. You seem
to believe this is unimportant because you believe that AFS doesn't support
strong consistency, whereas I believe it _is_ important because AFS _does_
support strong consistency; the current client just falls short in a few
places.
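The ordering guarantee described above can be sketched in a few lines. This
is a toy model, not fileserver code: the names (Vnode, store_data, the
notify callable) are invented for illustration, and "mark down" is reduced
to a comment. The point it demonstrates is only the ordering: every other
callback holder is notified (or would be marked down) before the store
completes, and a per-vnode lock keeps other RPCs from running in between.

```python
import threading

class Vnode:
    """Toy model of a fileserver vnode with AFS-style callback promises.

    Illustrative only; these are not actual fileserver internals, just a
    sketch of the ordering guarantee discussed above.
    """

    def __init__(self):
        self.lock = threading.Lock()   # serializes all RPCs on this vnode
        self.dv = 0
        self.data = b""
        self.callbacks = set()         # clients holding a callback promise

    def store_data(self, client, new_data, notify):
        with self.lock:                # no other RPC on this vnode can run
            # Before the store completes, every *other* client holding a
            # callback is notified; on delivery failure it would instead
            # be marked down (elided here).
            for other in list(self.callbacks):
                if other != client:
                    if not notify(other):
                        pass           # mark the client down (sketch)
                    self.callbacks.discard(other)
            self.data = new_data
            self.dv += 1               # the new DV names the new bit string
            return self.dv
```

A cooperating client that receives the notification knows its cached state
for the old DV must be discarded before it issues further RPCs on the vnode.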
> "Basically
> [what] you're asserting is the classical SMP mutual exclusion problem --
> just having cache coherence isn't enough to guarantee deterministic
> outcome of a parallel application [sic, i.e., computation] without the
> use of synchronization primitives" (tkeiser).
No, of course it's not. But we _have_ synchronization primitives, and it
is possible for a set of cooperating AFS clients, using the current
protocol, to correctly execute a parallel computation with shared data in
AFS. It may or may not be possible or efficient for a set of applications
on distinct hosts running the OpenAFS client to do so.
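The cooperating-client discipline implied above can be sketched as: take the
advisory lock, revalidate the cached view against the fileserver's DV, mutate,
store, release. Everything here is illustrative; ToyServer stands in for the
RPC layer, and the method names (set_lock, fetch_status, fetch_data,
store_data, release_lock) are invented stand-ins, not the real RXAFS RPC
signatures.

```python
import threading

class ToyServer:
    """In-memory stand-in for the fileserver; illustrative only."""

    def __init__(self):
        self.dv, self.data = 0, b""
        self._lock = threading.Lock()  # models the advisory file lock

    def set_lock(self, fid): self._lock.acquire()
    def release_lock(self, fid): self._lock.release()
    def fetch_status(self, fid): return self.dv, len(self.data)
    def fetch_data(self, fid): return self.data

    def store_data(self, fid, data):
        self.data = data
        self.dv += 1                   # new DV names the stored bits
        return self.dv

def locked_update(server, fid, cache, mutate):
    """Lock, revalidate against the server's DV, mutate, store, unlock."""
    server.set_lock(fid)               # synchronization primitive
    try:
        dv, _length = server.fetch_status(fid)
        if cache.get("dv") != dv:      # cached view is stale: refetch
            cache["data"] = server.fetch_data(fid)
            cache["dv"] = dv
        cache["data"] = mutate(cache["data"])
        cache["dv"] = server.store_data(fid, cache["data"])
    finally:
        server.release_lock(fid)       # other cooperating clients proceed
```

Under this discipline each writer always mutates the current version, so a
set of cooperating clients gets a deterministic outcome from the existing
protocol, independent of what any one cache manager does internally.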
In my experience working on a number of single-client-single-server and
distributed protocols, I have found that there is much value in considering
a protocol in terms of its defined semantics, rather than only in terms of
the current behavior of one or more particular implementations. My
experience with AFS, going back well before the initial OpenAFS code drop,
has shown that this holds even when there is only one implementation.
>>
>> > We've discussed related concepts, in the context of async delivery of
>> > extended callbacks, a number of times before. I think that it is
>> > relevant to both discussions that, even with two clients simultaneously
>> > mutating a file (one or both of which has not yet stored), there are
>> > states of the distributed system (the set of all views of the file)
>> > that violate the assertion.
>>
>> Not as seen at the protocol layer. Anyone who fetches data for a given
>> range and gets the same DV, also gets the same data.
>
> That is certainly not in dispute either. Framing the issue this way
> points up the fact that, apparently, RxOSD currently allows this
> assertion to be violated even with a single writer, as we discussed
> earlier (but I think I muddied the waters, at least for myself, by
> thinking of concurrent writers).
>
> (Nor is this assertion violated by any behavior in extended callbacks
> (i.e., async delivery), so in that sense, maybe I'm connecting this
> discussion too much with prior ones.)
Yeah, maybe. We started out discussing problems with the way RxOSD affects
coherency, but between you and me, we seem to have wandered back into the
async delivery argument.
>> You state that clients may have local mutations which have not been
>> written to the fileserver and which they label with a DV that may mean
>> something else to another client, or even to the fileserver. This may
>> be the case, but it is an implementation matter, and on the wire, that
>> DV can only ever mean one thing, which is the meaning assigned to it by
>> the fileserver.
>
> As stated, I do not believe that it's actually viable to restrict
> discussion to what is on the wire, but what you say is certainly correct
> in that sense. It would, in fact, be a completely satisfactory analysis
> (I think) if caching were not considered. Of course, in by far the most
> common use of AFS at the moment, caching is taking place, and is intended
> to take place, so we necessarily can't disregard it.
Caching doesn't affect this. A DV still only means one thing, which is the
meaning the fileserver has assigned to it. If a client has in its cache
data labelled with a particular DV, then one of the following must be true:
- That data is exactly what the fileserver would have returned for that DV.
- The data is mislabelled, and the client is buggy.
- The data is actually _not_ labelled with that DV; you only think it is.
In the OpenAFS client, the last happens fairly regularly, because dirty
chunks may have a "data version" field with a particular number in them,
but the cache manager is never confused into thinking that such chunks
represent the contents of the DV corresponding to that number. It always
knows the difference between _cached_ data, which is either obtained from
the fileserver or labelled with the DV resulting from a store, and _dirty_
data, which does not correspond to any version known to the fileserver.
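The cached-versus-dirty distinction can be made concrete with a small sketch.
The field names here (data, dv, dirty) are invented for illustration and are
not the actual OpenAFS cache-manager structures; the point is only that a
chunk's "data version" number stops being a valid label the moment the chunk
is dirtied.

```python
class Chunk:
    """Sketch of a cache chunk: the dv number is a *label* only while
    the chunk is clean.  Illustrative names, not OpenAFS internals."""

    def __init__(self, data, dv):
        self.data = data
        self.dv = dv          # DV whose contents this chunk held
        self.dirty = False    # local mutations not yet stored

    def label(self):
        # A dirty chunk's dv field does NOT name fileserver contents;
        # only a clean chunk genuinely "has" that DV.
        return None if self.dirty else self.dv
```

So even though a dirty chunk still carries the old number in its dv field,
the cache manager never treats it as a copy of that DV's bit string, and the
"one DV, one bit string" rule is preserved on the wire.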
> Ok, sure. But I believe what you are describing is not invalidation, but
> rather replacement. It's not incorrect to use "invalidate" referring to
> "data." This precisely means (some specialization of, such as a message
> indicating) the data as known is not valid, not that replacement data is
> delivered. As you state, XCB has operations that replace metadata, but
> (as with the traditional AFS callback) only invalidate data.
We may be arguing semantics here, but the point is that the fileserver can
never say "the first 512 bytes of DV 5 are invalid; get new ones". It can
only say "DV 5 is no longer the current version". In this regard, what XCB
does is not a change in semantics, but a way to tell a client how it can
obtain DV 6 primarily by copying parts of the data it already has for DV 5,
rather than by fetching everything from scratch. In fact, this _depends_
on the property that the meaning of DV 5 does not change _even after it is
no longer the current version_, which makes XCB another reason why it is
important that RxOSD not violate the "one DV, one bit string" rule.
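The dependence of XCB-style updates on immutable DV contents can be shown
with a short sketch. The update format here is entirely invented for
illustration (a list of ('copy', offset, length) and ('new', bytes)
instructions); XCB's actual wire format differs. What the sketch shows is
that copying ranges from the client's copy of DV 5 is only sound because
the bit string named by DV 5 never changes, even after DV 5 is no longer
current.

```python
def apply_xcb_update(old_data, old_dv, update):
    """Build DV old_dv + 1 mostly by reusing ranges the client already
    holds for old_dv, splicing in only the bytes the server ships.
    Illustrative format, not the real XCB encoding."""
    # The copy ranges refer to old_dv's bits; this is only sound because
    # a DV's bit string is immutable ("one DV, one bit string").
    assert update["old_dv"] == old_dv
    out = bytearray()
    for op in update["ops"]:
        if op[0] == "copy":            # reuse bytes from DV old_dv
            _, offset, length = op
            out += old_data[offset:offset + length]
        else:                          # 'new': bytes shipped by the server
            out += op[1]
    return bytes(out), old_dv + 1
```

If RxOSD allowed the bits behind DV 5 to change after the fact, the copied
ranges above could silently differ from what the server meant, and the
reconstructed DV 6 would be wrong without either side noticing.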
-- Jeff