[AFS3-std] Re: [OpenAFS-devel] convergence of RxOSD, Extended
Call Backs, Byte Range Locking, etc.
Jeffrey Hutzelman
jhutz@cmu.edu
Thu, 23 Jul 2009 18:06:49 -0400
--On Thursday, July 23, 2009 05:18:09 PM -0400 "Matt W. Benjamin"
<matt@linuxbox.com> wrote:
> It's in no way in dispute that, on the wire, "A FID+DV names a string of
> bits." In order to analyze cache coherence, though (in a system which
> considers caching), it is necessary to describe and reason about the
> views of a file that clients have cached. In a cache, necessarily, it is
> not sufficient to consider only the coherence protocol messages--what
> we're reasoning about, in addition, is a distributed system with state.
Of course it is. But it's important to distinguish between the state of
the abstract distributed system, which is as much a part of the protocol as
the format of RPC messages is, and what is going on inside any particular
implementation.
As Tom points out, the current protocol makes it possible to have a client
implementation which is fully coherent, and even to have distributed
applications which depend on this coherency, provided all clients are
playing along (for example, you need to be rather more careful with
coordinating locking and cache coherency than I think the current OpenAFS
client is).
"AFS doesn't support strong consistency" is a very different statement from
"the current AFS client doesn't implement stronc consistency". This is why
I am particular concerned with proposals to do away with things like the
guarantee that _before a StoreData completes_, and particularly, before any
otehr RPC's on that vnode can run, every client with a relevant callback
either has been notified or has been marked down, such that it will be told
to discard any pending state before being allowed to do anything. You seem
to believe this is unimportant because you believe that AFS doesn't support
strong consistency, whereas I believe it _is_ important because AFS _does_
support strong consistency; the current client just falls short in a few
places.
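The ordering guarantee described above can be sketched in a few lines. This
is a toy model, not fileserver code: the names (Vnode, store_data, the
notify callable) are invented for illustration, and "mark down" is reduced
to a comment. The point it demonstrates is only the ordering: every other
callback holder is notified (or would be marked down) before the store
completes, and a per-vnode lock keeps other RPCs from running in between.

```python
import threading

class Vnode:
    """Toy model of a fileserver vnode with AFS-style callback promises.

    Illustrative only; these are not actual fileserver internals, just a
    sketch of the ordering guarantee discussed above.
    """

    def __init__(self):
        self.lock = threading.Lock()   # serializes all RPCs on this vnode
        self.dv = 0
        self.data = b""
        self.callbacks = set()         # clients holding a callback promise

    def store_data(self, client, new_data, notify):
        with self.lock:                # no other RPC on this vnode can run
            # Before the store completes, every *other* client holding a
            # callback is notified; on delivery failure it would instead
            # be marked down (elided here).
            for other in list(self.callbacks):
                if other != client:
                    if not notify(other):
                        pass           # mark the client down (sketch)
                    self.callbacks.discard(other)
            self.data = new_data
            self.dv += 1               # the new DV names the new bit string
            return self.dv
```

A cooperating client that receives the notification knows its cached state
for the old DV must be discarded before it issues further RPCs on the vnode.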
> "Basically
> [what] you're asserting is the classical SMP mutual exclusion problem --
> just having cache coherence isn't enough to guarantee deterministic
> outcome of a parallel application [sic, i.e., computation] without the
> use of synchronization primitives" (tkeiser).
No, of course it's not. But we _have_ synchronization primitives, and it
is possible for a set of cooperating AFS clients, using the current
protocol, to correctly execute a parallel computation with shared data in
AFS. It may or may not be possible or efficient for a set of applications
on distinct hosts running the OpenAFS client to do so.
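The cooperating-client discipline implied above can be sketched as: take the
advisory lock, revalidate the cached view against the fileserver's DV, mutate,
store, release. Everything here is illustrative; ToyServer stands in for the
RPC layer, and the method names (set_lock, fetch_status, fetch_data,
store_data, release_lock) are invented stand-ins, not the real RXAFS RPC
signatures.

```python
import threading

class ToyServer:
    """In-memory stand-in for the fileserver; illustrative only."""

    def __init__(self):
        self.dv, self.data = 0, b""
        self._lock = threading.Lock()  # models the advisory file lock

    def set_lock(self, fid): self._lock.acquire()
    def release_lock(self, fid): self._lock.release()
    def fetch_status(self, fid): return self.dv, len(self.data)
    def fetch_data(self, fid): return self.data

    def store_data(self, fid, data):
        self.data = data
        self.dv += 1                   # new DV names the stored bits
        return self.dv

def locked_update(server, fid, cache, mutate):
    """Lock, revalidate against the server's DV, mutate, store, unlock."""
    server.set_lock(fid)               # synchronization primitive
    try:
        dv, _length = server.fetch_status(fid)
        if cache.get("dv") != dv:      # cached view is stale: refetch
            cache["data"] = server.fetch_data(fid)
            cache["dv"] = dv
        cache["data"] = mutate(cache["data"])
        cache["dv"] = server.store_data(fid, cache["data"])
    finally:
        server.release_lock(fid)       # other cooperating clients proceed
```

Under this discipline each writer always mutates the current version, so a
set of cooperating clients gets a deterministic outcome from the existing
protocol, independent of what any one cache manager does internally.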
In my experience working on a number of single-client-single-server and
distributed protocols, I have found that there is much value in considering
a protocol in terms of its defined semantics, rather than only in terms of
the current behavior of one or more particular implementations. My
experience with AFS, going back well before the initial OpenAFS code drop,
has shown that this holds even when there is only one implementation.
>>
>> > We've discussed related concepts, in the context of async delivery of
>> > extended callbacks, a number of times before. I think that it is
>> > relevant to both discussions that, even with two clients simultaneously
>> > mutating a file (one or both of which has not yet stored), there are
>> > states of the distributed system (the set of all views of the file)
>> > that violate the assertion.
>>
>> Not as seen at the protocol layer. Anyone who fetches data for a given
>> range and gets the same DV, also gets the same data.
>
> That is certainly not in dispute either. Framing the issue this way
> points up the fact that, apparently, RxOSD currently allows this
> assertion to be violated even with a single writer, as we discussed
> earlier (but I think I muddied the waters, at least for myself, by
> thinking of concurrent writers).
>
> (Nor is this assertion violated by any behavior in extended callbacks
> (i.e., async delivery), so in that sense, maybe I'm connecting this
> discussion too much with prior ones.)
Yeah, maybe. We started out discussing problems with the way RxOSD affects
coherency, but between you and me, we seem to have wandered back into the
async delivery argument.
>> You state that clients may have local mutations which have not been
>> written to the fileserver and which they label with a DV that may mean
>> something else to another client, or even to the fileserver. This may
>> be the case, but it is an implementation matter, and on the wire, that
>> DV can only ever mean one thing, which is the meaning assigned to it by
>> the fileserver.
>
> As stated, I do not believe that it's actually viable to restrict
> discussion to what is on the wire, but what you say is certainly correct
> in that sense. It would, in fact, be a completely satisfactory analysis
> (I think) if caching were not considered. Of course, in by far the most
> common use of AFS at the moment, caching is taking place, and is intended
> to take place, so we necessarily can't disregard it.
Caching doesn't affect this. A DV still only means one thing, which is the
meaning the fileserver has assigned to it. If a client has in its cache
data labelled with a particular DV, then one of the following must be true:
- That data is exactly what the fileserver would have returned for that DV.
- The data is mislabelled, and the client is buggy.
- The data is actually _not_ labelled with that DV; you only think it is.
In the OpenAFS client, the last happens fairly regularly, because dirty
chunks may have a "data version" field with a particular number in them,
but the cache manager is never confused into thinking that such chunks
represent the contents of the DV corresponding to that number. It always
knows the difference between _cached_ data, which is either obtained from
the fileserver or labelled with the DV resulting from a store, and _dirty_
data, which does not correspond to any version known to the fileserver.
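The cached-versus-dirty distinction can be made concrete with a small sketch.
The field names here (data, dv, dirty) are invented for illustration and are
not the actual OpenAFS cache-manager structures; the point is only that a
chunk's "data version" number stops being a valid label the moment the chunk
is dirtied.

```python
class Chunk:
    """Sketch of a cache chunk: the dv number is a *label* only while
    the chunk is clean.  Illustrative names, not OpenAFS internals."""

    def __init__(self, data, dv):
        self.data = data
        self.dv = dv          # DV whose contents this chunk held
        self.dirty = False    # local mutations not yet stored

    def label(self):
        # A dirty chunk's dv field does NOT name fileserver contents;
        # only a clean chunk genuinely "has" that DV.
        return None if self.dirty else self.dv
```

So even though a dirty chunk still carries the old number in its dv field,
the cache manager never treats it as a copy of that DV's bit string, and the
"one DV, one bit string" rule is preserved on the wire.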
> Ok, sure. But I believe what you are describing is not invalidation, but
> rather replacement. It's not incorrect to use "invalidate" referring to
> "data." This precisely means (some specialization of, such as a message
> indicating) the data as known is not valid, not that replacement data is
> delivered. As you state, XCB has operations that replace metadata, but
> (as with the traditional AFS callback) only invalidate data.
We may be arguing semantics here, but the point is that the fileserver can
never say "the first 512 bytes of DV 5 are invalid; get new ones". It can
only say "DV 5 is no longer the current version". In this regard, what XCB
does is not a change in semantics, but a way to tell a client how it can
obtain DV 6 primarily by copying parts of the data it already has for DV 5,
rather than by fetching everything from scratch. In fact, this _depends_
on the property that the meaning of DV 5 does not change _even after it is
no longer the current version_, which makes XCB another reason why it is
important that RxOSD not violate the "one DV, one bit string" rule.
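The dependence of XCB-style updates on immutable DV contents can be shown
with a short sketch. The update format here is entirely invented for
illustration (a list of ('copy', offset, length) and ('new', bytes)
instructions); XCB's actual wire format differs. What the sketch shows is
that copying ranges from the client's copy of DV 5 is only sound because
the bit string named by DV 5 never changes, even after DV 5 is no longer
current.

```python
def apply_xcb_update(old_data, old_dv, update):
    """Build DV old_dv + 1 mostly by reusing ranges the client already
    holds for old_dv, splicing in only the bytes the server ships.
    Illustrative format, not the real XCB encoding."""
    # The copy ranges refer to old_dv's bits; this is only sound because
    # a DV's bit string is immutable ("one DV, one bit string").
    assert update["old_dv"] == old_dv
    out = bytearray()
    for op in update["ops"]:
        if op[0] == "copy":            # reuse bytes from DV old_dv
            _, offset, length = op
            out += old_data[offset:offset + length]
        else:                          # 'new': bytes shipped by the server
            out += op[1]
    return bytes(out), old_dv + 1
```

If RxOSD allowed the bits behind DV 5 to change after the fact, the copied
ranges above could silently differ from what the server meant, and the
reconstructed DV 6 would be wrong without either side noticing.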
-- Jeff