[AFS3-std] Re: [OpenAFS-devel] convergence of RxOSD, Extended Call Backs, Byte Range Locking, etc.

Matt W. Benjamin matt@linuxbox.com
Thu, 23 Jul 2009 17:18:09 -0400 (EDT)


Hi Jeff,

Thanks for the clarifications.

----- "Jeffrey Hutzelman" <jhutz@cmu.edu> wrote:

> --On Thursday, July 23, 2009 01:57:42 PM -0400 "Matt W. Benjamin" 
> <matt@linuxbox.com> wrote:
> 
> > Hi Jeff,
> >
> > 1. The DV concept carries the assertion that any representation of
> a
> > (range of a) file at a DV is equivalent to any other
> representation.
> 
> I'm not sure what you mean here.  A FID+DV names a string of bits,
> period.


It's in no way in dispute that, on the wire, "A FID+DV names a string of bits."  In order to analyze cache coherence, though (in a system which considers caching), it is necessary to describe and reason about the views of a file that clients have cached.  With a cache, it is necessarily not sufficient to consider only the coherence of protocol messages--what we're reasoning about, in addition, is a distributed system with state.

Moreover, I think it's clear that in a cache, we're concerned not only with the coherence of the distributed images of the cache, but also with the relationship between cached data and computations which may be in progress on cached instances of that data.  It is in this sense that I used the term "useful" (or not useful).  As Tom pointed out in a side conversation, this is not a gray area in computing science.   "Basically [what] you're asserting is the classical SMP mutual exclusion problem -- just having cache coherence isn't enough to guarantee deterministic outcome of a parallel application [sic, i.e., computation] without the use of synchronization primitives" (tkeiser).
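
To make Tom's point concrete, here is a minimal sketch in Python.  It is purely illustrative and not AFS code: the `Server` class and both functions are hypothetical names, and the interleaving is scripted by hand rather than produced by real threads, so the outcome is reproducible.  Both clients hold a perfectly coherent view (same data, same DV) and the unsynchronized computation still loses an update.

```python
class Server:
    """Toy file server holding one integer and a data version (DV)."""

    def __init__(self):
        self.data = 0
        self.dv = 0

    def fetch(self):
        # Every fetch at a given DV returns the same data: coherent.
        return self.data, self.dv

    def store(self, value):
        # Each store publishes new data under a new DV.
        self.data = value
        self.dv += 1
        return self.dv


def unsynchronized_increment():
    """Two clients each add 1, with no mutual exclusion between them."""
    srv = Server()
    a_val, a_dv = srv.fetch()
    b_val, b_dv = srv.fetch()
    assert (a_val, a_dv) == (b_val, b_dv)  # both views are coherent at DV 0
    srv.store(a_val + 1)  # client A writes 1, creating DV 1
    srv.store(b_val + 1)  # client B also writes 1, creating DV 2
    return srv.fetch()    # one increment has been lost


def synchronized_increment():
    """Same work, but each read-modify-write runs as a critical section."""
    srv = Server()
    for _client in ("A", "B"):
        val, _dv = srv.fetch()  # serialized here to stand in for a lock
        srv.store(val + 1)
    return srv.fetch()
```

In this sketch `unsynchronized_increment()` ends with the value 1 at DV 2, while `synchronized_increment()` ends with the value 2 at DV 2.  The protocol-level assertion (same DV, same bits) holds in both runs; only the second gives the computation a deterministic, intended result.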

> 
> > We've discussed related concepts, in the context of async delivery
> of
> > extended callbacks, a number of times before.  I think that it is
> > relevant to both discussions that, even two clients simultaneously
> > mutating a file (one or both has not yet stored), states of the
> > distributed system (set of all views of the file) that violates the
> > assertion.
> 
> Not as seen at the protocol layer.  Anyone who fetches data for a
> given 
> range and gets the same DV, also gets the same data.

That is certainly not in dispute either.  Framing the issue this way points up the fact that in rxOSD, as it currently stands, this assertion could apparently be violated even with a single writer, as we discussed earlier (though I think I muddied the waters, at least for myself, by thinking of concurrent writers).

(Nor is this assertion violated by any behavior in extended callbacks (i.e., async delivery), so in that sense, maybe I'm connecting this discussion too much with prior ones.)

> 
> You state that clients may have local mutations which have not been
> written 
> to the fileserver and which they label with a DV that may mean
> something 
> else to another client, or even to the fileserver.  This may be the
> case, 
> but it is an implementation matter, and on the wire, that DV can only
> ever 
> mean one thing, which is the meaning assigned to it by the
> fileserver.

As stated, I do not believe that it's actually viable to restrict discussion to what is on the wire, but what you say is certainly correct in that sense.  It would be, in fact, a completely satisfactory analysis (I think) if caching were not considered.  Of course, for by far the most common use of AFS (at the moment), caching is taking place, and is intended to take place, so we necessarily can't disregard it.

> 
> >  I think it is critical to think through the implications of
> > this.  I think that asserting that single store operations be
> synchronous
> > across the distributed views if the caches do not take reservations,
> as I
> > believe they do in DFS, is not a useful consistency guarantee.  And,
> I
> > think it's the case that in the common case for DFS, the reservation
> is
> > probably useless, because it's not coordinated with the
> applications
> > doing the I/Os.
> 
> I'm not sure why you keep talking about DFS.  We're not talking about
> DFS; 
> we're talking about AFS. 

I keep talking about DFS because what -I- think we're doing is designing for a future in which the AFS protocol delivers a set of semantics that may be (depending on such factors as client preference, its own configuration, and the resource being published) stronger or, perhaps, weaker than those of traditional AFS.  I see DFS as an important model for reasoning about what alternate semantics (a specific, known-useful set, stronger than those of AFS) might look like, though, as I have stated, not a uniformly desirable model.

> In AFS, it must not be the case that if we
> both 
> start with DV n and I start writing, that you can do a read partway
> through 
> my write and get something you think is DV n+1, and then my write
> completes 
> and the result is also DV n+1.  It also must not be the case that if
> you 
> start a read of DV n before my write starts, you get (or already have)
> some 
> data which is part of DV n, and then get some of the data that I wrote
> and 
> think it is also part of DV n.

There is no dispute here--unless it's over what rxOSD is allowed to do.  This is another way of stating what was stated earlier (including by me, in my original response to Hartmut, "Issue 2").
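
As a rough illustration of the two requirements Jeff lists, the following Python sketch models a server that publishes contents and DV together, so a fetch that lands partway through a store still sees only the old bits under the old DV.  Everything here is an assumption made for illustration: `FileServer`, `mid_write_hook`, and `demo` are hypothetical names, and this is not how the fileserver or rxOSD is actually implemented.

```python
class FileServer:
    """Toy server for one file; (contents, dv) are always published as a pair."""

    def __init__(self, data):
        self._published = (bytes(data), 0)

    def fetch(self, offset, length):
        # Read contents and DV from the same published pair.
        data, dv = self._published
        return data[offset:offset + length], dv

    def store(self, offset, payload, mid_write_hook=None):
        data, dv = self._published
        staged = bytearray(data)  # work on a private copy
        half = len(payload) // 2
        staged[offset:offset + half] = payload[:half]
        if mid_write_hook is not None:
            mid_write_hook()      # a reader arrives partway through the write
        staged[offset + half:offset + len(payload)] = payload[half:]
        # Only now do the new bits and the new DV become visible, together.
        self._published = (bytes(staged), dv + 1)
        return dv + 1


def demo():
    srv = FileServer(b"aaaaaaaa")
    seen = []
    srv.store(0, b"bbbb", mid_write_hook=lambda: seen.append(srv.fetch(0, 4)))
    seen.append(srv.fetch(0, 4))
    return seen
```

Under these assumptions `demo()` returns `[(b"aaaa", 0), (b"bbbb", 1)]`: the mid-write reader gets old data labeled DV 0, and the later reader gets the complete new data labeled DV 1.  A half-written range is never visible under either label.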

> 
> > 2. I do not follow your distinction between data and metadata, with
> > respect to what clients now do and what xcb clients are specified to
> do
> > on receipt of a StoreData extended callback notification (data
> changed in
> > a range).  Could you please clarify?
> 
> A fileserver can tell a client that something about FID 1.2.3 has
> changed, 
> and the client has to do a new FetchStatus to find out that the DV is
> no 
> longer 5 and instead has changed to 6.  With XCB, the fileserver can
> even 
> tell the client in the callback that the DV has changed to 6, and it
> can 
> potentially even give the cache manager information about which ranges
> are 
> different between DV 5 and 6.  What it cannot do is tell the cache
> manager 
> that the first 512 bytes of FID 1.2.3 DV 5 have changed and are now 
> something else.

Ok, sure.  But I believe what you are describing is not invalidation, but rather replacement.  It's not incorrect to use "invalidate" in reference to "data."  Invalidation means precisely that (by some mechanism, such as a message indicating it) the data as known is no longer valid, not that replacement data is delivered.  As you state, XCB has operations that replace metadata, but (as with the traditional AFS callback) only invalidate data.
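
A small Python sketch of the distinction, under loud assumptions: `CachedFile`, `CHUNK`, and `on_extended_callback` are hypothetical names, the 512-byte chunking is arbitrary, and retaining unchanged chunks under the new DV is my reading of "which ranges are different between DV 5 and 6", not something taken from the XCB specification.  The callback replaces metadata (the DV) but only invalidates data; no new bytes arrive with it.

```python
from dataclasses import dataclass, field

CHUNK = 512  # hypothetical cache chunk size, in bytes


@dataclass
class CachedFile:
    dv: int
    chunks: dict = field(default_factory=dict)  # chunk index -> cached bytes


def on_extended_callback(entry, new_dv, changed_ranges):
    """Apply a notification carrying a new DV and a list of (offset, length)."""
    entry.dv = new_dv  # metadata is replaced outright
    for offset, length in changed_ranges:
        first = offset // CHUNK
        last = (offset + length - 1) // CHUNK
        for index in range(first, last + 1):
            # Data is invalidated: the chunk is dropped, nothing replaces it.
            entry.chunks.pop(index, None)
    return entry
```

For example, an entry at DV 5 holding chunks 0 and 1 that receives `on_extended_callback(entry, 6, [(0, 512)])` ends up at DV 6 with chunk 0 gone and chunk 1 intact; the cache manager would have to fetch the first 512 bytes again to learn what they are at DV 6.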

(Not that I can't think of a use for data replacement, in future.)

> 
> -- Jeff

-- 

Matt Benjamin

The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI  48104

http://linuxbox.com

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309