[AFS3-std] Re: [OpenAFS-devel] convergence of RxOSD, Extended Call Backs, Byte Range Locking, etc.

Hartmut Reuter reuter@rzg.mpg.de
Fri, 24 Jul 2009 14:40:07 +0200


Tom's idea to have a Start-of-I/O-rpc and a Stop-I/O-rpc to enforce data
consistency is great. I think it would not be very difficult to
implement this.

Caching of the information returned by GetOSDlocation could reduce
traffic on the wire, but is not really essential. So if we still do one
GetOSDlocation per I/O, we can use GetOSDlocation as the Start-of-I/O-rpc.

For writes I would propose that the fileserver keeps the information
about Fid, offset, length, host, and time in a table or chain until the
storeMini has happened. That would also make extended callbacks for
file ranges possible. In the write case storeMini would serve as the
End-of-I/O-rpc.
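
A minimal sketch of such a table, in Python for brevity rather than the
fileserver's C. The names WriteEntry, PendingWrites, start_write and
end_write are hypothetical and only illustrate the bookkeeping; keying
the table by (Fid, host) is likewise an assumption.

```python
import time
from dataclasses import dataclass, field


@dataclass
class WriteEntry:
    # Fields named in the proposal: Fid, offset, length, host, time.
    fid: tuple          # assumed shape: (volume, vnode, uniquifier)
    offset: int
    length: int
    host: str
    started: float = field(default_factory=time.time)


class PendingWrites:
    """Hypothetical table of writes between GetOSDlocation and storeMini."""

    def __init__(self):
        self._entries = {}  # (fid, host) -> WriteEntry

    def start_write(self, fid, offset, length, host):
        # Would be called when GetOSDlocation is issued for a write.
        entry = WriteEntry(fid, offset, length, host)
        self._entries[(fid, host)] = entry
        return entry

    def end_write(self, fid, host):
        # Would be called when the matching storeMini arrives.
        return self._entries.pop((fid, host), None)

    def is_writing(self, fid):
        return any(key[0] == fid for key in self._entries)
```

Because each entry carries offset and length, the same table could
later be consulted for range-based callbacks.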

While the entry for a write exists, all incoming GetOSDlocation RPCs
have to wait. This is the same behavior as for FetchData or StoreData
while another StoreData holds the write lock on the vnode.
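
The waiting behavior could look roughly like the following sketch,
which uses a condition variable in Python as a stand-in for whatever
locking primitive the fileserver would really use. WriteGate and its
method names are hypothetical, and the gate is per Fid rather than per
byte range to keep the example short.

```python
import threading


class WriteGate:
    """Hypothetical gate: requests for a Fid wait while a write entry exists."""

    def __init__(self):
        self._cond = threading.Condition()
        self._writing = set()  # Fids that currently have a write entry

    def begin_write(self, fid):
        # GetOSDlocation for write: wait for any earlier write, then enter.
        with self._cond:
            self._cond.wait_for(lambda: fid not in self._writing)
            self._writing.add(fid)

    def wait_until_free(self, fid, timeout=None):
        # Any other GetOSDlocation: block until no write entry exists.
        # Returns False if the timeout expires first.
        with self._cond:
            return self._cond.wait_for(
                lambda: fid not in self._writing, timeout)

    def store_mini(self, fid):
        # End of write: drop the entry and wake all waiters.
        with self._cond:
            self._writing.discard(fid)
            self._cond.notify_all()
```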

Up to this point everything would work fine also with the clients out
here in our cell.

However, reads could still be under way when a new write starts. That
is not very likely because unfortunately reads are always for single
chunks only, but it is still possible. Protecting these reads as well
requires an End-of-I/O-rpc for read. A new bit in the flag used in
GetOSDlocation could indicate that the client promises to send an
appropriate rpc at the end of a read operation.

With this flag set, GetOSDlocation would also create (or find an
existing) entry in the aforementioned table or chain. A field readers
would be incremented and, after the I/O is finished, decremented by the
End-of-I/O-rpc. As long as there are readers, write requests have to wait.
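
A minimal sketch of the readers counter, again in Python. The flag
value END_OF_READ_PROMISED and all class and method names are
assumptions made for illustration; no such bit is defined in the
GetOSDlocation flag here.

```python
END_OF_READ_PROMISED = 0x4  # hypothetical flag bit, value chosen arbitrarily


class ReaderTable:
    """Hypothetical per-Fid readers count gating new writes."""

    def __init__(self):
        self._readers = {}  # fid -> number of reads in flight

    def get_osd_location_read(self, fid, flags):
        # Only clients that promise an End-of-I/O-rpc are tracked.
        if flags & END_OF_READ_PROMISED:
            self._readers[fid] = self._readers.get(fid, 0) + 1
            return True
        return False

    def end_of_read(self, fid):
        # End-of-I/O-rpc for read: decrement, drop the entry at zero.
        remaining = self._readers.get(fid, 0) - 1
        if remaining > 0:
            self._readers[fid] = remaining
        else:
            self._readers.pop(fid, None)

    def write_may_start(self, fid):
        return self._readers.get(fid, 0) == 0
```

A client that does not set the bit is simply not counted, which matches
the behavior the clients have today.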

The legacy interface for clients not prepared for rxosd would, of
course, have to honor this table as well. But here things are easier
because everything happens within a single rpc (FetchData or StoreData).

An open question is how the fileserver should handle missing
End-of-I/O-rpcs. That is what the timestamp field is for: the
FiveMinuteCheckLWP could look for timed-out transactions....
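
Such a periodic check might be sketched as below. The function name
expire_stale and the max_age parameter are hypothetical, the table is
reduced to a mapping from entry key to start timestamp, and what to do
with an expired transaction remains the open question.

```python
def expire_stale(entries, now, max_age):
    """Remove entries older than max_age seconds; return their keys.

    entries maps an entry key (for example (fid, host)) to the time the
    I/O was started. Both the mapping and max_age are assumptions.
    """
    stale = [key for key, started in entries.items()
             if now - started > max_age]
    for key in stale:
        del entries[key]
    return stale
```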

-Hartmut
-----------------------------------------------------------------
Hartmut Reuter                  e-mail 		reuter@rzg.mpg.de
			   	phone 		 +49-89-3299-1328
			   	fax   		 +49-89-3299-1301
RZG (Rechenzentrum Garching)   	web    http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------