[OpenAFS-devel] afs and byte-range locking ideas

Ted Anderson ota@transarc.com
Mon, 10 Dec 2001 11:25:45 -0500 (EST)


Sorry to come in on this late.

On Wed, 5 Dec 2001 14:03:48 -0500 (EST) Jeffrey Hutzelman <jhutz@cmu.edu> wrote:
> The problem here is that client 2 can't send an update unless its copy
> of the data is valid.  This is required for consistency.  It _might_
> work; I'd have to read the code, and I don't have time at the moment
> to reabsorb the cache manager code to the point where I could say for
> certain.  But even if it does work, it will be slow -- if clients 1
> and 2 are both using the file, then each time one of them writes, the
> other will have to refetch the entire file.
>
> This is made less of a problem by the normal semantics of not flushing
> any writes to the fileserver until the file is closed.  But that plays
> havoc with your byte-range locking, since you can't safely release a
> lock until the relevant data has been written.  More precisely, you
> can't safely release a lock until _all_ the data written to the file
> while the lock was held has been sent back to the fileserver, even if
> the data wasn't in the region covered by the lock -- many applications
> have complex locking models where a lock on one data structure also
> covers related structures.

As I understand it, stage one is to provide synchronous byte range lock
management, somewhat akin to NFS NLM.  These thoughts assume stage one
and apply to stage two.

I think the following approach would work.  The client would need to
carefully track local modifications made to files with byte range locks.
When sending data to the server, it would have to limit the scope of the
write to affect only those bytes covered by the write lock.  The file
server supports this because AFS_StoreData takes a position and length
and doesn't appear to make any assumptions about cache chunk sizes or
boundaries.  When a write-locked range is unlocked the client must first
send the modified and locked region back to the server.  This will
trigger a callback to any other client that is using the file.  Since
lock operations, unlike AFS callbacks, are synchronous, when another
client locks the file it will see the latest data.  Locks would be
relinquished before closing a file so the writes should happen at about
the same time.
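
To make the bookkeeping concrete, here is a minimal sketch in Python
(chosen only for brevity).  Everything in it is hypothetical: the
LockedFile class, its method names, and the half-open (start, end)
ranges are illustrative assumptions, not cache manager code.  The
(position, length) pairs it returns stand in for the arguments that
would be passed to AFS_StoreData; no RPC is actually made.

```python
from dataclasses import dataclass, field


@dataclass
class LockedFile:
    """Hypothetical client-side state for one file with byte-range locks."""
    write_locks: list = field(default_factory=list)  # held locks, [start, end)
    dirty: list = field(default_factory=list)        # local modifications, [start, end)

    def lock(self, start, end):
        # Assumes the server has already granted the lock synchronously.
        self.write_locks.append((start, end))

    def write(self, offset, length):
        # Record the locally modified range; data stays in the cache for now.
        self.dirty.append((offset, offset + length))

    def unlock(self, start, end):
        """Return the (position, length) stores to send before releasing the lock."""
        stores, remaining = [], []
        for d_start, d_end in self.dirty:
            lo, hi = max(d_start, start), min(d_end, end)
            if lo < hi:
                # Only the bytes covered by the lock are written back.
                stores.append((lo, hi - lo))
                # Dirty bytes outside the lock stay dirty.
                if d_start < lo:
                    remaining.append((d_start, lo))
                if hi < d_end:
                    remaining.append((hi, d_end))
            else:
                remaining.append((d_start, d_end))
        self.dirty = remaining
        self.write_locks.remove((start, end))
        return stores
```

For example, after locking bytes 100 to 200 and writing 100 bytes at
offset 150, unlock(100, 200) yields a single store of 50 bytes at
position 150, and the 50 bytes written past the end of the lock remain
dirty locally.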

Certainly, active write sharing between multiple clients might lead to a
lot of callbacks and data transfers.  I think a client holding a write
lock can safely ignore a callback on the locked bytes, but of course it
must make sure the bytes are current when the lock is obtained.  This
should help performance a bit.  If a client detects severe swapping it
could go into an uncached mode and send all reads and writes directly to
the server, which is what CIFS does.  In the absence of sharing, forcing a
synchronous write before every unlock could still be a performance
problem.  Avoiding this cost would require the ability to delegate
locking to the client.  It should be possible to extend callbacks to
revoke delegated locks.  This would be more complicated than the
simplest implementation of stage one, but perhaps not too hard.
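
The idea of ignoring a callback on write-locked bytes could look
roughly like the sketch below, again in Python and again purely
illustrative.  The function name and the representation of cached and
locked data as half-open (start, end) byte ranges are assumptions; a
real client would work in terms of its own cache chunks.

```python
def ranges_to_invalidate(cached, write_locked):
    """Return the cached ranges a callback should invalidate.

    Bytes covered by a write lock held by this client are kept, on the
    assumption that they were made current when the lock was obtained.
    """
    result = []
    for c_start, c_end in cached:
        pieces = [(c_start, c_end)]
        for l_start, l_end in write_locked:
            next_pieces = []
            for p_start, p_end in pieces:
                if l_end <= p_start or p_end <= l_start:
                    # No overlap with this lock; the piece is unchanged.
                    next_pieces.append((p_start, p_end))
                    continue
                # Keep only the parts of the piece outside the lock.
                if p_start < l_start:
                    next_pieces.append((p_start, l_start))
                if l_end < p_end:
                    next_pieces.append((l_end, p_end))
            pieces = next_pieces
        result.extend(pieces)
    return result
```

With bytes 0 to 1000 cached and write locks on 100 to 200 and 500 to
600, a callback would invalidate 0 to 100, 200 to 500, and 600 to 1000,
leaving the locked bytes in the cache.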

Unlike DFS, which uses different tokens for data caching and byte range
locking (DFS data sharing would work even if application synchronization
were accomplished by some "out of band" mechanism such as smoke
signals), this scheme tightly links the concept of application locks to
the client's caching algorithm.  This is the same approach being taken
to cache coherence in NFSv4.

Writes outside a locked range could be handled in various ways, but are
technically undefined from the point of view of correct data sharing.
The applications that Jeffrey alludes to, that lock one region but mean
to "protect" other regions, fall into the non-conforming category of
using "smoke signals" to achieve synchronization.  They would need DFS.
However, I don't believe Microsoft applications generally do this type
of locking.  I would be interested in hearing about important
counterexamples.

I'm not suggesting that these changes would be trivial or even
straightforward, but the approach does seem plausible without adding
the DFS token model to AFS.  It is certainly possible, however, that I
am overlooking
something crucial.  Even the NFSv4 effort has not progressed very far
down this path, so they may yet find problems difficult to solve
efficiently without DFS-like token functionality.

Ted Anderson