[AFS3-std] Locking, ACL's, and Capabilities

Wed Jul 19 20:00:04 EDT 2006

Back in March, I had a fairly lengthy zephyr discussion with Jeff Altman,
Derrick, and Chaskiel about how locking works and should work, what
behavior should be controlled where, and so on.  I promised to write up the
results of that discussion for this list, but somehow never quite got
around to it.  This week I reviewed that discussion and talked to Jeff
about some of the implications, and what follows is the result of that.

This is both a description of some of the issues and a proposal for how to
move forward.  Note that none of this is set in stone; it's intended as a
starting point for discussion.  Any opinions expressed here are mine, and
are not necessarily shared by the others who participated in the discussion
I'm trying to summarize.  Also, bear in mind that while this list is
nominally for the purpose of defining the AFS _protocol_, a fair bit of the
discussion revolves around the behavior of the OpenAFS Windows client.

First, it's worth noting that in previous discussions on this topic, we've
been pretty bad about conflating three separate things: support for
byte-range locking, support for mandatory locking, and the question of how
clients (whether Windows or non-Windows) should behave with respect to
locking.

By "byte-range locking", I mean the ability to acquire locks on only part
of a file.  AFS fileservers do not currently support byte-range locking,
but it would not be a terribly difficult feature to add (but see below).
Some clients do support byte-range locks within a single workstation and
back them with whole-file locks at the server; others simply ignore
byte-range locks, returning success from all locking operations without
actually doing anything.
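
To make the first of those client strategies concrete, here is a minimal
sketch of local byte-range locks backed by a single whole-file lock at the
server.  It is an illustration only, not OpenAFS code: FakeServer,
FileLockState, and their methods are hypothetical names, and the stand-in
server grants every request, including the read-to-write change that a
real fileserver might refuse.

```python
class FakeServer:
    """Stand-in for a fileserver that only knows whole-file locks."""

    def __init__(self):
        self.held = None  # None, "read", or "write"

    def set_lock(self, mode):
        self.held = mode  # this toy server always grants the request
        return True

    def release_lock(self):
        self.held = None


class FileLockState:
    """Byte-range locks on one file, tracked on a single workstation."""

    def __init__(self, server):
        self.server = server
        self.server_mode = None  # whole-file lock currently held at the server
        self.ranges = []         # (start, end, mode, owner); end is exclusive

    def _conflicts(self, start, end, mode, owner):
        for s, e, m, o in self.ranges:
            overlaps = start < e and s < end
            if overlaps and o != owner and "write" in (mode, m):
                return True
        return False

    def lock(self, start, end, mode, owner):
        if self._conflicts(start, end, mode, owner):
            return False  # refused locally; the server is never asked
        # The server lock must be at least as strong as every local lock.
        wants_write = mode == "write" or self.server_mode == "write"
        needed = "write" if wants_write else "read"
        if needed != self.server_mode:
            if not self.server.set_lock(needed):
                return False
            self.server_mode = needed
        self.ranges.append((start, end, mode, owner))
        return True

    def unlock(self, start, end, owner):
        self.ranges = [r for r in self.ranges
                       if (r[0], r[1], r[3]) != (start, end, owner)]
        if not self.ranges and self.server_mode is not None:
            self.server.release_lock()  # last local lock is gone
            self.server_mode = None
```

Other workstations see only the whole-file lock, so two clients locking
disjoint ranges of the same file still conflict with each other at the
server.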

In most cases, applications that actually make use of this feature also
expect that locks will serve as file-consistency barriers for the portions
of a file covered by a lock.  That is, they expect at a minimum that if you
hold a write lock for a portion of a file, write something there, and then
release the lock, a process that acquires a read lock after your write
lock is released will read the data you wrote, rather than the data that
was there before.  AFS's chunk-based caching and delayed-writeback file
consistency semantics (OpenAFS on UNIX typically does not write data back
until the file is closed; on Windows, this happens on an explicit flush
request or after a timeout) make this property difficult to provide, and
IMHO providing cross-client byte-range locks without it would likely do
more harm than good.

By "mandatory locking", I mean a form of locking in which holding a lock
not only prevents others from acquiring a conflicting lock, but also
prevents them from performing conflicting operations.  With traditional
advisory locking, the existence of a read lock prevents others from
acquiring a write lock; with mandatory locking, it also prevents them from
writing to the file.  AFS does not currently support mandatory locking.
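
The difference can be shown with a toy model.  This is a sketch under
simplifying assumptions (whole-file locks only, one in-memory file); the
LockedFile class and its methods are invented for the example and do not
correspond to anything in AFS.

```python
class LockedFile:
    """Toy file whose locks are either advisory or mandatory."""

    def __init__(self, mandatory):
        self.mandatory = mandatory
        self.readers = set()  # holders of read locks
        self.writer = None    # holder of the write lock, if any
        self.data = b""

    def acquire(self, who, mode):
        """Both kinds of locking refuse a conflicting lock request."""
        if mode == "write":
            if self.writer not in (None, who) or self.readers - {who}:
                return False
            self.writer = who
        else:
            if self.writer not in (None, who):
                return False
            self.readers.add(who)
        return True

    def write(self, who, data):
        """Only mandatory locking also refuses the conflicting operation."""
        locked_by_other = (self.writer not in (None, who)
                           or bool(self.readers - {who}))
        if self.mandatory and locked_by_other:
            return False
        self.data = data
        return True
```

With mandatory=False, a second user who was refused a write lock can still
call write() successfully; with mandatory=True the same write is refused.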

Now, on to the main question.  Traditionally, AFS clients have supported
whole-file locks backed by corresponding locks on the fileserver.  An
attempt to lock a file fails if there is a conflicting lock, or if the
fileserver refuses permission.  This has worked reasonably well on UNIX,
where locks are acquired fairly sparingly, normally only by applications
which are explicitly designed for situations requiring it.

However, Windows applications acquire locks regularly, and Windows itself
acquires a lock in the process of executing a file (or attempting to do
so).  Unfortunately, common practice has been to grant 'rl' access on
directories in AFS without 'k', which means any attempts to obtain locks
are refused by the fileserver.  The OpenAFS Windows client has generally
handled this by ignoring the error and returning success, making Windows
think it has acquired the lock.  This makes things work, but is somewhat
dangerous in cases in which the file in question is actually being used by
multiple clients at the same time.

In the course of our discussion, we determined that clients might want to
apply any of a number of possible policies.  I listed four:

  1. Traditional Windows behavior - never get server locks.
  2. Always get server locks to back local ones.
  3. Traditional UNIX behavior: get server locks unless the volume is RO.
  4. Recent Windows behavior - get server locks unless the volume is RO or
     the user's access rights are not better than 'rl'.
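
The four policies can be written down as a single decision function.  This
is only a sketch: the Policy names and the wants_server_lock() helper are
made up here, and "not better than 'rl'" is taken to mean that the user
holds no rights other than 'r' and 'l', which is my reading rather than
anything the list above spells out.

```python
from enum import Enum


class Policy(Enum):
    NEVER = 1                 # (1) traditional Windows behavior
    ALWAYS = 2                # (2) always back local locks
    UNLESS_RO = 3             # (3) traditional UNIX behavior
    UNLESS_RO_OR_RL_ONLY = 4  # (4) recent Windows behavior


def rights_better_than_rl(rights):
    """True if the rights string holds anything beyond 'r' and 'l'."""
    return bool(set(rights) - {"r", "l"})


def wants_server_lock(policy, volume_is_ro, rights):
    """Should the client ask the fileserver for a lock at all?"""
    if policy is Policy.NEVER:
        return False
    if policy is Policy.ALWAYS:
        return True
    if volume_is_ro:
        return False  # policies (3) and (4) both skip RO volumes
    if policy is Policy.UNLESS_RO:
        return True
    return rights_better_than_rl(rights)  # policy (4)
```

For example, wants_server_lock(Policy.UNLESS_RO_OR_RL_ONLY, False, "rl")
is False, while the same call with rights "rlk" is True.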

Discussion quickly revealed that people felt policy (2) was silly; since no
one can write to a file in an RO volume, there is never a need to actually
obtain locks on one.  It also seems that it's reasonable for UNIX clients
to continue to apply policy (3), as they always have.  Windows, due to its
heavy use of locking and the tendency of ACL's that grant 'rl' not to also
grant 'k', requires more complex behavior.

Ultimately, we want Windows users to be able to choose the locking policy
that works best for them (whether this policy should be settable per-user
or system-wide is open for discussion, as is the question of whether we
want to be able to set it separately for distinct cells, volumes, or
paths).  At a minimum, we want them to be able to choose from policies (1),
(3), and (4).  However, it turns out that the best policy may be a new one,
which I'll state in two ways:

  5. Apply policy (3) if ACL's are known to be sane with respect to the 'k'
     bit, and policy (4) if not.

  5. Get server locks unless the volume is RO, but ignore failures to
     obtain a lock unless the user's access rights are better than 'rl' or
     ACL's are known to be sane with respect to the 'k' bit.
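
The two statements of policy (5) differ in whether the server is contacted
when the user has only 'rl', but the application sees the same result
either way.  The sketch below checks that by brute force.  All names are
hypothetical; server_grants stands for whatever the fileserver would
answer if asked, and "better than 'rl'" is again assumed to mean any right
beyond 'r' and 'l'.

```python
from itertools import product


def rights_better_than_rl(rights):
    return bool(set(rights) - {"r", "l"})


def outcome_first_form(volume_is_ro, rights, acls_sane, server_grants):
    """Policy (3) if ACL's are known sane, policy (4) if not."""
    if volume_is_ro:
        return True  # neither policy asks the server on an RO volume
    if acls_sane:
        return server_grants  # policy (3): the server's answer stands
    if not rights_better_than_rl(rights):
        return True  # policy (4): no server lock, local success
    return server_grants


def outcome_second_form(volume_is_ro, rights, acls_sane, server_grants):
    """Always ask (unless RO), but sometimes ignore a failure."""
    if volume_is_ro or server_grants:
        return True
    failure_counts = rights_better_than_rl(rights) or acls_sane
    return not failure_counts


def formulations_agree():
    """Compare both forms over every combination of inputs."""
    cases = product([False, True], ["rl", "rlk", "rlidwk"],
                    [False, True], [False, True])
    return all(outcome_first_form(*c) == outcome_second_form(*c)
               for c in cases)
```

Calling formulations_agree() returns True, so in this model the choice
between the two wordings only affects how much lock traffic reaches the
fileserver.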

Now, by "sane with respect to the 'k' bit", I mean that ACL's do grant the
'k' bit to users who are expected to be able to obtain read locks.  If
such an ACL grants 'rl' but not 'k', then the user should not be allowed to
obtain a lock.  The question is, how does a client determine whether any
particular ACL has this property?  Since the question is basically whether
the ACL truly reflects the intentions of the owner of that directory, it's
not something the client can determine simply by examining the ACL.

We more or less agreed that the "sane k bit" flag needs to come from the
fileserver; much of the discussion (and indeed, the basis of some of my
earlier objections) was based on how this bit should be conveyed and at
what granularity it should apply.

Discussion revealed that ideally, we'd like the bit to be settable
separately for every directory, by the same users who can set the ACL for
that directory.  However, this is difficult to do, requiring changes to
both the protocol and the format of the large vnode index.  It also means
an awful lot of bits to set at sites where ACL's have traditionally been
sane.

To address those issues, we also discussed the addition of a per-volume
sane-k-bit flag.  This flag would be accessible via both the fileserver and
volserver, and if set, would apply to all directories within the volume.
Finally, we discussed the possibility of a per-server flag, exposed as a
fileserver capability bit; this is very similar to what Jeff Altman
originally proposed, though I think the semantics are slightly different.
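
One way a client might combine flags at the three granularities is
sketched below.  The combination rule (a directory's ACL counts as k-sane
if the flag is set at any level that covers it) is my reading of the
discussion above, not something that has been specified, and SaneKFlags
and sane_k_source() are invented names.

```python
from dataclasses import dataclass


@dataclass
class SaneKFlags:
    """Sane-k-bit flags as a client might see them for one directory."""
    server: bool = False     # fileserver capability bit
    volume: bool = False     # per-volume flag
    directory: bool = False  # per-directory flag


def sane_k_source(flags):
    """Return the coarsest level asserting sanity, or None if none does."""
    if flags.server:
        return "server"
    if flags.volume:
        return "volume"
    if flags.directory:
        return "directory"
    return None


def acls_known_sane(flags):
    return sane_k_source(flags) is not None
```

A site with a few k-insane subtrees would leave the server and volume
flags clear and set only the directory flags it can vouch for.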

My position is this:

- I believe Windows users should be able to select from among policies (1),
  (3), (4), and (5).  I have no position on the granularity of client-side
  configuration that should be permitted.

- I consider the ability to set the sane-k-bit flag on a per-directory
  basis to be highly desirable, because many sites will have large subtrees
  with k-insane ACL's which have not changed in some time and are unlikely
  to change in the near future.

- I also consider the ability to set the sane-k-bit flag on a per-volume
  basis to be desirable.  In part, this is because some sites will have
  large portions of the AFS filespace which have always had k-sane ACL's;
  for these, it will be easier to set the bit once for each volume than to
  set it for every directory.  In addition, I expect it to be some time
  before we have a per-directory sane-k-bit flag, while a per-volume flag
  may be easier to implement and still provides some ability to set the
  flag differently for different parts of the filespace.

- I will not object to the addition of a per-server sane-k-bit flag,
  expressed as a fileserver capability bit, as a short-term measure until
  we can implement and deploy something with finer granularity.

In reviewing the text above prior to posting it, I noticed a new issue.  We
recently discovered that long-standing documentation and the fileserver
differed on what access control bit was required to obtain a write lock.
The documentation claimed that 'w' access was required (or 'i' access and
file ownership, as for writing) while the fileserver actually required 'k'.
This behavior was changed just prior to the creation of the OpenAFS 1.5.x
branch, so the fileserver now behaves in accordance with the documentation.

For fileservers with the new behavior, sanity of ACL's with respect to the
'k' bit is irrelevant for write locks, and clients should always back write
locks by obtaining a lock from the server.  However, for fileservers with
the old behavior, it may be more appropriate for clients to either treat
write locks in accordance with policy (5), or apply an analogous policy in
which the lock failure is ignored if the user does not have 'k' and the
sane-k-bit flag is not set.  The appropriate behavior is still open for
discussion, but in any event I think we should add a server capability flag
indicating that the server supports the new write-lock access-control
semantics, so that clients can select an appropriate policy.
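
A client's handling of write locks might then look something like the
sketch below.  It implements the second of the two options mentioned above
for old-style servers (ignore the failure if the user lacks 'k' and the
sane-k-bit flag is not set); since that choice is still open, treat this
as one possibility only.  The function and parameter names are
hypothetical, and new_write_semantics stands for the proposed capability
flag.

```python
def write_lock_result(new_write_semantics, rights, acls_sane, server_grants):
    """What the application is told after a write-lock attempt.

    new_write_semantics: server advertises the 'w'-based access check
    rights:              the user's rights on the directory, e.g. "rlidw"
    acls_sane:           sane-k-bit flag applies to this directory
    server_grants:       the fileserver's answer to the lock request
    """
    if new_write_semantics:
        # The 'k' bit is irrelevant here; the server's answer stands.
        return server_grants
    if server_grants:
        return True
    # Old server: the refusal may only mean the ACL never granted 'k'.
    ignore_failure = "k" not in rights and not acls_sane
    return ignore_failure
```

For instance, a user with "rlidw" who is refused by an old-style server
with no sane-k-bit flag set still sees success, while the same refusal
from a server advertising the new semantics is reported as a failure.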

-- Jeff