[OpenAFS] afs semantics
Sat, 10 Jun 2006 21:37:06 -0400
On Saturday, June 10, 2006 07:40:25 AM -0400 Jeffrey Altman
> Adam Megacz wrote:
>> The people who write darcs (an incredibly powerful/flexible version
>> control system) are looking into making sure that it works properly on
>> AFS, and were looking for an authoritative, official statement of
>> exactly how AFS file semantics differ from UNIX semantics:
>> Specifically, can anybody comment on these points?
>> 1. If two processes on different clients both attempt to
>> open(O_CREAT|O_EXCL), does AFS guarantee that no more than one of
>> them will succeed?
> It should. There have been recent reports that this may not be true
> on some platforms because of a bug. However, insufficient data has
> been collected to determine whether it is in fact a bug. If it is a
> bug, I suspect it is a bug in the client on some but not all platforms.
Yes; modulo bugs, two simultaneous exclusive creates of the same name in
the same directory will not both succeed. Actually, I think the bug to
which Jeff is referring is not a violation of this guarantee -- it results
in _neither_ of the creates succeeding.
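
As an illustration of the pattern under discussion, here is a minimal Python sketch of an exclusive create. The helper name try_exclusive_create is hypothetical and is not taken from darcs or OpenAFS; the sketch only shows how a caller can tell "I created it" apart from "it already existed".

```python
import os


def try_exclusive_create(path, mode=0o644):
    """Hypothetical helper: create `path` only if it does not exist yet.

    Returns an open file descriptor if this call created the file,
    or None if the create failed with EEXIST.
    """
    try:
        # O_CREAT | O_EXCL asks for the create to fail if the name exists.
        return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
    except FileExistsError:
        # Someone else (or an earlier call) already created the name.
        return None
```

A caller that gets None cannot assume any particular other process holds the file; it only knows that its own create did not succeed.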
>> 2. If two processes both attempt to rename() the same [source] file,
>> does AFS guarantee that exactly one of them succeeds?
> It should. Again, if this is not true it would be a bug in the client.
Well, it guarantees that _at most_ one succeeds. Of course, it is possible
for both operations to fail for reasons having nothing to do with the race.
However, assuming one succeeds, the other should fail with ENOENT.
Note that this only works when renaming a file within a volume, and only if
the rename would not result in a single file having links in more than one
directory. A rename() call that would violate these constraints will
instead return EXDEV.
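
A rough sketch of how an application might use this "at most one succeeds" property to claim a file, in Python. The function name claim_by_rename is made up for illustration. It treats ENOENT as "lost the race or the source never existed" and lets every other error, including EXDEV, propagate to the caller.

```python
import os


def claim_by_rename(src, dst):
    """Hypothetical helper: try to claim `src` by renaming it to `dst`.

    Returns True if this caller's rename succeeded. Returns False on
    ENOENT, which is what the losing side of a race should see (it is
    also what you get if a directory in either path does not exist).
    Other errors, such as EXDEV, are raised as OSError.
    """
    try:
        os.rename(src, dst)
        return True
    except FileNotFoundError:
        return False
```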
>> 3. If client "A" makes two inode-level changes (creat, remove,
>> rename, etc), is it ever possible for client "B" to see the
>> second change before the first one?
> Not possible. AFS does not distribute changes to clients. It simply
> notifies clients that the known state of the object has changed. The
> client could find out about the first change or the combination of the
> first and second changes, but never the second and not the first.
I'm not sure what you mean here by "inode-level"; the system calls
mentioned are characterized by the fact that they result in changes to
directory contents. For operations with this property, and with respect to
a single directory, Jeff's analysis is correct. Changes to the
authoritative copy of a directory are always performed at the fileserver,
never by clients, and these changes are serialized. Clients never receive
partial directory updates from the fileserver; if a directory changes, the
client must fetch a complete new copy of the directory. This fetch is
always done in a single RPC, so the client always receives a complete,
self-consistent copy of the directory. If a single client makes two
changes in some particular order, other clients will always see the changes
in the same order, because no version of the directory ever existed which
contains only the second change.
Note that this guarantee applies only with respect to any one directory.
For changes to multiple directories, a considerably more complex analysis
is required, and depending on the situation, changes might become visible
to other clients in the "wrong" order.
> It is possible to test whether or not a path is located within AFS by
> using "fs whichcell <path>". If it returns successfully, you have an
> AFS path. If not, then not. Darcs might want to test this to determine
> whether or not alternative behaviors should be used.
Of course, it's also possible to do this using the corresponding pioctl, if
you are willing to grow a dependency on AFS libraries or libkafs.
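
For the command-line variant, a minimal Python sketch might look like the following. The helper name is_in_afs is hypothetical, and the sketch assumes that "returns successfully" corresponds to an exit status of 0 and that the fs binary is on PATH; if fs cannot be run, the path is treated as not being in AFS.

```python
import subprocess


def is_in_afs(path):
    """Hypothetical helper: guess whether `path` lives in AFS.

    Runs "fs whichcell <path>" and reports whether it exited with
    status 0 (assumed here to mean success). A missing or
    non-executable `fs` command is treated as "not AFS".
    """
    try:
        result = subprocess.run(
            ["fs", "whichcell", path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # No usable `fs` command on this system.
        return False
    return result.returncode == 0
```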
Some more direct responses to the questions Juliusz is actually asking:
> Thanks, although I'd prefer authoritative docs to a FAQ entry.
There is no authoritative documentation at this level, and there never has
been. The FAQ is the closest you're going to get, but if you ask precise
questions on openafs-devel, you're likely to get authoritative answers.
Note that the afs3-standardization list is about the AFS 3 _protocol_, and
in fact is primarily about extending that protocol and resolving
ambiguities in a consistent way, so as to maintain interoperability. While
a complete protocol specification would be nice, writing it does not seem
to be high on anyone's to-do list. What this list is explicitly _not_
about is defining the behavior and semantics of any particular
implementation, including OpenAFS. So, the semantics of the UNIX system
call interface with respect to AFS are out of scope.
>> Hard links: [ User ]
>> In AFS, hard links (eg: ln old new) are only valid within a
> This will definitely break ``darcs get'' and ``optimize --relink''
> (anything else?). We can work around the issue, but you'll have to
> tell us in what way link(2) fails when the above constraint is
Attempts to create hard links between files in different AFS directories
will fail with EXDEV. For the case of links in different volumes, this
check is done early, in the client (though of course, it also fails if the
client fails to perform the check). For attempts to create a link in a
different directory in the same volume, the check is done fairly late, and
so you're more likely to get errors like EACCES or EISDIR, if those apply.
Note that you can rename a file from one directory to another within the
same volume, as long as the file does not have more than one existing link.
Attempts to rename a file with multiple links, or to rename a file into a
different volume, will fail with EXDEV.
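
One possible workaround, sketched in Python, is to fall back to copying when link(2) reports EXDEV. The name link_or_copy is hypothetical, and the choice of a plain copy as the fallback is an assumption about what the application can tolerate, not something AFS prescribes.

```python
import errno
import os
import shutil


def link_or_copy(src, dst):
    """Hypothetical helper: hard-link `src` to `dst`, copying on EXDEV.

    Returns "linked" if the hard link was created, or "copied" if the
    link failed with EXDEV and the file was copied instead. Any other
    error (EACCES, EISDIR, EEXIST, ...) is re-raised unchanged.
    """
    try:
        os.link(src, dst)
        return "linked"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
    # Cross-directory or cross-volume link refused: fall back to a copy.
    shutil.copy2(src, dst)
    return "copied"
```

Note that a copy does not share storage with the original and is not created atomically, so this only helps where the link was a space optimization rather than a synchronization primitive.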
> - how does open(O_CREAT | O_EXCL) work?
It works as advertised - if the specified file already exists, the
operation fails with EEXIST.
> - is link(2) consistent w.r.t. link and open?
> - is rename(2) consistent w.r.t. rename and open?
I'm not sure what is meant here by "consistent". Where I come from, an
operation that is "consistent" is one that never transitions from a valid
state to an invalid one. All of open, rename, and link have this property,
and all of them obey the same rules with respect to what are considered
valid states of the filesystem.
> AFS does not support byte-range locking within a file,
> although lockf() and fcntl() calls will return 0 (success).
> This is careless. I fully agree that SVR4-style locks are brain-
> damaged beyond hope, but fcntl(2) over AFS should fail with ENOSYS
> rather than returning success!
We try fairly hard to support whatever locking interfaces are available on
any given platform. Regardless of the interface used, AFS supports
whole-file locking, both between processes on the same client system and
between client systems. It does not implement partial-file locking at all,
because applications which actually _rely_ on fine-grained locking also
tend to rely on such locks to act as fine-grained data consistency
barriers, and such semantics would be quite difficult for AFS to support
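
An application that only needs mutual exclusion can stay within whole-file locking. The following Python sketch requests a non-blocking exclusive lock over the entire file through the fcntl-style interface; the helper name try_lock_whole_file is hypothetical, and how any particular AFS client maps this call is not something the sketch can guarantee.

```python
import errno
import fcntl


def try_lock_whole_file(fd):
    """Hypothetical helper: try to take an exclusive whole-file lock.

    `fd` must be open for writing. Returns True if the lock was
    acquired, False if another process already holds a conflicting
    lock. Other errors are re-raised.
    """
    try:
        # A length of 0 starting at offset 0 covers the whole file.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 0, 0)
        return True
    except OSError as exc:
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return False
        raise
```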
> Adam, I really need authoritative documentation on
> - consistency properties of AFS;
> - restrictions of Unix system calls on AFS.
There is no complete, authoritative documentation on these issues. As I
mentioned above, someone asking specific, well-defined questions on the
openafs-devel list (firstname.lastname@example.org) would be likely to get
-- Jeffrey T. Hutzelman (N3NHS) <email@example.com>
Sr. Research Systems Programmer
School of Computer Science - Research Computing Facility
Carnegie Mellon University - Pittsburgh, PA