[OpenAFS] File locking
Todd Lewis
Todd_Lewis@unc.edu
Thu, 17 Jul 2014 10:07:07 -0400
We have a commercial application we've been running for years on an
openafs-1.4.14 client on a RedHat Linux box. This week we upgraded the
client to 1.6.9, but quickly had to revert. The difference has to do with
file locking. The section of strace output below shows the behavior in the
old client:
> open("/afs/[OurCell]/pkg/[application_path]/msgdir.msg", O_RDONLY) = 8
> fcntl(8, F_SETFD, FD_CLOEXEC) = 0
> fcntl(8, F_GETLK, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0, pid=0}) = 0
> fcntl(8, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}) = 0
after which the application proceeds with total success.
The corresponding section of the 1.6.9 client's strace of the same
application on the same machine shows:
> open("/afs/[OurCell]/pkg/[application_path]/msgdir.msg", O_RDONLY) = 8
> fcntl(8, F_SETFD, FD_CLOEXEC) = 0
> fcntl(8, F_GETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0, pid=0}) = 9
> close(8) = 0
> write(1, "\n", 1) = 1
> write(1, "UNABLE TO OPEN/READ MESSAGE FILE"..., 53) = 53
After the somewhat misleading message, the application exits. The
difference starts at the third line (the F_GETLK call): the old client
reports no existing locks on the file, but the 1.6.9 client reports an
existing exclusive (write) lock.
For what it's worth, other 1.6.9 clients behave the same way, so it isn't
simply that this one client held an outstanding lock on the file.
Was there some change in file-locking semantics that would explain this?
Does this application tickle a corner-case bug in OpenAFS's file locking,
or does stricter lock handling in the newer client expose a bug in the
application? I have tried to reproduce the failure with a very simple C
program without success, but that mostly shows that I'm not sure what I'm
testing for.
Thoughts? Suggestions? Thanks,
--
Todd_Lewis@unc.edu