[OpenAFS] File locking

Todd Lewis Todd_Lewis@unc.edu
Thu, 17 Jul 2014 10:07:07 -0400


We have a commercial application we've been running for years on an 
openafs-1.4.14 client on a RedHat Linux box. This week we upgraded the 
client to 1.6.9, but quickly had to revert. The difference has to do with 
file locking. The section of strace output below shows the behavior in the 
old client:
> open("/afs/[OurCell]/pkg/[application_path]/msgdir.msg", O_RDONLY) = 8
> fcntl(8, F_SETFD, FD_CLOEXEC)     = 0
> fcntl(8, F_GETLK, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0, pid=0}) = 0
> fcntl(8, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}) = 0
after which the application proceeds with total success.

The corresponding section of the 1.6.9 client's strace of the same 
application on the same machine shows:
> open("/afs/[OurCell]/pkg/[application_path]/msgdir.msg", O_RDONLY) = 8
> fcntl(8, F_SETFD, FD_CLOEXEC)     = 0
> fcntl(8, F_GETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0, pid=0}) = 9
> close(8)                          = 0
> write(1, "\n", 1)                 = 1
> write(1, "UNABLE TO OPEN/READ MESSAGE FILE"..., 53) = 53
After this somewhat misleading message, the application exits. The 
difference starts in the third line, where the old client reports no existing 
locks on the file, but the 1.6.9 client reports an existing exclusive lock.

For what it's worth, other 1.6.9 clients exhibit the same behavior, so 
it's not as if this one client had an outstanding lock on the file.

Was there some change in file locking semantics that would make sense of 
this? Does this application tickle a corner-case bug in OpenAFS's file 
locking, or does more rigorous lock handling in the newer client expose a 
bug in the application? I have tried to replicate the failure with a very 
simple C program with no success, but that mostly indicates I'm not sure 
what I'm testing for.
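
For concreteness, the traced sequence can be replayed with a short C sketch 
along these lines. The names probe_lock and lock_type_name are hypothetical, 
and the F_WRLCK probe type passed to F_GETLK is an assumption: strace shows 
only the structure the kernel handed back, not what the application passed in.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: printable name for a lock type. */
static const char *lock_type_name(short type)
{
    switch (type) {
    case F_RDLCK: return "F_RDLCK";
    case F_WRLCK: return "F_WRLCK";
    case F_UNLCK: return "F_UNLCK";
    default:      return "unknown";
    }
}

/*
 * Replays the open / F_SETFD / F_GETLK / F_SETLK sequence from the strace.
 * Returns 0 if no conflicting lock is reported and the read lock is taken,
 * 1 if F_GETLK reports a conflicting lock, and -1 on any system call error.
 */
int probe_lock(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (fcntl(fd, F_SETFD, FD_CLOEXEC) < 0) {
        perror("fcntl(F_SETFD)");
        close(fd);
        return -1;
    }

    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;    /* ASSUMPTION: the probe type the app uses is unknown */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 means "to end of file", i.e. the whole file */

    if (fcntl(fd, F_GETLK, &fl) < 0) {
        perror("fcntl(F_GETLK)");
        close(fd);
        return -1;
    }
    printf("F_GETLK: type=%s pid=%ld\n",
           lock_type_name(fl.l_type), (long)fl.l_pid);

    if (fl.l_type != F_UNLCK) {
        /* A conflicting lock was reported; presumably where the app gives up. */
        close(fd);
        return 1;
    }

    /* No conflict reported: take a whole-file read lock, as the old trace does. */
    fl.l_type = F_RDLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    if (fcntl(fd, F_SETLK, &fl) < 0) {
        perror("fcntl(F_SETLK)");
        close(fd);
        return -1;
    }

    close(fd);  /* closing the descriptor also releases the lock */
    return 0;
}
```

Calling probe_lock() on the msgdir.msg path under each client version and 
comparing the printed F_GETLK result should show whether the difference 
follows the client or the application.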

Thoughts? Suggestions? Thanks,
--
Todd_Lewis@unc.edu