[OpenAFS] namei interface lockf buggy on Solaris (and probably HP-UX and AIX)

Tom Keiser tkeiser@gmail.com
Mon, 11 Sep 2006 12:45:40 -0400


I propose we move this discussion to -devel.

On 9/11/06, Rainer Toebbicke <rtb@pclella.cern.ch> wrote:
> The namei interface uses file locking extensively, implemented using
> lockf() on Solaris, AIX & HP-UX.
>
> Unfortunately lockf() locks and unlocks from the *current position* to
> whatever the argument says (end of file), moving the file pointer in
> between becomes a problem for the subsequent unlock!  The result is
> that frequently locks aren't released, but replaced by partial locks
> on the file data just moved over.

At least on AIX and Solaris, lockf() is nothing more than an
inflexible wrapper around fcntl() byte-range locks.  My vote is to
transition to fcntl() (where we can explicitly pass in a base offset
and length).  This eliminates the call semantics change introduced by
your patch, and eliminates the unnecessary syscall overhead.  I further
object because I'm working on a patch which will allow us to use
pread/pwrite on platforms which support them.  This will completely
eliminate fcntl(F_DUPFD,...) and lseek() overhead in the fd package,
so any new requirements on lseek() could negate the performance
improvement I'm seeing.  However, the real motivation for switching to
pread/pwrite is a fairly serious locking bug:

As it turns out, the way we use file locks in the volume package is
quite broken.  The spec says that once a process closes *any* file
descriptor referring to a file, all fcntl locks that process holds on
that file are immediately released.  This means that the pthread
fileserver/volserver can have some interesting races, given that the
ih package fd cache allows multiple concurrent descriptors per inode
handle.  I have sample code sitting around somewhere which
demonstrates this fault.

Regards,

-- 
Tom Keiser
tkeiser@gmail.com