[OpenAFS-devel] linux-and-locks-cleanup-20070202 crashes linux
kernels older than 2.6.17 (see RT #53457)
Christopher Allen Wing
wingc@engin.umich.edu
Thu, 8 Feb 2007 15:05:37 -0500 (EST)
Marcus:
On Thu, 8 Feb 2007, Marcus Watts wrote:
> Christopher Allen Wing <wingc@engin.umich.edu> writes:
> ...
>> There does not seem to be a good way to find out (e.g., autoconf test) if
>> a particular linux kernel has the 'old' or 'new' semantics of
>> flock_lock_file*(). The argument types of the functions have not changed.
> ...
>
> This sounds like it's a feature that is almost exactly
> tied to linux kernel version. Testing the Linux kernel
> version is *very* easy to do at compile time:
>
> mdw@bruson:~/src/linux-2.6.18$ cat include/linux/version.h
> #define LINUX_VERSION_CODE 132626
> #define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))
> mdw@bruson:~/src/linux-2.6.18$ dc
> 132626 16o p
> 20612
> mdw@bruson:~/src/linux-2.6.18$
That isn't safe in general, because linux vendors may backport patches to
older kernel versions. I wouldn't trust it to be correct unless you could
guarantee you are using a vanilla linux kernel.
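For reference, here is a minimal sketch of the kind of compile-time check
being discussed. The 2.6.17 cutoff is taken from the subject of this thread;
the macro ASSUME_NEW_FLOCK_SEMANTICS and the helper function are hypothetical
names, and LINUX_VERSION_CODE is hard-coded to the 2.6.18 value from the
transcript above so the sketch compiles outside a kernel tree. It inherits
the backport caveat: a vendor kernel can report an older version and still
carry the newer behaviour.

```c
#include <assert.h>

/* Same encoding as include/linux/version.h in the quoted transcript. */
#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))
#endif

/* In a real build this comes from <linux/version.h>; hard-coded here
 * (2.6.18) only so the sketch is standalone. */
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE 132626
#endif

/* HYPOTHETICAL macro: guesses the flock_lock_file*() semantics purely
 * from the version number, which is only meaningful on vanilla kernels. */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,17)
#define ASSUME_NEW_FLOCK_SEMANTICS 1
#else
#define ASSUME_NEW_FLOCK_SEMANTICS 0
#endif

/* HYPOTHETICAL helper: exposes the compile-time guess at run time. */
int assumes_new_flock_semantics(void)
{
    /* 132626 decimal is 0x20612, matching the dc output above. */
    assert(KERNEL_VERSION(2,6,18) == 0x20612);
    return ASSUME_NEW_FLOCK_SEMANTICS;
}
```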
> Although that will fix the kernel api problem, it may not completely
> fix flock behavior. When Matt and I last looked at this
> (very briefly) we couldn't convince ourselves that this would do the
> right thing in all cases - we couldn't figure out looking at the
> above code how it was supposed to wait for local locks.
> Turning off FL_SLEEP turns off that wait, so it seems
> like the only wait possible is the one for the whole
> file lock from the server.
>
> This probably needs a much more thorough examination
> (ie, test cases, exercising various combinations of
> whole file vs. byte ranges, local locking vs. remote locking,
> possible deadlocks, signals, etc.) to see that it does "reasonable"
> things in all cases. In particular, if this patch is in fact
> useful, it should be possible to reproduce the "non-overlapped locks
> don't block, overlapped locks do block" case when all locks
> are being obtained from one linux machine.
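A single-machine check of that case might look something like the sketch
below. It is only an illustration under stated assumptions: it uses plain
POSIX fcntl() byte-range locks on a temporary file under /tmp (not a file in
AFS), the helper names are made up, and it uses non-blocking F_SETLK so that
"would block" shows up as a refused lock rather than a hang. To exercise
AFS, the file would have to live in AFS instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try a non-blocking write lock on bytes [start, start+len).
 * Returns 0 if granted, -1 if refused (or on error). */
int try_write_lock(int fd, off_t start, off_t len)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLK, &fl);
}

/* The parent write-locks [parent_start, +len); a forked child then tries
 * [child_start, +len) on the same file.
 * Returns 1 if the child got its lock, 0 if refused, -1 on setup error. */
int child_can_lock(off_t parent_start, off_t child_start, off_t len)
{
    char path[] = "/tmp/locktestXXXXXX";   /* assumption: /tmp is writable */
    int fd = mkstemp(path);
    int status, result = -1;
    pid_t pid;

    if (fd < 0)
        return -1;
    unlink(path);                          /* file stays usable via fd */
    if (try_write_lock(fd, parent_start, len) != 0) {
        close(fd);
        return -1;
    }
    pid = fork();
    if (pid == 0)                          /* child: locks are per-process */
        _exit(try_write_lock(fd, child_start, len) == 0 ? 0 : 1);
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = (WEXITSTATUS(status) == 0);
    close(fd);
    return result;
}

/* Non-overlapped ranges should both be granted; overlapped should not. */
void check_local_byte_ranges(void)
{
    assert(child_can_lock(0, 20, 10) == 1);
    assert(child_can_lock(0, 5, 10) == 0);
}
```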
I think the current semantics (as of 1.4.2+) are a mishmash of traditional
AFS locking plus partial local locking in some cases. It's not
consistent.
> I believe Matt is planning to produce a much more extensive
> "portable" version of locking that should work much like the
> current linux code is supposed to work (ie, local byte range locks
> layered on top of fileserver whole file locks) that should work for
> most unix(-like) platforms.
I think it would be possible to do something like this on linux by
primarily using the local linux locking code, and having a helper function
that attempts to change the lock state on the AFS server to:
    single read lock
    single write lock
    no lock
upon request. But it seems there aren't reliable means to do this.
There is no race-free way to transition between a read lock and write lock
and vice versa. If there is an extended network failure, any lock on the
server will time out while the local locks potentially remain in place.
Do we then have to call back into the kernel and kill all the local
locks?
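To make the race concrete, here is a toy model of such a helper. Everything
in it is hypothetical (the names, and especially the assumption that a
read/write change can only be done as release followed by re-acquire); it
makes no real calls to the fileserver and only counts the windows during
which no server lock would be held.

```c
#include <assert.h>

/* The three server-side states the helper would move between. */
enum srv_lock { SRV_NONE, SRV_READ, SRV_WRITE };

struct srv_state {
    enum srv_lock held;     /* lock we believe we hold on the server */
    int unlocked_windows;   /* times we passed through "no lock" mid-change */
};

/* HYPOTHETICAL helper: move the modelled server lock to 'want'.
 * ASSUMPTION: there is no atomic upgrade or downgrade, so read<->write
 * is modelled as release then acquire, leaving a window in which another
 * client could take the lock. */
void set_server_lock(struct srv_state *s, enum srv_lock want)
{
    if (s->held == want)
        return;
    if (s->held != SRV_NONE && want != SRV_NONE) {
        s->held = SRV_NONE;         /* release: the racy window opens here */
        s->unlocked_windows++;
    }
    s->held = want;                 /* acquire (a real call could fail here) */
}

/* Walk through none -> read -> write -> none and count the windows. */
int count_windows_for_upgrade(void)
{
    struct srv_state s = { SRV_NONE, 0 };
    set_server_lock(&s, SRV_READ);
    set_server_lock(&s, SRV_WRITE);
    set_server_lock(&s, SRV_NONE);
    assert(s.held == SRV_NONE);
    return s.unlocked_windows;
}
```

In this model only the read-to-write step opens a window; a real helper
would also have to cope with the acquire failing after the release, and
with the server-side timeout case.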
I don't know what type of kernel APIs are available on other types of
unix. It would be nice if we didn't have to rewrite an entire POSIX file
locking layer in openafs, but rather, re-use kernel APIs where possible.
> Nice bit of detective work, by the way.
Thanks. I patched some production machines with
linux-and-locks-cleanup-20070202, to fix an infrequent problem, and then
they crashed hard due to the new bug :(
-Chris
wingc@engin.umich.edu