[OpenAFS-devel] linux-and-locks-cleanup-20070202 crashes linux kernels older than 2.6.17 (see RT #53457)

Thu, 8 Feb 2007 15:05:37 -0500 (EST)

Marcus:

On Thu, 8 Feb 2007, Marcus Watts wrote:

> Christopher Allen Wing <wingc@engin.umich.edu> writes:
> ...
>> There does not seem to be a good way to find out (e.g., autoconf test) if
>> a particular linux kernel has the 'old' or 'new' semantics of
>> flock_lock_file*(). The argument types of the functions have not changed.
> ...
>
> This sounds like it's a feature that is almost exactly
> tied to linux kernel version.  Testing the Linux kernel
> version is *very* easy to do at compile time:
>
> 	mdw@bruson:~/src/linux-2.6.18$ cat include/linux/version.h
> 	#define LINUX_VERSION_CODE 132626
> 	#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))
> 	mdw@bruson:~/src/linux-2.6.18$ dc
> 	132626 16o p
> 	20612
> 	mdw@bruson:~/src/linux-2.6.18$

That isn't safe in general, because Linux vendors may backport patches to 
older kernel versions, so the version number alone does not tell you which 
semantics a given kernel has.  I wouldn't trust it to be correct unless you 
could guarantee you are using a vanilla Linux kernel.
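
For reference, the compile-time test being suggested would look roughly like 
the sketch below.  The macro AFS_FLOCK_NEW_SEMANTICS and the function names are 
made up for illustration, the 2.6.17 cutoff is an assumption taken from the 
subject line, and the fallback LINUX_VERSION_CODE only stands in for 
<linux/version.h> so the fragment compiles outside a kernel tree.

```c
#include <assert.h>

/* Same encoding as the include/linux/version.h quoted above. */
#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))
#endif

/* Stand-in for <linux/version.h>: the value from the 2.6.18 tree above. */
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE 132626
#endif

/* Hypothetical switch; assumes the semantics changed in 2.6.17. */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,17)
#define AFS_FLOCK_NEW_SEMANTICS 1
#else
#define AFS_FLOCK_NEW_SEMANTICS 0
#endif

/* Returns 1 if the build selected the 'new' flock_lock_file*() path. */
int afs_flock_new_semantics(void)
{
    return AFS_FLOCK_NEW_SEMANTICS;
}

/* Sanity check: 2.6.18 encodes to 0x20612, matching the dc output above. */
void check_version_encoding(void)
{
    assert(KERNEL_VERSION(2,6,18) == 0x20612);
}
```

A test like this only sees the version number.  A vendor kernel that reports 
an older version but carries the backported change would take the wrong branch.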

> Although that will fix the kernel api problem, it may not completely
> fix flock behavior.  When Matt and I last looked at this
> (very briefly) we couldn't convince ourselves that this would do the
> right thing in all cases - we couldn't figure out looking at the
> above code how it was supposed to wait for local locks.
> Turning off FL_SLEEP turns off that wait, so it seems
> like the only wait possible is the one for the whole
> file lock from the server.
>
> This probably needs a much more thorough examination
> (ie, test cases, exercising various combinations of
> whole file vs. byte ranges, local locking vs. remote locking,
> possible deadlocks, signals, etc.) to see that it does "reasonable"
> things in all cases.  In particular, if this patch is in fact
> useful, it should be possible to reproduce the "non-overlapped locks
> don't block, overlapped locks do block" case when all locks
> are being obtained from one linux machine.

I think the current semantics (as of 1.4.2+) are a mishmash of traditional 
AFS locking plus partial local locking in some cases.  It's not 
consistent.

> I believe Matt is planning to produce a much more extensive
> "portable" version of locking that should work much like the
> current linux code is supposed to work (ie, local byte range locks
> layered on top of fileserver whole file locks) that should work for
> most unix(-like) platforms.

I think it would be possible to do something like this on linux by 
primarily using the local linux locking code, and having a helper function 
that attempts to change the lock state on the AFS server to:

 	single read lock
 	single write lock
 	no lock

upon request.  But it seems there aren't reliable means to do this. 
There is no race-free way to transition between a read lock and a write lock, 
or vice versa.  If there is an extended network failure, any lock on the 
server will time out while local locks potentially remain in place.  Do we 
then have to call back into the kernel and kill all the local locks?
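
To make the race concrete, here is a rough sketch of how such a helper might 
plan its server requests.  All of the names are hypothetical and none of this 
is existing OpenAFS code.  It assumes the only way to change lock type on the 
server is to release and then re-acquire, which is one way to read the 
"no race-free transition" problem.

```c
#include <assert.h>
#include <stddef.h>

/* The three server-side states listed above (hypothetical names). */
enum srv_lock { SRV_NONE, SRV_READ, SRV_WRITE };

/* Server state implied by the locks currently held locally. */
enum srv_lock srv_lock_wanted(unsigned local_readers, unsigned local_writers)
{
    if (local_writers > 0)
        return SRV_WRITE;
    if (local_readers > 0)
        return SRV_READ;
    return SRV_NONE;
}

/*
 * Fill steps[] with the server states to request, in order, to get from
 * cur to want, and return how many there are (0, 1 or 2).  A read<->write
 * change is modeled as release-then-acquire, so it passes through
 * SRV_NONE; that intermediate step is the window another client can use.
 */
size_t srv_lock_plan(enum srv_lock cur, enum srv_lock want,
                     enum srv_lock steps[2])
{
    size_t n = 0;

    assert(steps != NULL);
    if (cur == want)
        return 0;
    if (cur != SRV_NONE && want != SRV_NONE)
        steps[n++] = SRV_NONE;   /* unlocked window: the race */
    steps[n++] = want;
    return n;
}
```

Going from a read lock to a write lock yields two steps, "no lock" and then 
"write lock", and nothing stops another client from taking the lock in between.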

I don't know what kinds of kernel APIs are available on other flavors of 
Unix.  It would be nice if we didn't have to rewrite an entire POSIX file 
locking layer in OpenAFS, but could instead re-use kernel APIs where possible.

> Nice bit of detective work, by the way.

Thanks.  I patched some production machines with 
linux-and-locks-cleanup-20070202 to fix an infrequent problem, and then 
they crashed hard due to the new bug :(

-Chris
wingc@engin.umich.edu