[OpenAFS-devel] linux-and-locks-cleanup-20070202 crashes linux kernels older than 2.6.17 (see RT #53457)

Thu, 08 Feb 2007 14:53:42 -0500

Christopher Allen Wing <wingc@engin.umich.edu> writes:
...
> There does not seem to be a good way to find out (e.g., autoconf test) if 
> a particular linux kernel has the 'old' or 'new' semantics of 
> flock_lock_file*(). The argument types of the functions have not changed.
...

This sounds like a feature that is almost exactly
tied to the Linux kernel version.  Testing the Linux kernel
version is *very* easy to do at compile time:

	mdw@bruson:~/src/linux-2.6.18$ cat include/linux/version.h
	#define LINUX_VERSION_CODE 132626
	#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))
	mdw@bruson:~/src/linux-2.6.18$ dc
	132626 16o p 
	20612
	mdw@bruson:~/src/linux-2.6.18$ 

which is to say:
	#include <linux/version.h>
	...
	#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,17)

should suffice to determine whether the new locking semantics
are in effect.
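
For anyone who wants to check the arithmetic, here is a small
user-space sketch of the same encoding.  It copies the macro rather
than including <linux/version.h>, and the helper names
(decode_kernel_version, has_new_flock_semantics,
check_version_encoding) are made up for illustration; they are not
part of the kernel or OpenAFS.

```c
#include <assert.h>

/* Copy of the macro from include/linux/version.h, as quoted above. */
#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))
#endif

/* Hypothetical helper: unpack an encoded version code. */
static void decode_kernel_version(unsigned code, unsigned *major,
				  unsigned *minor, unsigned *patch)
{
	*major = code >> 16;
	*minor = (code >> 8) & 0xff;
	*patch = code & 0xff;
}

/* Hypothetical predicate: the same comparison as the #if, on a value. */
static int has_new_flock_semantics(unsigned code)
{
	return code >= (unsigned)KERNEL_VERSION(2,6,17);
}

/* Sanity check against the 2.6.18 tree quoted above. */
static void check_version_encoding(void)
{
	unsigned major, minor, patch;

	assert(KERNEL_VERSION(2,6,18) == 132626);	/* 0x20612 */
	decode_kernel_version(132626, &major, &minor, &patch);
	assert(major == 2 && minor == 6 && patch == 18);
	assert(!has_new_flock_semantics(KERNEL_VERSION(2,6,16)));
	assert(has_new_flock_semantics(KERNEL_VERSION(2,6,17)));
}
```

In the module itself none of this is needed; the preprocessor test
does the comparison at compile time.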

Autoconf tests are most useful for features that might exist
on several different platforms.  Run-time tests are useful
for debugging (#ifdef DEBUG) or if the vendor(s) *really* can't
be relied upon to provide predictable functionality.

Although that will fix the kernel API problem, it may not completely
fix flock behavior.  When Matt and I last looked at this
(very briefly) we couldn't convince ourselves that it would do the
right thing in all cases - looking at the above code, we couldn't
figure out how it was supposed to wait for local locks.
Turning off FL_SLEEP turns off that wait, so it seems
like the only wait possible is the one for the whole-file
lock from the server.

This probably needs a much more thorough examination
(i.e., test cases exercising various combinations of
whole-file vs. byte-range locks, local vs. remote locking,
possible deadlocks, signals, etc.) to see that it does "reasonable"
things in all cases.  In particular, if this patch is in fact
useful, it should be possible to reproduce the "non-overlapped locks
don't block, overlapped locks do block" case when all locks
are being obtained from one Linux machine.
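
A rough sketch of what the simplest such test might look like, using
ordinary fcntl() byte-range locks from two processes on one machine.
This is only an illustration under some assumptions: it uses F_SETLK,
so a conflict is reported as an error rather than actually blocking,
and the function names (try_range_lock, probe_from_child,
check_range_locks) are invented here.  It has not been run against AFS;
point it at a file in /afs to try that.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try a non-blocking exclusive lock on bytes [start, start+len).
 * Returns 0 if acquired, 1 if a conflicting lock is held elsewhere,
 * 2 on any other error. */
static int try_range_lock(int fd, off_t start, off_t len)
{
	struct flock fl;

	memset(&fl, 0, sizeof fl);
	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = start;
	fl.l_len = len;
	if (fcntl(fd, F_SETLK, &fl) == 0)
		return 0;
	return (errno == EACCES || errno == EAGAIN) ? 1 : 2;
}

/* fcntl locks belong to a process, so the second locker must be a
 * separate process.  Returns the child's try_range_lock() result,
 * 3 if the child could not open the file, -1 if the probe failed. */
static int probe_from_child(const char *path, off_t start, off_t len)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		int fd = open(path, O_RDWR);
		_exit(fd < 0 ? 3 : try_range_lock(fd, start, len));
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}

/* Hold bytes 0-9 here, then probe two ranges from a child.
 * Returns the number of probes that misbehaved, or -1 on open failure. */
static int check_range_locks(const char *path)
{
	int fd, failures = 0;

	assert(path != NULL);
	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return -1;
	if (try_range_lock(fd, 0, 10) != 0)
		failures++;
	if (probe_from_child(path, 10, 10) != 0)	/* disjoint: expect success */
		failures++;
	if (probe_from_child(path, 5, 10) != 1)		/* overlaps 5-9: expect refusal */
		failures++;
	close(fd);	/* closing also drops this process's locks */
	return failures;
}
```

A real test would also need the blocking (F_SETLKW) variants, signals,
and a second client machine, which this does not attempt.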

I believe Matt is planning to produce a much more extensive
"portable" version of locking for most unix(-like) platforms.
It should work much like the current Linux code is supposed to
work (i.e., local byte-range locks layered on top of fileserver
whole-file locks).

Nice bit of detective work, by the way.

					-Marcus Watts