[OpenAFS-devel] linux-and-locks-cleanup-20070202 crashes linux kernels older than
2.6.17 (see RT #53457)
Christopher Allen Wing
wingc@engin.umich.edu
Thu, 8 Feb 2007 12:18:01 -0500 (EST)
This is bug #53457 in RT. I'm sending it to openafs-devel in case anyone
who does not read the bug reports would be interested in commenting.
The recent CVS delta 'linux-and-locks-cleanup-20070202' fixes the bug
introduced by 'linux-enroll-locks-20060403' (use of posix locks can cause
a kernel panic), but it introduces a new bug on linux kernels older than
2.6.17.
This is because the kernel API for flock_lock_file*() changed between
2.6.16 and 2.6.17. Here is the relevant change in the linux kernel source
tree:
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=993dfa8776308dcfd311cf77a3bbed4aa11e9868
Prior to this change (i.e. prior to linux-2.6.17), flock_lock_file*()
takes as an argument a pointer to 'struct file_lock' which has been
allocated by locks_alloc_lock(). It then threads this pointer into the
inode's i_flock list. The caller of flock_lock_file*() therefore may not
free the 'file_lock' structure afterward.
In linux-2.6.17+, flock_lock_file*() takes as an argument a pointer to
'struct file_lock', and copies the relevant data out. It then allocates a
new 'struct file_lock' and puts that into the inode i_flock list. It does
not retain a reference to the 'struct file_lock' pointer passed as an
argument.
In the OpenAFS CVS delta 'linux-and-locks-cleanup-20070202' the following
addition is made to src/afs/LINUX/osi_vnodeops.c:
+#ifdef STRUCT_FILE_OPERATIONS_HAS_FLOCK
+static int
+afs_linux_flock(struct file *fp, int cmd, struct file_lock *flp) {
...
...
+ AFS_GLOCK();
+ code = afs_lockctl(vcp, &flock, cmd, credp);
+ AFS_GUNLOCK();
+
+ if ((code == 0 || flp->fl_type == F_UNLCK) &&
+ (cmd == F_SETLK || cmd == F_SETLKW)) {
+ struct file_lock flp2;
+ flp2 = *flp;
+ flp2.fl_flags &=~ FL_SLEEP;
+ code = flock_lock_file_wait(fp, &flp2);
Note that the argument to flock_lock_file_wait() is the address of a
'struct file_lock' allocated on the stack. On linux kernels prior to
2.6.17, this results in random stack memory getting into the inode's
i_flock linked list.
The kernel will eventually crash when a future lock or unlock operation
tries to traverse the i_flock list and finds random memory instead of a
valid 'struct file_lock'.
This can probably be reproduced on any linux <2.6.17 kernel by using any
program that calls flock() on a file in AFS. I was able to crash a
machine running the RHEL4 kernel like this:
cd /afs/somewhere
touch ./file
program_that_calls_flock()_and_waits ./file
program_that_calls_lockf()_and_waits ./file
There does not seem to be a good way to find out (e.g., autoconf test) if
a particular linux kernel has the 'old' or 'new' semantics of
flock_lock_file*(). The argument types of the functions have not changed.
One possible solution is to always allocate a new 'struct file_lock' and
pass it to the kernel. Prior to linux-2.6.17, the kernel uses the
'list_empty(&file_lock->fl_link)' test to see if the lock was actually
used; this should also be safe on newer kernels, but it's relying an an
implementation detail.
--- openafs/src/afs/LINUX/osi_vnodeops.c~ 2007-02-02 22:23:22.000000000
-0500
+++ openafs/src/afs/LINUX/osi_vnodeops.c 2007-02-08 11:58:33.000000000
-0500
@@ -528,6 +528,15 @@
struct vcache *vcp = VTOAFS(FILE_INODE(fp));
cred_t *credp = crref();
struct AFS_FLOCK flock;
+ struct file_lock *flp2;
+
+ /* Allocate a new lock structure to give to the kernel */
+ flp2 = locks_alloc_lock();
+ if (!flp2) {
+ code = ENOMEM;
+ goto out;
+ }
+
/* Convert to a lock format afs_lockctl understands. */
memset((char *)&flock, 0, sizeof(flock));
flock.l_type = flp->fl_type;
@@ -552,10 +561,9 @@
if ((code == 0 || flp->fl_type == F_UNLCK) &&
(cmd == F_SETLK || cmd == F_SETLKW)) {
- struct file_lock flp2;
- flp2 = *flp;
- flp2.fl_flags &=~ FL_SLEEP;
- code = flock_lock_file_wait(fp, &flp2);
+ *flp2 = *flp;
+ flp2->fl_flags &=~ FL_SLEEP;
+ code = flock_lock_file_wait(fp, flp2);
if (code && flp->fl_type != F_UNLCK) {
struct AFS_FLOCK flock2;
flock2 = flock;
@@ -569,6 +577,12 @@
flp->fl_type = flock.l_type;
flp->fl_pid = flock.l_pid;
+ /* Free the newly allocated lock, if the kernel didn't use it */
+ if (list_empty(&flp2->fl_link)) {
+ locks_free_lock(flp2);
+ }
+
+out:
crfree(credp);
return -code;
}
Alternatively, we can just pass the 'struct file_lock' handed to us by the
kernel right back to flock_lock_file*(). It should be safe to clear the
FL_SLEEP flag here; as far as I can tell, nothing in the kernel actually
uses the FL_SLEEP flag in a file_lock structure once the structure is
linked into the inode->i_flock list. FL_SLEEP is only used when a new
lock is being created.
--- openafs-fix/src/afs/LINUX/osi_vnodeops.c~ 2007-02-02 22:23:22.000000000
-0500
+++ openafs-fix/src/afs/LINUX/osi_vnodeops.c 2007-02-08 11:58:58.000000000
-0500
@@ -552,10 +552,8 @@
if ((code == 0 || flp->fl_type == F_UNLCK) &&
(cmd == F_SETLK || cmd == F_SETLKW)) {
- struct file_lock flp2;
- flp2 = *flp;
- flp2.fl_flags &=~ FL_SLEEP;
- code = flock_lock_file_wait(fp, &flp2);
+ flp->fl_flags &=~ FL_SLEEP;
+ code = flock_lock_file_wait(fp, flp);
if (code && flp->fl_type != F_UNLCK) {
struct AFS_FLOCK flock2;
flock2 = flock;
The third alternative solution would be to revert the addition of
afs_linux_flock() for the time being.
Thanks,
Chris Wing
wingc@engin.umich.edu