[OpenAFS-devel] linux-and-locks-cleanup-20070202 crashes linux kernels older than 2.6.17 (see RT #53457)

Christopher Allen Wing wingc@engin.umich.edu
Thu, 8 Feb 2007 12:18:01 -0500 (EST)


This is bug #53457 in RT.  I'm sending it to openafs-devel in case anyone 
who does not read the bug reports would be interested in commenting.



The recent CVS delta 'linux-and-locks-cleanup-20070202' fixes the bug 
introduced by 'linux-enroll-locks-20060403' (use of posix locks can cause 
a kernel panic), but it introduces a new bug on linux kernels older than 
2.6.17.

This is because the kernel API for flock_lock_file*() changed between 
2.6.16 and 2.6.17.  Here is the relevant change in the linux kernel source 
tree:

 	http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=993dfa8776308dcfd311cf77a3bbed4aa11e9868


Prior to this change (i.e. prior to linux-2.6.17), flock_lock_file*() 
takes as an argument a pointer to 'struct file_lock' which has been 
allocated by locks_alloc_lock().  It then threads this pointer into the 
inode's i_flock list.  The caller of flock_lock_file*() therefore may not 
free the 'file_lock' structure afterward.

In linux-2.6.17+, flock_lock_file*() takes as an argument a pointer to 
'struct file_lock', and copies the relevant data out.  It then allocates a 
new 'struct file_lock' and puts that into the inode i_flock list.  It does 
not retain a reference to the 'struct file_lock' pointer passed as an 
argument.



In the OpenAFS CVS delta 'linux-and-locks-cleanup-20070202' the following 
addition is made to src/afs/LINUX/osi_vnodeops.c:


 	+#ifdef STRUCT_FILE_OPERATIONS_HAS_FLOCK
 	+static int
 	+afs_linux_flock(struct file *fp, int cmd, struct file_lock *flp) {
 	...
 	...

 	+    AFS_GLOCK();
 	+    code = afs_lockctl(vcp, &flock, cmd, credp);
 	+    AFS_GUNLOCK();
 	+
 	+    if ((code == 0 || flp->fl_type == F_UNLCK) &&
 	+        (cmd == F_SETLK || cmd == F_SETLKW)) {
 	+       struct file_lock flp2;
 	+       flp2 = *flp;
 	+       flp2.fl_flags &=~ FL_SLEEP;
 	+       code = flock_lock_file_wait(fp, &flp2);


Note that the argument to flock_lock_file_wait() is the address of a 
'struct file_lock' allocated on the stack.  On linux kernels prior to 
2.6.17, this results in random stack memory getting into the inode's 
i_flock linked list.



The kernel will eventually crash when a future lock or unlock operation 
tries to traverse the i_flock list and finds random memory instead of a 
valid 'struct file_lock'.




This can probably be reproduced on any linux <2.6.17 kernel by using any 
program that calls flock() on a file in AFS.  I was able to crash a 
machine running the RHEL4 kernel like this:

 	cd /afs/somewhere
 	touch ./file

 	program_that_calls_flock()_and_waits ./file
 	program_that_calls_lockf()_and_waits ./file




There does not seem to be a good way to find out (e.g., autoconf test) if 
a particular linux kernel has the 'old' or 'new' semantics of 
flock_lock_file*(). The argument types of the functions have not changed.



One possible solution is to always allocate a new 'struct file_lock' and 
pass it to the kernel.  Prior to linux-2.6.17, the kernel uses the 
'list_empty(&file_lock->fl_link)' test to see if the lock was actually 
used; this should also be safe on newer kernels, but it's relying an an 
implementation detail.


--- openafs/src/afs/LINUX/osi_vnodeops.c~	2007-02-02 22:23:22.000000000 
-0500
+++ openafs/src/afs/LINUX/osi_vnodeops.c	2007-02-08 11:58:33.000000000 
-0500
@@ -528,6 +528,15 @@
      struct vcache *vcp = VTOAFS(FILE_INODE(fp));
      cred_t *credp = crref();
      struct AFS_FLOCK flock;
+    struct file_lock *flp2;
+
+    /* Allocate a new lock structure to give to the kernel */
+    flp2 = locks_alloc_lock();
+    if (!flp2) {
+	code = ENOMEM;
+	goto out;
+    }
+
      /* Convert to a lock format afs_lockctl understands. */
      memset((char *)&flock, 0, sizeof(flock));
      flock.l_type = flp->fl_type;
@@ -552,10 +561,9 @@

      if ((code == 0 || flp->fl_type == F_UNLCK) &&
          (cmd == F_SETLK || cmd == F_SETLKW)) {
-       struct file_lock flp2;
-       flp2 = *flp;
-       flp2.fl_flags &=~ FL_SLEEP;
-       code = flock_lock_file_wait(fp, &flp2);
+       *flp2 = *flp;
+       flp2->fl_flags &=~ FL_SLEEP;
+       code = flock_lock_file_wait(fp, flp2);
         if (code && flp->fl_type != F_UNLCK) {
             struct AFS_FLOCK flock2;
             flock2 = flock;
@@ -569,6 +577,12 @@
      flp->fl_type = flock.l_type;
      flp->fl_pid = flock.l_pid;

+    /* Free the newly allocated lock, if the kernel didn't use it */
+    if (list_empty(&flp2->fl_link)) {
+	locks_free_lock(flp2);
+    }
+
+out:
      crfree(credp);
      return -code;
  }



Alternatively, we can just pass the 'struct file_lock' handed to us by the 
kernel right back to flock_lock_file*().  It should be safe to clear the 
FL_SLEEP flag here; as far as I can tell, nothing in the kernel actually 
uses the FL_SLEEP flag in a file_lock structure once the structure is 
linked into the inode->i_flock list.  FL_SLEEP is only used when a new 
lock is being created.



--- openafs-fix/src/afs/LINUX/osi_vnodeops.c~	2007-02-02 22:23:22.000000000 
-0500
+++ openafs-fix/src/afs/LINUX/osi_vnodeops.c	2007-02-08 11:58:58.000000000 
-0500
@@ -552,10 +552,8 @@

      if ((code == 0 || flp->fl_type == F_UNLCK) &&
          (cmd == F_SETLK || cmd == F_SETLKW)) {
-       struct file_lock flp2;
-       flp2 = *flp;
-       flp2.fl_flags &=~ FL_SLEEP;
-       code = flock_lock_file_wait(fp, &flp2);
+       flp->fl_flags &=~ FL_SLEEP;
+       code = flock_lock_file_wait(fp, flp);
         if (code && flp->fl_type != F_UNLCK) {
             struct AFS_FLOCK flock2;
             flock2 = flock;



The third alternative solution would be to revert the addition of 
afs_linux_flock() for the time being.




Thanks,

Chris Wing
wingc@engin.umich.edu