[OpenAFS-devel] "vos move" stealthy volume unlock

Bren Mills mmills@qualcomm.com
Thu, 24 Apr 2008 15:00:51 -0700



The "vos move" command appears to do the following things when it 
encounters a locked volume:

1) Report that the volume is locked. (Good.)
2) Abort the attempt to move the volume. (Good.)
3) Silently unlock the volume. (Danger! Danger!)

Here is the demonstration:
########################################################################
mmills@tux:~$ vos lock Rglobal.qctesavl
Locked VLDB entry for volume Rglobal.qctesavl

mmills@tux:~$ vos exa Rglobal.qctesavl
Rglobal.qctesavl                 1819075242 RW      84899 K  On-line
     afs-l1.qualcomm.com /viceps
     RWrite 1819075242 ROnly          0 Backup          0
     MaxQuota    1000000 K
     Creation    Tue Apr 25 12:24:27 2006
     Copy        Wed Apr 23 14:46:16 2008
     Backup      Never
     Last Update Tue Apr 25 16:08:59 2006
     0 accesses in the past day (i.e., vnode references)

     RWrite: 1819075242    ROnly: 1819081294
     number of sites -> 3
        server afs-l1.qualcomm.com partition /viceps RW Site
        server afs-sd1.qualcomm.com partition /vicepk RO Site
        server afs-sd2.qualcomm.com partition /vicepk RO Site
     Volume is currently LOCKED

mmills@tux:~$ vos move Rglobal.qctesavl afs-l1 s afs-sd2 k

Could not lock entry for volume 1819075242
    VLDB: vldb entry is already locked

mmills@tux:~$ vos exa Rglobal.qctesavl
Rglobal.qctesavl                 1819075242 RW      84899 K  On-line
     afs-l1.qualcomm.com /viceps
     RWrite 1819075242 ROnly          0 Backup          0
     MaxQuota    1000000 K
     Creation    Tue Apr 25 12:24:27 2006
     Copy        Wed Apr 23 14:46:16 2008
     Backup      Never
     Last Update Tue Apr 25 16:08:59 2006
     0 accesses in the past day (i.e., vnode references)

     RWrite: 1819075242    ROnly: 1819081294
     number of sites -> 3
        server afs-l1.qualcomm.com partition /viceps RW Site
        server afs-sd1.qualcomm.com partition /vicepk RO Site
        server afs-sd2.qualcomm.com partition /vicepk RO Site

mmills@tux:~$
########################################################################

This looks like a bug. I have reproduced the problem in the 1.5.35 code 
base and have attached a patch developed against that same code base. 
To apply the patch:

1) tar xvf openafs-1.5.35-src.tar.bz2
2) cd openafs-1.5.35
3) patch -p0 < vsprocs.c-patch

Then the usual ./configure && make && make install should work fine.

Here is the same demonstration as above, using the patched code:
########################################################################
mmills@mousey:/local/mnt/workspace/openafs/sbin$ ./vos lock Rglobal.qctesavl
Locked VLDB entry for volume Rglobal.qctesavl

mmills@mousey:/local/mnt/workspace/openafs/sbin$ ./vos exa Rglobal.qctesavl
Rglobal.qctesavl                 1819075242 RW      84899 K  On-line
     afs-l1.qualcomm.com /viceps
     RWrite 1819075242 ROnly          0 Backup          0
     MaxQuota    1000000 K
     Creation    Tue Apr 25 12:24:27 2006
     Copy        Wed Apr 23 14:46:16 2008
     Backup      Never
     Last Update Tue Apr 25 16:08:59 2006
     0 accesses in the past day (i.e., vnode references)

     RWrite: 1819075242    ROnly: 1819081294
     number of sites -> 3
        server afs-l1.qualcomm.com partition /viceps RW Site
        server afs-sd1.qualcomm.com partition /vicepk RO Site
        server afs-sd2.qualcomm.com partition /vicepk RO Site
     Volume is currently LOCKED

mmills@mousey:/local/mnt/workspace/openafs/sbin$ ./vos move Rglobal.qctesavl afs-l1 s afs-sd2 k

Could not lock entry for volume 1819075242
    VLDB: vldb entry is already locked

mmills@mousey:/local/mnt/workspace/openafs/sbin$ ./vos exa Rglobal.qctesavl
Rglobal.qctesavl                 1819075242 RW      84899 K  On-line
     afs-l1.qualcomm.com /viceps
     RWrite 1819075242 ROnly          0 Backup          0
     MaxQuota    1000000 K
     Creation    Tue Apr 25 12:24:27 2006
     Copy        Wed Apr 23 14:46:16 2008
     Backup      Never
     Last Update Tue Apr 25 16:08:59 2006
     0 accesses in the past day (i.e., vnode references)

     RWrite: 1819075242    ROnly: 1819081294
     number of sites -> 3
        server afs-l1.qualcomm.com partition /viceps RW Site
        server afs-sd1.qualcomm.com partition /vicepk RO Site
        server afs-sd2.qualcomm.com partition /vicepk RO Site
     Volume is currently LOCKED

mmills@mousey:/local/mnt/workspace/openafs/sbin$
########################################################################

When the move fails because the VLDB entry is already locked, 
UV_MoveVolume2 (in src/volser/vsprocs.c) goes to an overly general 
cleanup path (circa line 1998) that releases the VLDB lock no matter 
who holds it.

If the volume has been locked by the current request then I believe the 
islocked flag is set to true. The cleanup code does not check islocked 
before unlocking, so it clobbers a lock that exists but was not taken 
by the current request. The patch simply wraps the unlock in an 
"if (islocked)" check.

My testing did not stray beyond the normal daily vos commands (examine, 
lock, unlock, and move) and thus this should be looked over by another 
pair of eyes.

My apologies for not sending this through the bug tracking page, but 
apparently "Britney is lesbian" and I am in desperate need of "Rolex 
watches as low as $229". :(

I hope this helps.

Bren Mills
Unix Sysadmin
Qualcomm Inc.

Attachment: vsprocs.c-patch

--- ../openafs-1.5.35/src/volser/vsprocs.c	2008-04-14 13:25:52.000000000 -0700
+++ src/volser/vsprocs.c	2008-04-24 13:48:57.000000000 -0700
@@ -1994,13 +1994,14 @@
 	}
     }
 
-    /* unlock VLDB entry */
-    VPRINT1("Recovery: Releasing lock on VLDB entry for volume %u ...",
-	    afromvol);
-    ubik_VL_ReleaseLock(cstruct, 0, afromvol, -1,
-	      (LOCKREL_OPCODE | LOCKREL_AFSID | LOCKREL_TIMESTAMP));
-    VDONE;
-
+    if (islocked) {
+        /* unlock VLDB entry */
+        VPRINT1("Recovery: Releasing lock on VLDB entry for volume %u ...",
+                afromvol);
+        ubik_VL_ReleaseLock(cstruct, 0, afromvol, -1,
+                (LOCKREL_OPCODE | LOCKREL_AFSID | LOCKREL_TIMESTAMP));
+        VDONE;
+    }
   done:			/* routine cleanup */
     if (volName)
 	free(volName);
