[OpenAFS-devel] "vos move" stealthy volume unlock
Bren Mills
mmills@qualcomm.com
Thu, 24 Apr 2008 15:00:51 -0700
The "vos move" command appears to do the following things when it
encounters a locked volume:
1) Report that the volume is locked. (Good.)
2) Abort the attempt to move the volume. (Good.)
3) Silently unlock the volume. (Danger! Danger!)
Here is the demonstration:
########################################################################
mmills@tux:~$ vos lock Rglobal.qctesavl
Locked VLDB entry for volume Rglobal.qctesavl
mmills@tux:~$ vos exa Rglobal.qctesavl
Rglobal.qctesavl 1819075242 RW 84899 K On-line
afs-l1.qualcomm.com /viceps
RWrite 1819075242 ROnly 0 Backup 0
MaxQuota 1000000 K
Creation Tue Apr 25 12:24:27 2006
Copy Wed Apr 23 14:46:16 2008
Backup Never
Last Update Tue Apr 25 16:08:59 2006
0 accesses in the past day (i.e., vnode references)
RWrite: 1819075242 ROnly: 1819081294
number of sites -> 3
server afs-l1.qualcomm.com partition /viceps RW Site
server afs-sd1.qualcomm.com partition /vicepk RO Site
server afs-sd2.qualcomm.com partition /vicepk RO Site
Volume is currently LOCKED
mmills@tux:~$ vos move Rglobal.qctesavl afs-l1 s afs-sd2 k
Could not lock entry for volume 1819075242
VLDB: vldb entry is already locked
mmills@tux:~$ vos exa Rglobal.qctesavl
Rglobal.qctesavl 1819075242 RW 84899 K On-line
afs-l1.qualcomm.com /viceps
RWrite 1819075242 ROnly 0 Backup 0
MaxQuota 1000000 K
Creation Tue Apr 25 12:24:27 2006
Copy Wed Apr 23 14:46:16 2008
Backup Never
Last Update Tue Apr 25 16:08:59 2006
0 accesses in the past day (i.e., vnode references)
RWrite: 1819075242 ROnly: 1819081294
number of sites -> 3
server afs-l1.qualcomm.com partition /viceps RW Site
server afs-sd1.qualcomm.com partition /vicepk RO Site
server afs-sd2.qualcomm.com partition /vicepk RO Site
mmills@tux:~$
########################################################################
This behavior would seem to be a bug. I have reproduced the problem in
the 1.5.35 code base, and have attached a patch for it developed
against that same code base. To apply the patch:
1) tar xvf openafs-1.5.35-src.tar.bz2
2) cd openafs-1.5.35
3) patch -p0 < vsprocs.c-patch
Then the usual ./configure && make && make install should work fine.
Here is the same demonstration as above, using the patched code:
########################################################################
mmills@mousey:/local/mnt/workspace/openafs/sbin$ ./vos lock Rglobal.qctesavl
Locked VLDB entry for volume Rglobal.qctesavl
mmills@mousey:/local/mnt/workspace/openafs/sbin$ ./vos exa Rglobal.qctesavl
Rglobal.qctesavl 1819075242 RW 84899 K On-line
afs-l1.qualcomm.com /viceps
RWrite 1819075242 ROnly 0 Backup 0
MaxQuota 1000000 K
Creation Tue Apr 25 12:24:27 2006
Copy Wed Apr 23 14:46:16 2008
Backup Never
Last Update Tue Apr 25 16:08:59 2006
0 accesses in the past day (i.e., vnode references)
RWrite: 1819075242 ROnly: 1819081294
number of sites -> 3
server afs-l1.qualcomm.com partition /viceps RW Site
server afs-sd1.qualcomm.com partition /vicepk RO Site
server afs-sd2.qualcomm.com partition /vicepk RO Site
Volume is currently LOCKED
mmills@mousey:/local/mnt/workspace/openafs/sbin$ ./vos move Rglobal.qctesavl afs-l1 s afs-sd2 k
Could not lock entry for volume 1819075242
VLDB: vldb entry is already locked
mmills@mousey:/local/mnt/workspace/openafs/sbin$ ./vos exa Rglobal.qctesavl
Rglobal.qctesavl 1819075242 RW 84899 K On-line
afs-l1.qualcomm.com /viceps
RWrite 1819075242 ROnly 0 Backup 0
MaxQuota 1000000 K
Creation Tue Apr 25 12:24:27 2006
Copy Wed Apr 23 14:46:16 2008
Backup Never
Last Update Tue Apr 25 16:08:59 2006
0 accesses in the past day (i.e., vnode references)
RWrite: 1819075242 ROnly: 1819081294
number of sites -> 3
server afs-l1.qualcomm.com partition /viceps RW Site
server afs-sd1.qualcomm.com partition /vicepk RO Site
server afs-sd2.qualcomm.com partition /vicepk RO Site
Volume is currently LOCKED
mmills@mousey:/local/mnt/workspace/openafs/sbin$
########################################################################
After the move fails due to a locked volume (in src/volser/vsprocs.c,
UV_MoveVolume2), control goes to an overly generalized cleanup block
(circa line 1998) that includes a VLDB unlock.
The islocked flag, I believe, is set to true only when the current
request locked the volume itself. The cleanup code, however, unlocks
without checking islocked, thus clobbering a lock that was held by
someone other than the current request. The patch simply adds this
"if (islocked)" check around the unlock.
My testing did not stray beyond the normal daily vos commands (examine,
lock, unlock, and move), so this should be looked over by another
pair of eyes.
My apologies for not sending this through the bug tracking page, but
apparently "Britney is lesbian" and I am in desperate need of "Rolex
watches as low as $229". :(
I hope this helps.
Bren Mills
Unix Sysadmin
Qualcomm Inc.
[Attachment: vsprocs.c-patch]
--- ../openafs-1.5.35/src/volser/vsprocs.c 2008-04-14 13:25:52.000000000 -0700
+++ src/volser/vsprocs.c 2008-04-24 13:48:57.000000000 -0700
@@ -1994,13 +1994,14 @@
}
}
- /* unlock VLDB entry */
- VPRINT1("Recovery: Releasing lock on VLDB entry for volume %u ...",
- afromvol);
- ubik_VL_ReleaseLock(cstruct, 0, afromvol, -1,
- (LOCKREL_OPCODE | LOCKREL_AFSID | LOCKREL_TIMESTAMP));
- VDONE;
-
+ if (islocked) {
+ /* unlock VLDB entry */
+ VPRINT1("Recovery: Releasing lock on VLDB entry for volume %u ...",
+ afromvol);
+ ubik_VL_ReleaseLock(cstruct, 0, afromvol, -1,
+ (LOCKREL_OPCODE | LOCKREL_AFSID | LOCKREL_TIMESTAMP));
+ VDONE;
+ }
done: /* routine cleanup */
if (volName)
free(volName);