[OpenAFS] can neither delete nor use backup volume (DA server 1.6.0pre2)
John Tang Boyland
boyland@cs.uwm.edu
Mon, 29 Aug 2011 14:18:14 -0500
I accidentally moved a backed-up volume from partition a to b.
I moved it back, but now the backup volume is misbehaving.
(The process is REPEATABLE!)
I can't seem to remove all traces of it, but neither can I use it.
I can get root access on the (Linux) file server and start hacking files
directly, but perhaps someone on this list can help me avoid
making a terrible blunder.
Here's a partial history. Notice that after the "zap", vos backup gives
a different error once, but then it's back to the same set of errors
again. I've been able to create backups of other volumes on the same
partition today. So it seems that vos zap doesn't actually zap
everything. Something's hanging around which is preventing vos backup
from working. (And no, salvaging doesn't help either.)
Any advice?
John
P.S. If you cc me in a response, I will get it quicker.
------------------------------------------------------------------
admin 79 % rxdebug localhost -port 7001 -version
Trying 127.0.0.1 (port 7001):
AFS version: OpenAFS 1.4.11 built 2009-07-13
admin 80 % rxdebug jeremiah -port 7000 -version
Trying 129.89.143.70 (port 7000):
AFS version: OpenAFS 1.6.0pre2 built 2011-02-17 (91.pre2.sl6@fnal.gov)
admin 82 % vos examine fa11.cs351.backup
**** Could not attach volume 536875958 ****
RWrite: 536875956 Backup: 536875958
number of sites -> 1
server jeremiah.cs.uwm.edu partition /vicepa RW Site
admin 83 % vos examine fa11.cs351
fa11.cs351 536875956 RW 1679 K On-line
jeremiah.cs.uwm.edu /vicepa
RWrite 536875956 ROnly 0 Backup 536875958
MaxQuota 100000 K
Creation Thu Jun 23 20:15:28 2011
Copy Mon Aug 29 11:13:07 2011
Backup Mon Aug 29 13:43:23 2011
Last Access Mon Aug 29 13:33:06 2011
Last Update Mon Aug 29 13:28:10 2011
294 accesses in the past day (i.e., vnode references)
RWrite: 536875956 Backup: 536875958
number of sites -> 1
server jeremiah.cs.uwm.edu partition /vicepa RW Site
admin 84 % vos remove fa11.cs351.backup
vos: Missing required parameter '-id'
admin 85 % vos remove jeremiah a fa11.cs351.backup
WARNING: Volume 536875958 did not exist on the partition
Volume 536875958 on partition /vicepa server jeremiah.cs.uwm.edu deleted
admin 86 % vos examine fa11.cs351.backup
**** Could not attach volume 536875958 ****
Volume fa11.cs351.backup does not exist in VLDB
RWrite: 536875956
number of sites -> 1
server jeremiah.cs.uwm.edu partition /vicepa RW Site
admin 87 % vos zap jeremiah a fa11.cs351.backup -force -verbose
vos: forcibly removing all traces of volume 536875958, please wait...done.
admin 88 % vos examine fa11.cs351.backup
Could not fetch the information about volume 536875958 from the server
: No such device
Volume fa11.cs351.backup does not exist
Dump only information from VLDB
fa11.cs351
RWrite: 536875956
number of sites -> 1
server jeremiah.cs.uwm.edu partition /vicepa RW Site
admin 89 % vos backup fa11.cs351
Failed to clone the volume 536875956
Volume not attached, does not exist, or not on line
Error in vos backup command.
Volume not attached, does not exist, or not on line
admin 90 % vos examine fa11.cs351.backup
Could not fetch the information about volume 536875958 from the server
: No such device
Volume fa11.cs351.backup does not exist
Dump only information from VLDB
fa11.cs351
RWrite: 536875956
number of sites -> 1
server jeremiah.cs.uwm.edu partition /vicepa RW Site
admin 91 % vos backup fa11.cs351
Failed to start a transaction on the backup volume 536875958
Volume not attached, does not exist, or not on line
Error in vos backup command.
Volume not attached, does not exist, or not on line
admin 92 % vos examine fa11.cs351.backup
**** Could not attach volume 536875958 ****
RWrite: 536875956 Backup: 536875958
number of sites -> 1
server jeremiah.cs.uwm.edu partition /vicepa RW Site
admin 93 % vos create jeremiah a fa11.test
Volume 536876132 created on partition /vicepa of jeremiah
admin 94 % vos backup fa11.test
Created backup volume for fa11.test
admin 95 % vos examine fa11.test.backup
fa11.test.backup 536876134 BK 2 K On-line
jeremiah.cs.uwm.edu /vicepa
RWrite 536876132 ROnly 0 Backup 0
MaxQuota 5000 K
Creation Mon Aug 29 13:59:13 2011
Copy Mon Aug 29 13:59:13 2011
Backup Mon Aug 29 13:59:13 2011
Last Update Mon Aug 29 13:59:05 2011
0 accesses in the past day (i.e., vnode references)
RWrite: 536876132 Backup: 536876134
number of sites -> 1
server jeremiah.cs.uwm.edu partition /vicepa RW Site
[SOMEWHAT LATER, I had the bright idea to repeat my error, and to my
surprise and delight(?) the same error happens!]
admin 107 % vos move fa11.test jeremiah a jeremiah b
Volume 536876132 moved from jeremiah /vicepa to jeremiah /vicepb
admin 108 % vos move fa11.test jeremiah b jeremiah a
Volume 536876132 moved from jeremiah /vicepb to jeremiah /vicepa
admin 109 % vos backup fa11.test
Failed to start a transaction on the backup volume 536876134
Volume not attached, does not exist, or not on line
Error in vos backup command.
Volume not attached, does not exist, or not on line
admin 110 % vos examine fa11.test.backup
**** Could not attach volume 536876134 ****
RWrite: 536876132 Backup: 536876134
number of sites -> 1
server jeremiah.cs.uwm.edu partition /vicepa RW Site
---------------------------------
NB: FileLog says:
Mon Aug 29 13:57:10 2011 FSYNC_com_VolOff: failed to get heavyweight reference to volume 536875958 (state=20, flags=0x18)
Salvaleg logs (I tried to salvage fa11.cs351.backup after the series of
commands here, which said it was successful but the logs show:
$ cat SalsrvLog
08/29/2011 13:43:00 dispatching child to salvage volume 536875956...
08/29/2011 13:43:00 2 nVolumesInInodeFile 56
08/29/2011 13:43:00 CHECKING CLONED VOLUME 536875958.
08/29/2011 13:43:00 SALVAGING VOLUME 536875956.
08/29/2011 13:43:00 fa11.cs351 (536875956) updated 08/29/2011 13:28
08/29/2011 13:43:00 totalInodes 513
08/29/2011 13:43:00 Salvaged fa11.cs351 (536875956): 506 files, 1679 blocks
08/29/2011 13:56:19 dispatching child to salvage volume 536875956...
08/29/2011 13:56:19 2 nVolumesInInodeFile 56
08/29/2011 13:56:19 CHECKING CLONED VOLUME 536875958.
08/29/2011 13:56:19 SALVAGING VOLUME 536875956.
08/29/2011 13:56:19 fa11.cs351 (536875956) updated 08/29/2011 13:28
08/29/2011 13:56:19 totalInodes 513
08/29/2011 13:56:20 Salvaged fa11.cs351 (536875956): 506 files, 1679 blocks
and
$ cat SalvageLog
@(#) OpenAFS 1.6.0pre2 built 2011-02-17 (91.pre2.sl6@fnal.gov)
08/29/2011 14:01:49 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepa 536875958)
08/29/2011 14:01:49 _VLockFd: conflicting lock held on fd 4, offset 0 by pid 2177 (locktype=2)
(process 2177 is /usr/afs/bin/salvageserver)