[OpenAFS] Odd error on 'vos move'
Garance A Drosehn
Mon, 07 Dec 2015 15:46:40 -0500
I've been busy moving our AFS volumes from ancient file servers to
up-to-date file servers. So far this has been going along well,
but last week I ran into an odd error moving one 10.79 GiB volume.
My main question is: Could a problem like this be caused by my
AFS token expiring in the middle of the transfer? Here's the
output from vos-move:
/usr/sbin/vos move -id <_details_> -verbose
Starting transaction on source volume <__old__> ... done
Allocating new volume id for clone of volume <__old__> ... done
Cloning source volume <__old__> ... done
Ending the transaction on the source volume <__old__> ... done
Starting transaction on the cloned volume <_clone_> ... done
Setting flags on cloned volume <_clone_> ... done
Getting status of cloned volume <_clone_> ... done
Deleting pre-existing destination volume <__old__> ...Creating the
destination volume <__old__> ... done
Setting volume flags on destination volume <__old__> ... done
Dumping from clone <_clone_> on source to volume <__old__> on
destination ...vos move: operation interrupted, cleanup in progress...
clear transaction contexts
Recovery: Releasing VLDB lock on volume <__old__> ... done
Recovery: Ending transaction on clone volume ... done
Recovery: Ending transaction on destination volume ... done
Recovery: Accessing VLDB.
FATAL: VLDB access error: abort cleanup
cleanup complete - user verify desired result
#------>Error-> *** cs=256 ***
The vos-move command took about 54 minutes. It started after I
had moved several other large volumes, and it happened that my
AFS token expired in the middle of this vos-move. I was doing
some other things in AFS at the time, and the token could not
have been expired longer than a minute or two before I noticed
it. I did a new 'klog', and it was at least five minutes later
before the vos-move terminated. I suspect it was more like
10-15 minutes, but I didn't really keep track of that.
So, could the problem have been caused by the token expiring in
the middle of the transfer?
At this point, if I do a 'listvol' on both the source and
destination servers, the volume exists on both of them. On
the destination server the volume is marked as 'Off-line'.
If I do a 'vos examine', the volume is listed as being on
the original (source) server, and is also marked as LOCKED.
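For anyone who wants to see the same state, the checks I describe above amount to roughly the following. This is only a sketch: the volume and server names are made-up placeholders (not our real ones), and the small `run` helper is my own invention that just prints each command unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only: every name below is a hypothetical placeholder.
VOLUME="example.logs"
OLD_SERVER="oldfs.example.edu"
NEW_SERVER="newfs.example.edu"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

show_volume_state() {
    # What each file server says it is holding on disk.
    run vos listvol -server "$OLD_SERVER"
    run vos listvol -server "$NEW_SERVER"
    # What the VLDB says, including whether the entry is locked.
    run vos examine -id "$VOLUME"
}

show_volume_state
```

Comparing the two listvol outputs against the examine output is what shows the mismatch: the destination holds a copy that the VLDB does not know about.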
I assume that the thing to do right now would be to:
1. vos-remove the copy which exists on the destination
file server (and which is not shown in vos-examine).
2. vos-unlock the copy which exists on the original file server.
3. Retry the vos-move, this time making sure my AFS token
won't expire in the middle of the transfer!
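Spelled out as commands, the three steps above would look something like this. It is a sketch of my plan, not something I have verified to be correct: the volume, server, and partition names are hypothetical placeholders, and the `run` helper (my own, not part of AFS) only prints the commands unless DRY_RUN=0, so nothing is changed until someone has looked them over.

```shell
#!/bin/sh
# Sketch of the proposed cleanup. All names are hypothetical placeholders.
VOLUME="example.logs"
OLD_SERVER="oldfs.example.edu"
OLD_PART="a"
NEW_SERVER="newfs.example.edu"
NEW_PART="b"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover_move() {
    # 1. Remove the off-line copy left behind on the destination server.
    run vos remove -server "$NEW_SERVER" -partition "$NEW_PART" -id "$VOLUME"
    # 2. Release the lock that the failed move left on the VLDB entry.
    run vos unlock -id "$VOLUME"
    # 3. Retry the move (after getting a fresh token with klog).
    run vos move -id "$VOLUME" \
        -fromserver "$OLD_SERVER" -frompartition "$OLD_PART" \
        -toserver "$NEW_SERVER" -topartition "$NEW_PART" -verbose
}

recover_move
```

Run as-is it only echoes the three commands in order, which makes it easy to check the server and partition arguments before running anything for real.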
Does this seem reasonable? Are there any other checks I should
do before trying those? I was able to read all the data in the
volume (using 'md5sum') without warnings or errors showing up
in any log files on the server.
For what it's worth: all that's in this AFS volume are log files
which have not changed since January 2015, so it isn't a crisis
if I need to do something more time-consuming to fix it. And I
could easily break this up into a dozen smaller volumes, if that
would be a prudent idea.
Garance Alistair Drosehn = email@example.com
Senior Systems Programmer or gad@FreeBSD.org
Rensselaer Polytechnic Institute; Troy, NY; USA