[OpenAFS] Error when moving volumes
Jeffrey Hutzelman
jhutz@cmu.edu
Mon, 28 Jun 2004 15:29:46 -0400
On Monday, June 28, 2004 18:17:02 +0200 Frode Nilsen
<mailing-lists@cyberpunks.no> wrote:
> - Is the problem reproducible? Does it happen every time?
>
> Yes, the problem happens every time I try to move specific volumes; I
> have about 40 user volumes that give the same error.
>
>
> - What versions of OpenAFS are you running on each server?
>
> marvin is running 1.2.7,
> oliven is running 1.2.11
>
>
> - How big is the volume?
>
> # vos listvol marvin | grep 100554
> user.h100554 536871605 RW 7183 K On-line
>
>
> - What output do you get with -verbose ?
>
> # vos move -fromserver marvin -frompartition /vicepa -toserver oliven
> -topartition /vicepa -id user.h100554 -verbose
> Starting transaction on source volume 536871605 ... done
> Cloning source volume 536871605 ... done
> Ending the transaction on the source volume 536871605 ... done
> Starting transaction on the cloned volume 536872062 ... done
> Creating the destination volume 536871605 ... done
> Dumping from clone 536872062 on source to volume 536871605 on
> destination ...Failed to move data for the volume 536871605
> VOLSER: Problems encountered in doing the dump !
> vos move: operation interrupted, cleanup in progress...
> clear transaction contexts
> access VLDB
> move incomplete - attempt cleanup of target partition - no guarantee
> cleanup complete - user verify desired result

OK. It looks like the failure is in the initial dump, not the final
incremental. The volume is only about 7MB, which is not too large in the
grand scheme of things. Offhand, I cannot think of a change since 1.2.7
that would break volume moves in this way, but I can't say for sure. Also,
I don't recall whether you told us what platform these servers run on.

I wonder if your source volserver is producing volume dumps that are broken
in some fashion. I can't really debug the problem for you directly (well,
I assume you're not interested in putting a volume dump containing a copy
of all of your user's data someplace that I can see it). However, there
are a few things you might be able to do to figure out what's going on...

- Dump the volume to a file (vos dump user.h100554 0 -file /some/temp/file)
- Get and compile my dump analysis tools, which can be found in
  /afs/cs.cmu.edu/project/systems-jhutz/dumpscan
- Run afsdump_scan -PHVv /some/temp/file
  If the dump is normal, it should spit out a dump header, a volume
  header, and then a list of all the vnodes in the volume, followed
  by a "dump end" tag. This "end" tag is what is apparently missing
  according to the errors we've seen so far.

Hopefully, once you describe the output, I'll have some idea what to
suggest next. (Really, it's probably safe to put the output somewhere on
the web and send a pointer; there's not really anything secret in it.)
In the meantime, don't delete that dump file; we may have other ideas for
analysis you can do on it.

Oh, one other thing you should try... Once you have a dump file, try
restoring it to oliven, using a different volume name:

    vos restore oliven a user.h100554.TEST /some/temp/file -verbose

It should be interesting to see if this indirect method works, and if not,
it might help us determine where the problem might be.
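
If you end up repeating this for several of the ~40 affected volumes, the
dump, scan and restore steps above could be scripted. What follows is only
a rough sketch in Python: the command lines are the ones given in this
message, but the function names, the dry-run default and the ".TEST"
suffix handling are illustrative assumptions, not part of any AFS tool.

```python
import subprocess


def diagnostic_commands(volume, dump_file, server, partition):
    """Build argv lists for the three commands quoted in this message.

    Hypothetical helper: it only assembles the command lines, it does not
    know anything about AFS itself.
    """
    return [
        # Full dump of the volume (time 0) to a local file.
        ["vos", "dump", volume, "0", "-file", dump_file],
        # Scan the dump with the dumpscan tool mentioned above.
        ["afsdump_scan", "-PHVv", dump_file],
        # Restore under a different name so the real volume is untouched.
        ["vos", "restore", server, partition, volume + ".TEST",
         dump_file, "-verbose"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # check=False: keep going so every step's output can be seen.
            subprocess.run(argv, check=False)


if __name__ == "__main__":
    cmds = diagnostic_commands("user.h100554", "/some/temp/file",
                               "oliven", "a")
    run_all(cmds, dry_run=True)
```

By default this only prints the commands; pass dry_run=False once the
paths and names look right.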

Hm... Yet another thing you can check, though the afsdump_scan output
should tell you this -- look at the last 5 bytes of the dump file. If the
dump is terminated correctly, they should be 04 3a 21 4b 6e (this is the
"dump end" tag and its magic number).
-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
Sr. Research Systems Programmer
School of Computer Science - Research Computing Facility
Carnegie Mellon University - Pittsburgh, PA