[OpenAFS] Error when moving volumes

Jeffrey Hutzelman jhutz@cmu.edu
Mon, 28 Jun 2004 15:29:46 -0400


On Monday, June 28, 2004 18:17:02 +0200 Frode Nilsen 
<mailing-lists@cyberpunks.no> wrote:

> - Is the problem reproducible?  Does it happen every time?
>
> Yes, the problem happens every time I try to move specific volumes; I
> have about 40 user volumes that give the same error.
>
>
> - What versions of OpenAFS are you running on each server?
>
> marvin is running 1.2.7,
> oliven is running 1.2.11
>
>
> - How big is the volume?
>
># vos listvol marvin | grep 100554
> user.h100554                      536871605 RW       7183 K On-line
>
>
> - What output do you get with -verbose ?
>
># vos move -fromserver marvin -frompartition /vicepa -toserver oliven
> -topartition /vicepa -id user.h100554 -verbose
> Starting transaction on source volume 536871605 ... done
> Cloning source volume 536871605 ... done
> Ending the transaction on the source volume 536871605 ... done
> Starting transaction on the cloned volume 536872062 ... done
> Creating the destination volume 536871605 ... done
> Dumping from clone 536872062 on source to volume 536871605 on
> destination ...Failed to move data for the volume 536871605
>    VOLSER: Problems encountered in doing the dump !
> vos move: operation interrupted, cleanup in progress...
> clear transaction contexts
> access VLDB
> move incomplete - attempt cleanup of target partition - no guarantee
> cleanup complete - user verify desired result


OK.  It looks like the failure is in the initial dump, not the final 
incremental.  The volume is only about 7MB, which is not too large in the 
grand scheme of things.  I cannot offhand think of a change since 1.2.7 
that would break volume moves in this way, but I can't say for sure.  Also, 
I don't recall whether you told us what platform these servers run on.


I wonder if your source volserver is producing volume dumps that are broken 
in some fashion.  I can't really debug the problem for you directly (well, 
I assume you're not interested in putting a volume dump containing a copy 
of all of your user's data someplace that I can see it).  However, there 
are a few things you might be able to do to figure out what's going on...


- Dump the volume to a file (vos dump user.h100554 0 -file /some/temp/file)
- Get and compile my dump analysis tools, which can be found in
  /afs/cs.cmu.edu/project/systems-jhutz/dumpscan
- Run afsdump_scan -PHVv /some/temp/file
  If the dump is normal, it should spit out a dump header, a volume
  header, and then a list of all the vnodes in the volume, followed
  by a "dump end" tag.  This "end" tag is what is apparently missing
  according to the errors we've seen so far.

Hopefully once you describe the output (really, it's probably safe to put 
it somewhere on the web and send a pointer; there's not really anything 
secret in it), I'll have some idea what to suggest next.  In the meantime, 
don't delete that dump file; we may have other ideas for analysis you can 
do on it.

Oh, one other thing you should try...  Once you have a dump file, try 
restoring it to oliven, using a different volume name:

vos restore oliven a user.h100554.TEST /some/temp/file -verbose

It should be interesting to see if this indirect method works, and if not, 
the failure might help us determine where the problem lies.
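
If you end up repeating this for several of the ~40 affected volumes, the
dump, scan, and restore steps above could be wrapped in something like the
following.  This is only a rough sketch: the command lines are the ones
given in this message, but build_commands() and run_all() are names I made
up for illustration, /some/temp/file is still a placeholder, and it assumes
vos and afsdump_scan are on your PATH and you already have admin tokens.

```python
import subprocess

VOLUME = "user.h100554"
DUMP_FILE = "/some/temp/file"   # placeholder path, substitute a real one


def build_commands(volume=VOLUME, dump_file=DUMP_FILE,
                   server="oliven", partition="a"):
    """Return the three command lines from this message as argv lists."""
    return [
        # Full dump (time 0) of the volume to a file.
        ["vos", "dump", volume, "0", "-file", dump_file],
        # Scan the dump with the dumpscan tools.
        ["afsdump_scan", "-PHVv", dump_file],
        # Restore under a different name so the real volume is untouched.
        ["vos", "restore", server, partition, volume + ".TEST",
         dump_file, "-verbose"],
    ]


def run_all(commands):
    """Run each command in order and collect the exit statuses.

    A failing step does not stop the later ones, since a failure is
    exactly the information we are after.
    """
    statuses = []
    for argv in commands:
        print("$", " ".join(argv))
        proc = subprocess.run(argv)
        print("exit status:", proc.returncode)
        statuses.append(proc.returncode)
    return statuses
```

Calling run_all(build_commands()) runs the three steps for user.h100554;
pass a different volume name and dump file to build_commands() for the
other volumes.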

Hm...  Yet another thing you can check, though the afsdump_scan output 
should tell you this -- look at the last 5 bytes of the dump file.  If the 
dump is terminated correctly, they should be 04 3a 21 4b 6e (this is the 
"dump end" tag and its magic number).

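If you don't have a hex dumper handy, the 5-byte trailer check can be done
with a few lines of Python.  A minimal sketch: the byte values are the ones
given above, while last_bytes() and ends_with_dump_end() are hypothetical
helper names of my own.  It only looks at the end of the file; it says
nothing about whether the rest of the dump is sane.

```python
import os

# Tag byte 0x04 followed by the magic number 0x3a214b6e, as given above.
DUMP_END = bytes([0x04, 0x3A, 0x21, 0x4B, 0x6E])


def last_bytes(path, count=len(DUMP_END)):
    """Return the last `count` bytes of the file (fewer if it is shorter)."""
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)   # seek() returns the new position
        f.seek(max(size - count, 0))
        return f.read(count)


def ends_with_dump_end(path):
    """True if the file ends with the "dump end" tag and its magic number."""
    return last_bytes(path) == DUMP_END
```

For example, print(last_bytes("/some/temp/file").hex()) shows the actual
trailing bytes, and ends_with_dump_end("/some/temp/file") tells you
whether they match.
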
-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
   Sr. Research Systems Programmer
   School of Computer Science - Research Computing Facility
   Carnegie Mellon University - Pittsburgh, PA