[OpenAFS] Error when moving volumes
Frode Nilsen
mailing-lists@cyberpunks.no
Mon, 28 Jun 2004 22:38:25 +0200
marvin is running RedHat 7.3, OpenAFS 1.2.7
oliven is running Fedora Core 1, OpenAFS 1.2.11
Here is the output from afsdump_scan:
# afsdump_scan -PHVv user.h100554.dump
* DUMP HEADER [0 = 0x0000000000000000]
Magic number: 0xb3a11322
Version: 1
Volume ID: 536871605
Volume name: user.h100554
Dump Range: 0 => 0
* VOLUME HEADER [39 = 0x0000000000000027]
Volume ID: 536871605
Version: 1
Volume name: user.h100554
In service? true
Blessed? true
Uniquifier: 10802
Type: 0
Parent ID: 536871605
Clone ID: 536872065
Max quota: 102400
Min quota: 0
Disk used: 7183
File count: 1216
Account: 0
Owner: 1663
Created: Mon Aug 25 22:18:20 2003
Accessed: Thu Jan 1 01:00:00 1970
Updated: Tue Sep 23 16:37:43 2003
Expires: Thu Jan 1 01:00:00 1970
Backed up: Thu Jan 1 01:00:00 1970
Offine Msg: A volume utility is running.
MOTD:
Weekuse: 0 0 0 0
Weekuse: 0 0 0
Dayuse Date: Fri Apr 23 00:00:00 2004
Daily usage: 1
* VNODE 1/1 [214 = 0x00000000000000d6]
afsdump_scan: Unknown tag in AFS volume dump Unexpected tag 'D' at 260 = 0x0000000000000104
*** FAILED: Unknown tag in AFS volume dump
OK, that didn't look good :-(
Trying restore on oliven:
# vos restore oliven a user.h100554.TEST /root/user.h100554.dump -verbose
Restoring volume user.h100554.TEST Id 536871946 on server oliven.hib.no partition /vicepa ..Could not transmit data
Possible communication failure
Error in vos restore command.
Possible communication failure
#
I tried the same with another volume, first on oliven, where it failed, then on marvin after I had removed the
existing volume; there the volume dump was restored and is functioning again.
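
For anyone repeating this on other affected volumes, the three diagnostic commands used in this thread (dump to a file, scan the dump, trial restore under a .TEST name) can be generated per volume. This is only a rough sketch: the function name print_dump_checks and its argument order are made up for illustration, and it prints the command lines rather than running them, so nothing here touches AFS.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the diagnostic commands from this
# thread for one volume. Arguments are placeholders supplied by the caller:
#   $1 volume name, $2 dump file path, $3 target server, $4 target partition
print_dump_checks() {
    vol=$1
    file=$2
    server=$3
    part=$4
    # Full dump of the volume to a file (the "0" requests a full dump).
    echo "vos dump ${vol} 0 -file ${file}"
    # Scan the dump with the same afsdump_scan options used in this thread.
    echo "afsdump_scan -PHVv ${file}"
    # Trial restore under a different name so the original is not touched.
    echo "vos restore ${server} ${part} ${vol}.TEST ${file} -verbose"
}

# Example: print_dump_checks user.h100554 /root/user.h100554.dump oliven a
```

Printing the commands first makes it easy to review them before pasting them into a shell on the fileserver.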
On Mon, 2004-06-28 at 21:29, Jeffrey Hutzelman wrote:
> On Monday, June 28, 2004 18:17:02 +0200 Frode Nilsen
> <mailing-lists@cyberpunks.no> wrote:
>
> > - Is the problem reproducible? Does it happen every time?
> >
> > Yes, the problem happens every time I try to move specific volumes; I
> > have about 40 user volumes that give the same error.
> >
> >
> > - What versions of OpenAFS are you running on each server?
> >
> > marvin is running 1.2.7,
> > oliven is running 1.2.11
> >
> >
> > - How big is the volume?
> >
> > # vos listvol marvin | grep 100554
> > user.h100554 536871605 RW 7183 K On-line
> >
> >
> > - What output do you get with -verbose ?
> >
> > # vos move -fromserver marvin -frompartition /vicepa -toserver oliven
> > -topartition /vicepa -id user.h100554 -verbose
> > Starting transaction on source volume 536871605 ... done
> > Cloning source volume 536871605 ... done
> > Ending the transaction on the source volume 536871605 ... done
> > Starting transaction on the cloned volume 536872062 ... done
> > Creating the destination volume 536871605 ... done
> > Dumping from clone 536872062 on source to volume 536871605 on
> > destination ...Failed to move data for the volume 536871605
> > VOLSER: Problems encountered in doing the dump !
> > vos move: operation interrupted, cleanup in progress...
> > clear transaction contexts
> > access VLDB
> > move incomplete - attempt cleanup of target partition - no guarantee
> > cleanup complete - user verify desired result
>
>
> OK. It looks like the failure is in the initial dump, not the final
> incremental. The volume is only about 7MB, which is not too large in the
> grand scheme of things. I cannot offhand think of a change since 1.2.7
> that would break volume moves in this way, but I can't say for sure. And,
> I don't recall if you told us what platform these servers are.
>
>
> I wonder if your source volserver is producing volume dumps that are broken
> in some fashion. I can't really debug the problem for you directly (well,
> I assume you're not interested in putting a volume dump containing a copy
> of all of your user's data someplace that I can see it). However, there
> are a few things you might be able to do to figure out what's going on...
>
>
> - Dump the volume to a file (vos dump user.h100554 0 -file /some/temp/file)
> - Get and compile my dump analysis tools, which can be found in
> /afs/cs.cmu.edu/project/systems-jhutz/dumpscan
> - Run afsdump_scan -PHVv /some/temp/file
> If the dump is normal, it should spit out a dump header, a volume
> header, and then a list of all the vnodes in the volume, followed
> by a "dump end" tag. This "end" tag is what is apparently missing
> according to the errors we've seen so far.
>
> Hopefully once you describe the output (really, it's probably safe to put
> it somewhere on the web and send a pointer, there's not really anything
> secret in it), I'll have some idea what to suggest next. In the meantime,
> don't delete that dump file; we may have other ideas for analysis you can
> do on it.
>
> Oh, one other thing you should try... Once you have a dump file, try
> restoring it to oliven, using a different volume name:
>
> vos restore oliven a user.h100554.TEST /some/temp/file -verbose
>
> It should be interesting to see if this indirect method works, and if not,
> it might help us determine where the problem might be.
>
> Hm... Yet another thing you can check, though the afsdump_scan output
> should tell you this -- look at the last 5 bytes of the dump file. If the
> dump is terminated correctly, they should be 04 3a 21 4b 6e (this is the
> "dump end" tag and its magic number).
>
> -- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
> Sr. Research Systems Programmer
> School of Computer Science - Research Computing Facility
> Carnegie Mellon University - Pittsburgh, PA
>
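
The "last 5 bytes" check described in the quoted message can be done without afsdump_scan. A minimal sketch, assuming a POSIX shell with tail, od and tr available; the function names dump_tail_hex and dump_has_end_tag are invented here, and the expected bytes 04 3a 21 4b 6e are the ones given in the message above.

```shell
#!/bin/sh
# Print the last 5 bytes of a file as one lowercase hex string.
dump_tail_hex() {
    tail -c 5 "$1" | od -An -v -tx1 | tr -d '[:space:]'
}

# Succeed (exit status 0) only if the file ends with the "dump end" tag (04)
# followed by the magic number 3a 21 4b 6e quoted in this thread.
# A file shorter than 5 bytes can never match, so it is reported as missing.
dump_has_end_tag() {
    [ "$(dump_tail_hex "$1")" = "043a214b6e" ]
}

# Example:
#   if dump_has_end_tag /some/temp/file; then
#       echo "dump end tag present"
#   else
#       echo "dump end tag missing"
#   fi
```

This only tells you whether the dump is terminated correctly; it says nothing about whether the vnode records in the middle of the dump are well-formed, which is what the afsdump_scan output is for.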