[OpenAFS] Error when moving volumes

Frode Nilsen mailing-lists@cyberpunks.no
Mon, 28 Jun 2004 22:38:25 +0200


marvin is running RedHat 7.3, OpenAFS 1.2.7
oliven is running Fedora Core 1, OpenAFS 1.2.11


Here is the output from afsdump_scan:

# afsdump_scan -PHVv user.h100554.dump
* DUMP HEADER [0 = 0x0000000000000000]
 Magic number: 0xb3a11322
 Version:      1
 Volume ID:    536871605
 Volume name:  user.h100554
 Dump Range:   0 => 0
* VOLUME HEADER [39 = 0x0000000000000027]
 Volume ID:   536871605
 Version:     1
 Volume name: user.h100554
 In service?  true
 Blessed?     true
 Uniquifier:  10802
 Type:        0
 Parent ID:   536871605
 Clone ID:    536872065
 Max quota:   102400
 Min quota:   0
 Disk used:   7183
 File count:  1216
 Account:     0
 Owner:       1663
 Created:     Mon Aug 25 22:18:20 2003
 Accessed:    Thu Jan  1 01:00:00 1970
 Updated:     Tue Sep 23 16:37:43 2003
 Expires:     Thu Jan  1 01:00:00 1970
 Backed up:   Thu Jan  1 01:00:00 1970
 Offine Msg:  A volume utility is running.
 MOTD:
 Weekuse:              0          0          0          0
 Weekuse:              0          0          0
 Dayuse Date: Fri Apr 23 00:00:00 2004
 Daily usage: 1
* VNODE  1/1 [214 = 0x00000000000000d6]
afsdump_scan: Unknown tag in AFS volume dump Unexpected tag 'D' at 260 = 0x0000000000000104
*** FAILED: Unknown tag in AFS volume dump


Okay, that didn't look good :-(


Trying restore on oliven:

# vos restore oliven a user.h100554.TEST /root/user.h100554.dump -verbose
Restoring volume user.h100554.TEST Id 536871946 on server oliven.hib.no partition /vicepa ..Could not transmit data
Possible communication failure
Error in vos restore command.
Possible communication failure
#


I tried the same with another volume, first on oliven, where it failed, then on marvin after I had removed the
existing volume; there the volume dump was restored and the volume is functioning again.



On Mon, 2004-06-28 at 21:29, Jeffrey Hutzelman wrote:
> On Monday, June 28, 2004 18:17:02 +0200 Frode Nilsen 
> <mailing-lists@cyberpunks.no> wrote:
> 
> > - Is the problem reproducible?  Does it happen every time?
> >
> > Yes, the problem happens every time I try to move specific volumes; I
> > have about 40 user volumes that give the same error.
> >
> >
> > - What versions of OpenAFS are you running on each server?
> >
> > marvin is running 1.2.7,
> > oliven is running 1.2.11
> >
> >
> > - How big is the volume?
> >
> > # vos listvol marvin | grep 100554
> > user.h100554                      536871605 RW       7183 K On-line
> >
> >
> > - What output do you get with -verbose ?
> >
> > # vos move -fromserver marvin -frompartition /vicepa -toserver oliven
> > -topartition /vicepa -id user.h100554 -verbose
> > Starting transaction on source volume 536871605 ... done
> > Cloning source volume 536871605 ... done
> > Ending the transaction on the source volume 536871605 ... done
> > Starting transaction on the cloned volume 536872062 ... done
> > Creating the destination volume 536871605 ... done
> > Dumping from clone 536872062 on source to volume 536871605 on
> > destination ...Failed to move data for the volume 536871605
> >    VOLSER: Problems encountered in doing the dump !
> > vos move: operation interrupted, cleanup in progress...
> > clear transaction contexts
> > access VLDB
> > move incomplete - attempt cleanup of target partition - no guarantee
> > cleanup complete - user verify desired result
> 
> 
> OK.  It looks like the failure is in the initial dump, not the final 
> incremental.  The volume is only about 7MB, which is not too large in the 
> grand scheme of things.  I cannot offhand think of a change since 1.2.7 
> that would break volume moves in this way, but I can't say for sure.  And, 
> I don't recall if you told us what platform these servers are.
> 
> 
> I wonder if your source volserver is producing volume dumps that are broken 
> in some fashion.  I can't really debug the problem for you directly (well, 
> I assume you're not interested in putting a volume dump containing a copy 
> of all of your user's data someplace that I can see it).  However, there 
> are a few things you might be able to do to figure out what's going on...
> 
> 
> - Dump the volume to a file (vos dump user.h100554 0 -file /some/temp/file)
> - Get and compile my dump analysis tools, which can be found in
>   /afs/cs.cmu.edu/project/systems-jhutz/dumpscan
> - Run afsdump_scan -PHVv /some/temp/file
>   If the dump is normal, it should spit out a dump header, a volume
>   header, and then a list of all the vnodes in the volume, followed
>   by a "dump end" tag.  This "end" tag is what is apparently missing
>   according to the errors we've seen so far.
> 
> Hopefully once you describe the output (really, it's probably safe to put 
> it somewhere on the web and send a pointer, there's not really anything 
> secret in it), I'll have some idea what to suggest next.  In the meantime,
> don't delete that dump file; we may have other ideas for analysis you can 
> do on it.
> 
> Oh, one other thing you should try...  Once you have a dump file, try 
> restoring it to oliven, using a different volume name:
> 
> vos restore oliven a user.h100554.TEST /some/temp/file -verbose
> 
> It should be interesting to see if this indirect method works, and if not, 
> it might help us determine where the problem might be.
> 
> Hm...  Yet another thing you can check, though the afsdump_scan output 
> should tell you this -- look at the last 5 bytes of the dump file.  If the 
> dump is terminated correctly, they should be 04 3a 21 4b 6e (this is the 
> "dump end" tag and its magic number).
> 
> -- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
>    Sr. Research Systems Programmer
>    School of Computer Science - Research Computing Facility
>    Carnegie Mellon University - Pittsburgh, PA
>
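
The trailer check suggested in the quoted message (the last 5 bytes of the
dump should be 04 3a 21 4b 6e) can be sketched in Python. This is only a
minimal sketch: the function names dump_tail and ends_with_dump_end are made
up for illustration and are not part of OpenAFS or the dumpscan tools. The
expected byte sequence is taken from the message above. The path is whatever
file "vos dump" wrote.

```python
# Expected trailer, as quoted in the message above:
# the "dump end" tag byte 0x04 followed by its magic number 0x3a214b6e.
DUMP_END = bytes([0x04, 0x3A, 0x21, 0x4B, 0x6E])


def dump_tail(path, n=len(DUMP_END)):
    """Return the last n bytes of the file (fewer if the file is shorter)."""
    with open(path, "rb") as f:
        f.seek(0, 2)                 # jump to end of file to learn its size
        size = f.tell()
        f.seek(max(0, size - n))     # step back n bytes, but not before offset 0
        return f.read()


def ends_with_dump_end(path):
    """True if the file ends with the 5-byte 'dump end' trailer."""
    return dump_tail(path) == DUMP_END


def format_bytes(data):
    """Render bytes as space-separated hex, e.g. '04 3a 21 4b 6e'."""
    return " ".join(f"{b:02x}" for b in data)
```

For example, format_bytes(dump_tail("/root/user.h100554.dump")) shows the
actual trailing bytes, and ends_with_dump_end() on the same path says whether
they match. A mismatch would only indicate that the dump is not terminated as
expected; it does not say where in the dump things went wrong.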