[OpenAFS-devel] How to create inconsistency in the volserver and my mind.

Harald Barth haba@pdc.kth.se
Tue, 15 Mar 2005 12:20:14 +0100 (MET)


Hi everybody,
I think this behaviour needs improvement....

Start with existing volume foo:

create volume example a foo 4711

Do a restore where the command feeding the dump fails:

command-that-fails | vos restore example a foo -overwrite full

Now volume foo is gone on the server but still exists in the VLDB. Why
does vos restore first delete and then create the volume? Would it
not be better to first restore into a clone and remove the original
only once we have a good clone?
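
The ordering suggested above (restore into a scratch copy first, swap it
in only once the whole dump has been read) can be sketched with plain
files. This is only an analogy, not volserver code: safe_restore and its
arguments are hypothetical names, and a file stands in for the volume.

```python
import os
import tempfile

def safe_restore(target_path, chunks):
    """Write the dump to a scratch file and replace target_path only
    after the whole dump has been read successfully.

    Hypothetical helper: a plain file stands in for the volume."""
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".restore-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            wrote_any = False
            for chunk in chunks:
                tmp.write(chunk)
                wrote_any = wrote_any or bool(chunk)
            if not wrote_any:
                # A failed producer typically yields no data at all.
                raise ValueError("empty dump; keeping the original")
            tmp.flush()
            os.fsync(tmp.fileno())
        # The original is replaced only at this point.
        os.replace(tmp_path, target_path)
    except BaseException:
        # Any failure leaves the original untouched; drop the scratch file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this ordering a failing producer costs only the scratch copy; the
original stays in place and stays consistent with whatever index points
at it.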

Another, smaller problem is the error messages, which seem to be bogus:

Tue Mar 15 11:05:19 2005 VCreateVolume: Header file /vicepb/V0537057012.vol already exists!
Tue Mar 15 11:05:19 2005 1 Volser: CreateVolume: Unable to create the volume; aborted, error code 104
Tue Mar 15 11:05:19 2005 : Connection reset by peer

This is not an error: I asked for an overwrite, so of course the volume
exists. And what exactly is aborted? And what is 104?
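
One possible reading of the 104, assuming the volserver is logging a raw
system errno here (that is a guess, not something the log confirms): on
Linux, errno 104 is ECONNRESET, which would match the "Connection reset
by peer" line right after it. A minimal Python sketch for looking up such
a code on the machine in question (describe_errno is a hypothetical
helper; errno numbering is platform-specific):

```python
import errno
import os

def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno on this platform."""
    name = errno.errorcode.get(code, "unknown")
    return name, os.strerror(code)

# The result depends on the platform the script runs on.
print(describe_errno(104))
```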

Tue Mar 15 11:05:19 2005 1 Volser: Delete: volume 537057012 deleted 
Tue Mar 15 11:05:19 2005 1 Volser: CreateVolume: volume 537057012 (dah.test.flopp) created
Tue Mar 15 11:05:19 2005 1 Volser: RestoreVolume: Error reading header file for dump; aborted

The log above is from the broken -overwrite full, which results in
the inconsistency between the VLDB and the volserver.

It is no less confusing when the overwrite actually succeeds:

# /opt/afsbackup/bin/adsmpipe -s /scratch -A -x -f dah.test.houting.a.backup.00000000.20050210 | vos restore houting a dah.test.flapp -overwrite full -verbose -local 
Volume exists; Will delete and perform full restore
Restoring volume dah.test.flapp Id 537057015 on server houting.pdc.kth.se partition /vicepa ..Deleting the previous volume 537057015 ... done
 done
Updating the existing VLDB entry
------- Old entry -------

dah.test.flapp 
    RWrite: 537057015 
    number of sites -> 1
       server houting.pdc.kth.se partition /vicepa RW Site 
------- New entry -------

Failed to get info about server's -2098337598 address(es) from vlserver (err=0)
   
dah.test.flapp 
    RWrite: 537057015 
    number of sites -> 1
       server houting.pdc.kth.se partition /vicepa RW Site 
Restored volume dah.test.flapp on houting /vicepa
bash-2.05b# /opt/afsbackup/bin/adsmpipe -s /scratch -A -x -f dah.test.houting.a.backup.20050210.20050305 | vos restore houting a dah.test.flapp -overwrite incremental -verbose -local 
Restoring volume dah.test.flapp Id 537057015 on server houting.pdc.kth.se partition /vicepa .. done
Restored volume dah.test.flapp on houting /vicepa

-2098337598??? As far as I can see, there is nothing wrong with the addresses of that server:

# vos listaddr -printuuid
....
UUID: 00787cce-7ea0-1214-8a-26-c2e8ed82aa77
houting.pdc.kth.se
houting-le.pdc.kth.se
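
One way to read the -2098337598, assuming it is a 32-bit IPv4 address
printed as a signed integer with the most significant byte first (an
assumption; the message itself does not say so): reinterpret it as
unsigned and format it as a dotted quad. signed32_to_ipv4 is a
hypothetical helper name.

```python
import ipaddress

def signed32_to_ipv4(value):
    """Reinterpret a signed 32-bit integer as an IPv4 address,
    most significant byte first."""
    return ipaddress.IPv4Address(value & 0xFFFFFFFF)

print(signed32_to_ipv4(-2098337598))  # prints 130.237.232.194
```

In hex that address is 82 ed e8 c2, and the same four bytes appear in
reverse order (c2 e8 ed 82) at the start of the last field of the UUID
listed above, so the number may simply be an address that was printed
instead of being resolved.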

In other news, I had a sudden volserver restart on another server:

cysteine# tail -1 BosLog
Mon Mar 14 18:15:08 2005: fs:vol exited on signal 6

There is nothing in the VolserLog except the vos backup, which,
according to the script that ran it, completed without error.
No core file either.

cysteine# tail -1 VolserLog.old 
Mon Mar 14 18:15:08 2005 1 Volser: Clone: Recloning volume 537039787 to volume 537039789
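
For reference, signal 6 is SIGABRT on Linux and most Unix systems, which
is what abort() or a failed assertion raises; whether that is what
happened here is only a guess. A small Python sketch for translating the
number on a given machine (signal_name is a hypothetical helper; signal
numbering is platform-specific):

```python
import signal

def signal_name(number):
    """Return the symbolic name of a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return "unknown"

print(signal_name(6))
```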

Suggestions and leads welcome,
Harald.