[OpenAFS] vos release problem: Problems encountered in doing the dump

Dimitris Zilaskos dzila@tassadar.physics.auth.gr
Sun, 3 Sep 2006 13:39:04 +0300 (EEST)


 	Hello,

 	When I returned from vacation I found that volume replication was 
failing. The first problem occurred on August 8:

user.someuser
     RWrite: 536870930     ROnly: 536870931
     number of sites -> 2
        server server1.physics.auth.gr partition /vicepa RW Site
        server server2.physics.auth.gr partition /vicepa RO Site
This is a complete release of volume 536870930
Cloning RW volume 536870930 to temporary RO... done
Getting status of RW volume 536870930... done
Ending cloning transaction on RW volume 536870930... done
Starting transaction on cloned volume 536870931... done
Failed to start a transaction on the RO volume.
Possible communication failure
The volume 536870930 could not be released to the following 1 sites:
 	           server2.physics.auth.gr /vicepa
VOLSER: release could not be completed
Error in vos release command.
VOLSER: release could not be completed

And on the next day:

user.someuser
     RWrite: 536870930     ROnly: 536870931     RClone: 536870931
     number of sites -> 2
        server server1.physics.auth.gr partition /vicepa RW Site  -- New release
        server server2.physics.auth.gr partition /vicepa RO Site  -- Old release
This is a complete release of volume 536870930
Cloning RW volume 536870930 to temporary RO... done
Getting status of RW volume 536870930... done
Ending cloning transaction on RW volume 536870930... done
Starting transaction on cloned volume 536870931... done
Updating existing ro volume 536870931 on server2.physics.auth.gr ...
Starting ForwardMulti from 536870931 to 536870931 on server2.physics.auth.gr (as of Thu Dec 22 19:04:45 2005).
Failed to dump volume from clone to a ro site: VOLSER:  Problems encountered in reading the dump file !
The volume 536870930 could not be released to the following 1 sites:
 	           server2.physics.auth.gr /vicepa
VOLSER: release could not be completed
Error in vos release command.
VOLSER: release could not be completed

It has gone on like that for all volumes on server1 ever since.

Command used: vos release -verbose -f
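
Since the same failure repeats for every volume, it can help to capture the 
vos release output to a file and summarise which sites failed. What follows 
is only a minimal sketch, assuming the output has the same shape as the 
transcripts above; failed_sites is a hypothetical helper name, not part of 
OpenAFS, and the sample text is copied from the first transcript.

```python
import re

# Hypothetical helper (not part of OpenAFS): summarise captured
# `vos release -verbose` output. The patterns assume the message
# wording shown in the transcripts above.
FAILED_HEADER = re.compile(
    r"^The volume (\d+) could not be released to the following (\d+) sites:"
)
SITE_LINE = re.compile(r"^\s+(\S+)\s+(/vicep\S+)\s*$")


def failed_sites(output):
    """Return (volume_id, server, partition) for every site that failed."""
    failures = []
    volume = None
    remaining = 0  # site lines still expected after a failure header
    for line in output.splitlines():
        header = FAILED_HEADER.match(line)
        if header:
            volume = header.group(1)
            remaining = int(header.group(2))
            continue
        if remaining > 0:
            site = SITE_LINE.match(line)
            if site:
                failures.append((volume, site.group(1), site.group(2)))
                remaining -= 1
            else:
                # Unexpected line: stop looking for sites of this volume.
                remaining = 0
    return failures


sample = (
    "Failed to start a transaction on the RO volume.\n"
    "Possible communication failure\n"
    "The volume 536870930 could not be released to the following 1 sites:\n"
    " \t           server2.physics.auth.gr /vicepa\n"
    "VOLSER: release could not be completed\n"
)

print(failed_sites(sample))
# prints [('536870930', 'server2.physics.auth.gr', '/vicepa')]
```

Run over the combined output for all volumes, this would give one line per 
failed site, which makes it easier to see whether every failure points at 
the same server and partition.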

FileLog on server1:

fssync: volume 536870931 restored; breaking all call backs

On server2:
fssync: volume 536870931 restored; breaking all call backs

VolserLog on server1:
Sun Sep  3 13:32:49 2006 1 Volser: ListVolumes: Volume 536870931 (V0536870931.vol) will be destroyed on next salvage
Sun Sep  3 13:32:49 2006 1 Volser: Delete: volume 536870931 deleted
Sun Sep  3 13:32:49 2006 1 Volser: Clone: Cloning volume 536870930 to new volume 536870931

and on server2:
Sun Sep  3 13:32:49 2006 1 Volser: ReadVnodes: Restore aborted

 	server1 is running 1.3.86 and server2 is running 1.4.1. Both are 
Linux systems with 2.6-series kernels. Replication had been working without 
any problem for more than a year. I have tried removing the replication 
site and re-adding it, and the result was the same.

Any ideas?


 	Best regards,
--
============================================================================

Dimitris Zilaskos

Department of Physics @ Aristotle University of Thessaloniki , Greece
PGP key : http://tassadar.physics.auth.gr/~dzila/pgp_public_key.asc
 	  http://egnatia.ee.auth.gr/~dzila/pgp_public_key.asc
MD5sum  : de2bd8f73d545f0e4caf3096894ad83f  pgp_public_key.asc
============================================================================