[OpenAFS] vos release problem: Problems encountered in doing the dump
Dimitris Zilaskos
dzila@tassadar.physics.auth.gr
Sun, 3 Sep 2006 13:39:04 +0300 (EEST)
Hello,
When I returned from vacation, I found that volume replication has been
failing. The first problem occurred on August 8:
user.someuser
RWrite: 536870930 ROnly: 536870931
number of sites -> 2
server server1.physics.auth.gr partition /vicepa RW Site
server server2.physics.auth.gr partition /vicepa RO Site
This is a complete release of volume 536870930
Cloning RW volume 536870930 to temporary RO... done
Getting status of RW volume 536870930... done
Ending cloning transaction on RW volume 536870930... done
Starting transaction on cloned volume 536870931... done
Failed to start a transaction on the RO volume.
Possible communication failure
The volume 536870930 could not be released to the following 1 sites:
server2.physics.auth.gr /vicepa
VOLSER: release could not be completed
Error in vos release command.
VOLSER: release could not be completed
And on the next day:
user.someuser
RWrite: 536870930 ROnly: 536870931 RClone: 536870931
number of sites -> 2
server server1.physics.auth.gr partition /vicepa RW Site -- New release
server server2.physics.auth.gr partition /vicepa RO Site -- Old release
This is a complete release of volume 536870930
Cloning RW volume 536870930 to temporary RO... done
Getting status of RW volume 536870930... done
Ending cloning transaction on RW volume 536870930... done
Starting transaction on cloned volume 536870931... done
Updating existing ro volume 536870931 on server2.physics.auth.gr ...
Starting ForwardMulti from 536870931 to 536870931 on server2.physics.auth.gr (as of Thu Dec 22 19:04:45 2005).
Failed to dump volume from clone to a ro site: VOLSER: Problems encountered in reading the dump file !
The volume 536870930 could not be released to the following 1 sites:
server2.physics.auth.gr /vicepa
VOLSER: release could not be completed
Error in vos release command.
VOLSER: release could not be completed
The same failure has occurred for every volume on server1 ever since.
Command used: vos release -verbose -f
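To see at a glance which sites fail across many volumes, the verbose output could be scanned with a short script. This is only a sketch: it assumes the output of the release command has been saved as text and that the failure message has exactly the wording shown above. The function name `failed_sites` is made up for illustration and is not part of OpenAFS.

```python
import re

# Matches the failure header exactly as it appears in the output above.
HEADER = re.compile(
    r"The volume (\d+) could not be released to the following (\d+) sites?:"
)


def failed_sites(text):
    """Return {volume_id: [(server, partition), ...]} for failed releases."""
    failures = {}
    lines = text.splitlines()
    for i, line in enumerate(lines):
        match = HEADER.search(line)
        if not match:
            continue
        volume = int(match.group(1))
        count = int(match.group(2))
        # Assumption: the next `count` lines each hold "server partition".
        for site_line in lines[i + 1 : i + 1 + count]:
            parts = site_line.split()
            if len(parts) >= 2:
                failures.setdefault(volume, []).append((parts[0], parts[1]))
    return failures
```

Feeding it the saved output of a run over all volumes would give one entry per volume that could not be released, together with the sites that were skipped.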
FileLog on server1:
fssync: volume 536870931 restored; breaking all call backs
On server2:
fssync: volume 536870931 restored; breaking all call backs
VolserLog on server1:
Sun Sep 3 13:32:49 2006 1 Volser: ListVolumes: Volume 536870931 (V0536870931.vol) will be destroyed on next salvage
Sun Sep 3 13:32:49 2006 1 Volser: Delete: volume 536870931 deleted
Sun Sep 3 13:32:49 2006 1 Volser: Clone: Cloning volume 536870930 to new volume 536870931
and on server2:
Sun Sep 3 13:32:49 2006 1 Volser: ReadVnodes: Restore aborted
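To line up the VolserLog entries from the two servers by time, the timestamp at the start of each entry could be parsed as below. This is a rough sketch that assumes every entry sits on a single line and begins with a timestamp in the format shown above; `parse_volser_line` is a hypothetical helper name. The field after the year (the "1" in these entries) is left in the message text, since its meaning is not stated here.

```python
from datetime import datetime


def parse_volser_line(line):
    """Split a VolserLog entry into (timestamp, rest_of_line)."""
    # First five whitespace-separated fields: weekday, month, day, time, year.
    fields = line.split(None, 5)
    if len(fields) < 6:
        raise ValueError("not a VolserLog entry: %r" % line)
    stamp = " ".join(fields[:5])
    timestamp = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    return timestamp, fields[5]
```

Sorting the parsed entries from both servers by timestamp would show whether the "Restore aborted" message on server2 follows the clone step on server1.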
server1 runs OpenAFS 1.3.86 and server2 runs 1.4.1. Both are Linux systems
with 2.6-series kernels. Replication had been working without any problem
for more than a year. I have tried removing the replication site and
re-adding it, and the result was the same.
Any ideas?
Best regards,
--
============================================================================
Dimitris Zilaskos
Department of Physics @ Aristotle University of Thessaloniki , Greece
PGP key : http://tassadar.physics.auth.gr/~dzila/pgp_public_key.asc
http://egnatia.ee.auth.gr/~dzila/pgp_public_key.asc
MD5sum : de2bd8f73d545f0e4caf3096894ad83f pgp_public_key.asc
============================================================================