[OpenAFS] Vos release problems...
Rodney M Dyer
rmdyer@uncc.edu
Tue, 02 Dec 2003 15:49:43 -0500
Anyone,
We've been having issues with replicated volume releases that we haven't
been able to track down. It appears to have started once we moved to
OpenAFS (we were running Transarc AFS previously), but we are not entirely
sure. We are using Sun machines running Solaris 9 and OpenAFS 1.2.8 as
file servers. The clients are a mix of Sun Solaris and Windows XP using
OpenAFS 1.2.8 or later.
Here is a typical transaction sequence...
* Change a single file in the RW volume, where...
fs1 contains the RW volume and 1 replica and is in the building I am in.
fs2 & fs3 contain the RO volumes and are in another building.
Then...
C:\>vos release -verbose coe.xpnet.system
coe.xpnet.system
RWrite: 537081842 ROnly: 537081843 Backup: 537081844
number of sites -> 4
server fs1.uncc.edu partition /vicepe RW Site
server fs2.uncc.edu partition /vicepc RO Site
server fs1.uncc.edu partition /vicepe RO Site
server fs3.uncc.edu partition /vicepf RO Site
This is a complete release of the volume 537081842
Recloning RW volume ...
Updating existing ro volume 537081843 on fs2.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs2.uncc.edu.
Could not end transaction on a ro volume: Possible communication failure
Updating existing ro volume 537081843 on fs3.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs3.uncc.edu.
Could not end transaction on a ro volume: Possible communication failure
updating VLDB ... done
Released volume coe.xpnet.system successfully
Then, without changing anything, release again...
C:\>vos release -verbose coe.xpnet.system
coe.xpnet.system
RWrite: 537081842 ROnly: 537081843 Backup: 537081844
number of sites -> 4
server fs1.uncc.edu partition /vicepe RW Site
server fs2.uncc.edu partition /vicepc RO Site
server fs1.uncc.edu partition /vicepe RO Site
server fs3.uncc.edu partition /vicepf RO Site
This is a complete release of the volume 537081842
Recloning RW volume ...
Updating existing ro volume 537081843 on fs2.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs2.uncc.edu.
Updating existing ro volume 537081843 on fs3.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs3.uncc.edu.
updating VLDB ... done
Released volume coe.xpnet.system successfully
Now, add a new single file to the volume, then...
C:\>vos release -verbose coe.xpnet.system
coe.xpnet.system
RWrite: 537081842 ROnly: 537081843 Backup: 537081844
number of sites -> 4
server fs1.uncc.edu partition /vicepe RW Site
server fs2.uncc.edu partition /vicepc RO Site
server fs1.uncc.edu partition /vicepe RO Site
server fs3.uncc.edu partition /vicepf RO Site
This is a complete release of the volume 537081842
Recloning RW volume ...
Updating existing ro volume 537081843 on fs2.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs2.uncc.edu.
Updating existing ro volume 537081843 on fs3.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs3.uncc.edu.
Could not end transaction on a ro volume: Possible communication failure
updating VLDB ... done
Released volume coe.xpnet.system successfully
Now, delete the file that was just created, then...
C:\>vos release -verbose coe.xpnet.system
coe.xpnet.system
RWrite: 537081842 ROnly: 537081843 Backup: 537081844
number of sites -> 4
server fs1.uncc.edu partition /vicepe RW Site
server fs2.uncc.edu partition /vicepc RO Site
server fs1.uncc.edu partition /vicepe RO Site
server fs3.uncc.edu partition /vicepf RO Site
This is a complete release of the volume 537081842
Recloning RW volume ...
Updating existing ro volume 537081843 on fs2.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs2.uncc.edu.
Updating existing ro volume 537081843 on fs3.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs3.uncc.edu.
updating VLDB ... done
Released volume coe.xpnet.system successfully
C:\>vos release -verbose coe.xpnet.system
coe.xpnet.system
RWrite: 537081842 ROnly: 537081843 Backup: 537081844
number of sites -> 4
server fs1.uncc.edu partition /vicepe RW Site
server fs2.uncc.edu partition /vicepc RO Site
server fs1.uncc.edu partition /vicepe RO Site
server fs3.uncc.edu partition /vicepf RO Site
This is a complete release of the volume 537081842
Recloning RW volume ...
Updating existing ro volume 537081843 on fs2.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs2.uncc.edu.
Updating existing ro volume 537081843 on fs3.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs3.uncc.edu.
updating VLDB ... done
Released volume coe.xpnet.system successfully
C:\>
Anybody got any clues? There is no networking problem that we are aware
of. The file servers outside our building certainly have full network access.
Does the 'vos release' command on my client work by contacting each
read-only server and telling it to update its replica?
We see this problem regularly. Our only remedy, to make sure the volume is
actually released, is to run vos release again, as in the sequences above.
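
For reference, the manual workaround could be scripted roughly as below. This
is only a sketch of what we do by hand, not a fix. It assumes the message
"Could not end transaction on a ro volume" is a reliable sign that a site was
not updated, and that it shows up on stdout or stderr. The function names and
the retry limit are made up for illustration.

```python
import subprocess

# Assumption: this text in the output marks a failed update of an RO site.
ERROR_MARKER = "Could not end transaction on a ro volume"


def release_had_errors(output):
    """Return True if the captured 'vos release' output contains the error."""
    return ERROR_MARKER in output


def run_vos_release(volume):
    """Run 'vos release <volume> -verbose' and return stdout plus stderr."""
    result = subprocess.run(
        ["vos", "release", volume, "-verbose"],
        capture_output=True,
        text=True,
        check=False,  # vos reported success in our transcripts even on error
    )
    return result.stdout + result.stderr


def release_until_clean(volume, max_attempts=3, run=run_vos_release):
    """Repeat the release until no error line appears.

    Returns the number of attempts used. 'run' can be swapped for a fake
    when no AFS cell is available. Raises RuntimeError if every attempt
    still shows the error.
    """
    for attempt in range(1, max_attempts + 1):
        output = run(volume)
        if not release_had_errors(output):
            return attempt
    raise RuntimeError(
        "vos release of %s still reports errors after %d attempts"
        % (volume, max_attempts)
    )
```

Calling release_until_clean("coe.xpnet.system") would then stand in for the
repeated manual releases shown above. It only hides the symptom, so it does
not answer why the first release fails.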
Thanks for any help,
Rodney
Rodney M. Dyer
Windows Systems Programmer
Mosaic Computing Group
William States Lee College of Engineering
University of North Carolina at Charlotte
Email: rmdyer@uncc.edu
Web: http://www.coe.uncc.edu/~rmdyer
Phone (704)687-3518
Help Desk Line (704)687-3150
FAX (704)687-2352
Office 267 Smith Building