[OpenAFS] Vos release problems...

Rodney M Dyer rmdyer@uncc.edu
Tue, 02 Dec 2003 15:49:43 -0500


Anyone,

We've been having problems with replicated volume releases that we haven't 
been able to track down.  They seem to have started when we moved to 
OpenAFS (we previously ran Transarc AFS), but we are not entirely 
sure.  Our file servers are Sun machines running Solaris 9 and OpenAFS 
1.2.8.  The clients are a mix of Sun Solaris and Windows XP machines running 
OpenAFS 1.2.8 or later.

Here is a typical transaction sequence...

*  Change a single file in the RW volume, where:

fs1 holds the RW volume and one RO replica, and is in my building.
fs2 and fs3 each hold an RO replica and are in another building.

Then...

C:\>vos release -verbose coe.xpnet.system

coe.xpnet.system
     RWrite: 537081842     ROnly: 537081843     Backup: 537081844
     number of sites -> 4
        server fs1.uncc.edu partition /vicepe RW Site
        server fs2.uncc.edu partition /vicepc RO Site
        server fs1.uncc.edu partition /vicepe RO Site
        server fs3.uncc.edu partition /vicepf RO Site
This is a complete release of the volume 537081842
Recloning RW volume ...
Updating existing ro volume 537081843 on fs2.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs2.uncc.edu.
Could not end transaction on a ro volume: Possible communication failure
Updating existing ro volume 537081843 on fs3.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs3.uncc.edu.
Could not end transaction on a ro volume: Possible communication failure
updating VLDB ... done
Released volume coe.xpnet.system successfully

Then, without changing anything, do it again...

C:\>vos release -verbose coe.xpnet.system

coe.xpnet.system
     RWrite: 537081842     ROnly: 537081843     Backup: 537081844
     number of sites -> 4
        server fs1.uncc.edu partition /vicepe RW Site
        server fs2.uncc.edu partition /vicepc RO Site
        server fs1.uncc.edu partition /vicepe RO Site
        server fs3.uncc.edu partition /vicepf RO Site
This is a complete release of the volume 537081842
Recloning RW volume ...
Updating existing ro volume 537081843 on fs2.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs2.uncc.edu.
Updating existing ro volume 537081843 on fs3.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs3.uncc.edu.
updating VLDB ... done
Released volume coe.xpnet.system successfully

Now add a single new file to the volume, then...

C:\>vos release -verbose coe.xpnet.system

coe.xpnet.system
     RWrite: 537081842     ROnly: 537081843     Backup: 537081844
     number of sites -> 4
        server fs1.uncc.edu partition /vicepe RW Site
        server fs2.uncc.edu partition /vicepc RO Site
        server fs1.uncc.edu partition /vicepe RO Site
        server fs3.uncc.edu partition /vicepf RO Site
This is a complete release of the volume 537081842
Recloning RW volume ...
Updating existing ro volume 537081843 on fs2.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs2.uncc.edu.
Updating existing ro volume 537081843 on fs3.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs3.uncc.edu.
Could not end transaction on a ro volume: Possible communication failure
updating VLDB ... done
Released volume coe.xpnet.system successfully

Now, delete the file that was just created, then...

C:\>vos release -verbose coe.xpnet.system

coe.xpnet.system
     RWrite: 537081842     ROnly: 537081843     Backup: 537081844
     number of sites -> 4
        server fs1.uncc.edu partition /vicepe RW Site
        server fs2.uncc.edu partition /vicepc RO Site
        server fs1.uncc.edu partition /vicepe RO Site
        server fs3.uncc.edu partition /vicepf RO Site
This is a complete release of the volume 537081842
Recloning RW volume ...
Updating existing ro volume 537081843 on fs2.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs2.uncc.edu.
Updating existing ro volume 537081843 on fs3.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs3.uncc.edu.
updating VLDB ... done
Released volume coe.xpnet.system successfully

C:\>vos release -verbose coe.xpnet.system

coe.xpnet.system
     RWrite: 537081842     ROnly: 537081843     Backup: 537081844
     number of sites -> 4
        server fs1.uncc.edu partition /vicepe RW Site
        server fs2.uncc.edu partition /vicepc RO Site
        server fs1.uncc.edu partition /vicepe RO Site
        server fs3.uncc.edu partition /vicepf RO Site
This is a complete release of the volume 537081842
Recloning RW volume ...
Updating existing ro volume 537081843 on fs2.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs2.uncc.edu.
Updating existing ro volume 537081843 on fs3.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs3.uncc.edu.
updating VLDB ... done
Released volume coe.xpnet.system successfully

C:\>
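To make the pattern in the runs above easier to see, here is a small Python sketch that scans `vos release -verbose` output and lists the RO servers followed by the "Could not end transaction" message. It assumes the message belongs to the most recent "Updating existing ro volume ... on <server>" line, which is how the transcripts read but is not confirmed vos behavior; `failed_ro_sites` is a hypothetical helper name. Since vos still prints "Released volume ... successfully" in these cases, the message has to be caught in the output itself.

```python
import re
from typing import List

# Patterns copied from the `vos release -verbose` output shown above.
UPDATE_RE = re.compile(r"Updating existing ro volume \d+ on (\S+) \.\.\.")
FAIL_MSG = "Could not end transaction on a ro volume"


def failed_ro_sites(log: str) -> List[str]:
    """Return the RO servers whose update was followed by the end-transaction error.

    Assumption: the error belongs to the most recent
    "Updating existing ro volume ... on <server>" line before it.
    """
    failed = []
    current = None
    for line in log.splitlines():
        match = UPDATE_RE.search(line)
        if match:
            current = match.group(1)  # remember which RO site is being updated
        elif FAIL_MSG in line and current is not None:
            failed.append(current)
    return failed


# The RO-site portion of the first release above.
first_run = """\
Updating existing ro volume 537081843 on fs2.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs2.uncc.edu.
Could not end transaction on a ro volume: Possible communication failure
Updating existing ro volume 537081843 on fs3.uncc.edu ...
Starting ForwardMulti from 537081843 to 537081843 on fs3.uncc.edu.
Could not end transaction on a ro volume: Possible communication failure
updating VLDB ... done
"""

print(failed_ro_sites(first_run))  # ['fs2.uncc.edu', 'fs3.uncc.edu']
```

Read this way, the first release flags both remote sites and the third flags only fs3. The local RO site on fs1 never appears, because vos prints no update line for it in these transcripts.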

Does anybody have any clues?  We are not aware of any networking problem, 
and the file servers outside our building certainly have full network access.

Does the 'vos release' command on my client work by contacting each 
read-only server and telling it to update its replica?

We see this problem regularly.  Our only remedy, to make sure the volume is 
actually released, is to run the vos release again, just as in the sequences above.
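That workaround can at least be automated while the cause is tracked down. Below is a minimal Python 3.7+ sketch, assuming `vos` is on the PATH and is run from an account allowed to do the release by hand. `release_until_clean`, `run_vos_release` and `max_tries` are hypothetical names. Because we don't know whether vos writes the warning to stdout or stderr, the sketch checks both.

```python
import subprocess
from typing import Callable

# Message seen in the transcripts above when an RO site is not updated cleanly.
FAIL_MSG = "Could not end transaction on a ro volume"


def run_vos_release(volume: str) -> str:
    """Run `vos release -verbose <volume>` and return stdout and stderr combined."""
    proc = subprocess.run(
        ["vos", "release", "-verbose", volume],
        capture_output=True,
        text=True,
    )
    # Unknown which stream carries the warning, so search both.
    return proc.stdout + proc.stderr


def release_until_clean(volume: str, max_tries: int = 3,
                        run: Callable[[str], str] = run_vos_release) -> int:
    """Repeat the release until the end-transaction error no longer appears.

    Returns the number of attempts used; raises RuntimeError if every
    attempt still reported the error.
    """
    for attempt in range(1, max_tries + 1):
        output = run(volume)
        if FAIL_MSG not in output:
            return attempt
    raise RuntimeError(
        f"{volume}: RO sites still reporting errors after {max_tries} releases"
    )
```

Something like `release_until_clean("coe.xpnet.system")` would replace the manual second release. The `run` parameter exists only so the loop can be exercised without a live cell. This hides the symptom rather than explaining it.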

Thanks for any help,

Rodney

Rodney M. Dyer
Windows Systems Programmer
Mosaic Computing Group
William States Lee College of Engineering
University of North Carolina at Charlotte
Email: rmdyer@uncc.edu
Web: http://www.coe.uncc.edu/~rmdyer
Phone (704)687-3518
Help Desk Line (704)687-3150
FAX (704)687-2352
Office  267 Smith Building