[OpenAFS] vos release

Russ Allbery rra@stanford.edu
Thu, 08 Aug 2002 14:22:15 -0700


Derrick J Brashear <shadow@dementia.org> writes:
> On Thu, 8 Aug 2002, Russ Allbery wrote:

>> Just as a data point, it's not clear to me that this always has
>> something to do with network problems.  We've seen exactly the same
>> behavior on the campus network with no noticeable network difficulties
>> between the servers.  Every so often the volume release would just not
>> work; usually it would involve "possible communication failure" errors
>> and usually errors about being unable to start a transaction.  It
>> seemed to be strongly correlated

> I think a fix in OpenAFS 1.2.6 will help this. Particularly, Brent
> Johnson mentioned something to me at Usenix and based on that we made a
> change in the fssync interface. I'm told IBM made an analogous change
> sometime recently also.

So far, I seem to be having fewer problems, but they're not gone.  On
the first volume release I did today, I hit the same problem I'd seen
before:

(root) windlord:~> alias rfv
vos release -f -v
(root) windlord:~> rfv pubsw.siteemacs

pubsw.siteemacs 
    RWrite: 2003896810    ROnly: 2003896811    Backup: 2003896812
    number of sites -> 4
       server afssvr22.Stanford.EDU partition /vicepj RW Site 
       server afssvr22.Stanford.EDU partition /vicepj RO Site 
       server afssvr23.Stanford.EDU partition /vicepm RO Site 
       server afssvr11.Stanford.EDU partition /vicepd RO Site 
This is a complete release of the volume 2003896810
Recloning RW volume ...
Failed to end cloning transaction on RW 2003896811
Possible communication failure
Error in vos release command.
Possible communication failure
(root) windlord:~> rfv pubsw.siteemacs

pubsw.siteemacs 
    RWrite: 2003896810    ROnly: 2003896811    Backup: 2003896812
    number of sites -> 4
       server afssvr22.Stanford.EDU partition /vicepj RW Site 
       server afssvr22.Stanford.EDU partition /vicepj RO Site 
       server afssvr23.Stanford.EDU partition /vicepm RO Site 
       server afssvr11.Stanford.EDU partition /vicepd RO Site 
This is a complete release of the volume 2003896810
Recloning RW volume ...
Updating existing ro volume 2003896811 on afssvr23.Stanford.EDU ...
Starting ForwardMulti from 2003896811 to 2003896811 on afssvr23.Stanford.EDU (full release).
Updating existing ro volume 2003896811 on afssvr11.Stanford.EDU ...
Starting ForwardMulti from 2003896811 to 2003896811 on afssvr11.Stanford.EDU (full release).
updating VLDB ... done
Released volume pubsw.siteemacs successfully

Perhaps the -f flag at this point has something to do with it?  We
standardized on always using it at some point in the past when Transarc
AFS would corrupt the volume unless -f was given, and then never changed
back.

In a bunch of subsequent volume releases, I haven't had any trouble except
for an occasional:

Could not end transaction on a ro volume: Possible communication failure

which shows up right before the volume release finishes and doesn't seem
to have interfered with the success of the release.

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>