[OpenAFS] vos release

Russ Allbery rra@stanford.edu
Thu, 08 Aug 2002 12:43:46 -0700


Dan Pritts <danno@internet2.edu> writes:

> I've used AFS with file servers in different sites, hundreds of miles
> away.

> Normally everything went just groovy, but occasionally when the WAN
> links flaked (thanks NYNEX) we would see terrible performance problems
> when doing a vos release.

> If you didn't know what was happening (say, because the vos release
> was part of your useradd script, and it just seemingly hung there
> forever), or just because you were impatient, and you hit control-c,
> the vos release process would die, the volume would be left locked,
> and the replica at the remote site would be hosed.

Just as a data point, it's not clear to me that this always has something
to do with network problems.  We've seen exactly the same behavior on the
campus network with no noticeable network difficulties between the servers.
Every so often the volume release would just not work; usually it would
involve "possible communication failure" errors and usually errors about
being unable to start a transaction.  It seemed to be strongly correlated
with some other volserver operation happening on that system at the same
time (in other words, I could make it happen about 70% of the time by
releasing two volumes located on the same servers at the same time).
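
If concurrent volserver operations really are the trigger, one possible
workaround is to serialize releases that touch the same servers.  The
following is only a rough Python sketch under that assumption, not a tested
fix: release_serialized and its per-server lock table are hypothetical
names, the caller has to supply the list of servers holding the volume's
sites, and the locks only cover releases started from the same process.
The only AFS command it relies on is plain "vos release <volume>".

```python
import subprocess
import threading
from collections import defaultdict

# Hypothetical helper: one lock per file server, so that two releases
# involving the same server never run at the same time in this process.
_server_locks = defaultdict(threading.Lock)
_registry_lock = threading.Lock()


def release_serialized(volume, servers, run=subprocess.run):
    """Run `vos release <volume>` while holding a lock for every server."""
    # Look the locks up in sorted order so that two callers always acquire
    # them in the same order and cannot deadlock each other.
    with _registry_lock:
        locks = [_server_locks[name] for name in sorted(set(servers))]
    for lock in locks:
        lock.acquire()
    try:
        # check=True raises CalledProcessError if vos exits non-zero.
        return run(["vos", "release", volume], check=True)
    finally:
        for lock in reversed(locks):
            lock.release()
```

Called from several threads with overlapping server lists, the releases
run one after another instead of in parallel; releases on unrelated
servers are not held up.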

This has been getting slowly worse all summer, but we think it was due to
having a mixed OpenAFS and Transarc AFS set of file servers.  Or at least
we hope.  As of this morning, everything is running OpenAFS 1.2.6, so
we'll see if it gets any better....

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>