[OpenAFS-devel] volserver / replication question with older version of afs
Russ Allbery
rra@stanford.edu
Thu, 02 Feb 2006 12:56:56 -0800
Josh Fiske <jfiske@clarkson.edu> writes:
> We have a cell with three older AFS servers (1.2.11). They have been
> running great for quite some time. However, twice in the past two weeks
> the Volserver has stopped responding on one of the servers. When this
> happens, if I do a 'bos status' on the server, it tells me that
> everything is running normally. But, I know from trying to do a 'vos
> listvol' on the server, that things are not normal, because it times
> out. Both times this has happened, the server that the volserver died
> on was the sync site for the cell.
The volserver or the vlserver? I ask only because you mention sync sites.
I'm used to this being a volserver problem, and the volserver doesn't have
a sync site (only the Ubik servers, such as the vlserver, do).
If you do mean volserver, this is a 1.2.11 bug. I think it was fixed in
1.2.13; it's definitely fixed in 1.4.0.
> Also of note, we have quite a few volumes that are replicated. When the
> volserver died on the sync site, the read-only replicas were no longer
> accessible. If a read-only replica is unavailable on one server,
> shouldn't the client know to try one of the others? I thought this was
> the whole point of replication.
Clients fail over if the server is completely off-line, but unfortunately
they don't always fail over if the server still responds to Rx pings while
not answering anything else.
--
Russ Allbery (rra@stanford.edu) <http://www.eyrie.org/~eagle/>