[OpenAFS] Volume access problems when DB server downed
Erwin Broschinski
broschi@id.ethz.ch
Sat, 09 Mar 2002 20:13:13 +0100 (MET)
Craig,
Well, maybe I was not clear enough when I posted this problem, but you have
it right now.
I would just add: we always keep a spare fileserver to which I can move
all volumes (RW and RO) when I want to evacuate a server. I did this in
the current case as well, before I took the server down, so no volumes
were missing from AFS.
Do you know of any way to reduce the "Rx hard timeout"?
Erwin
On 08-Mar-2002 Craig_Everhart@transarc.com wrote:
| Erwin,
|
| Perhaps I misunderstood your intent and actions. I thought that you
| were moving an existing Ubik server (e.g. vlserver) to continue as a
| Ubik server but with a new IP address. If you were simply
| decommissioning a Ubik server and moving its hardware to a different
| address, then what you did should be fine. The four existing Ubik
| servers should continue just fine. There is a small point in that if
| you're permanently decommissioning that old server for Ubik purposes,
| you should get around to removing the old IP address from all the
| CellServDB files so that (a) the clients don't occasionally mistakenly
| choose a dead IP address and (b) the remaining Ubik servers stop
| requiring that they have a majority of the five sites before allowing
| changes.
|
| I apologize for misinterpreting your messages. I don't know why stuff
| like "vos examine <volume>" would hang *indefinitely*, though I can well
| imagine it being hung up for two or three Rx hard timeout intervals of
| about 60 seconds each. Other accesses could well fail for the same
| reasons. Part of "vos examine" looks at the VLDB and part of it checks
| with the volserver, so if your volume had an instance on the downed
| server, you'd get timeouts trying to learn about that volume.
|
| I'm glad that all returned to service correctly once you resumed the AFS
| functionality on your old server (presumably with the old IP address as
| well).
|
| Craig
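
The two numeric points in Craig's message (a majority of the five listed
Ubik sites, and two or three Rx hard timeouts of about 60 seconds each) can
be sketched as simple arithmetic. This is only an illustration: the function
names are hypothetical, the quorum check is a plain strict-majority rule that
ignores any tie-breaking Ubik may apply, and the 60-second figure is the
approximate value quoted above, not a measured one.

```python
def has_quorum(listed_sites, reachable_sites):
    """Simplified rule: a strict majority of the listed sites must be reachable."""
    return 2 * reachable_sites > listed_sites


def worst_case_wait(timeouts, hard_timeout_s=60):
    """Seconds spent waiting if `timeouts` calls each run into the hard timeout."""
    return timeouts * hard_timeout_s


# Five sites listed in CellServDB, one of them down: changes are still allowed.
print(has_quorum(5, 4))

# The dead address still counts as a listed site, so three of the five
# must stay reachable; with only two left, changes would be refused.
print(has_quorum(5, 2))

# Two or three hard timeouts at roughly 60 seconds each.
print(worst_case_wait(2), worst_case_wait(3))
```

Under these assumptions a "vos examine" that touches the downed server would
be expected to stall for a few minutes at most, not indefinitely.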