[OpenAFS] Volume access problems when DB server downed

Craig_Everhart@transarc.com Craig_Everhart@transarc.com
Fri, 8 Mar 2002 13:27:26 -0500 (EST)


Erwin,

Perhaps I misunderstood your intent and actions.  I thought that you
were moving an existing Ubik server (e.g. vlserver) to continue as a
Ubik server but with a new IP address.  If you were simply
decommissioning a Ubik server and moving its hardware to a different
address, then what you did should be fine: the four remaining Ubik
servers should carry on without trouble.  One small point: if you're
permanently decommissioning that old server for Ubik purposes, you
should get around to removing its old IP address from all the
CellServDB files, so that (a) the clients don't occasionally pick a
dead IP address by mistake and (b) the remaining Ubik servers stop
requiring a majority of five sites (rather than of the four that
remain) before allowing changes.
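
To make the CellServDB cleanup concrete, here is a rough sketch in
Python.  It is only an illustration: the cell name, host names and
addresses are made up (documentation-range addresses), the helper names
are mine, and has_majority() is a plain strict-majority check that
ignores any tie-breaking Ubik itself applies.

```python
def drop_server(cellservdb_text, dead_ip):
    """Return the CellServDB text without the server line for dead_ip."""
    kept = []
    for line in cellservdb_text.splitlines():
        fields = line.split()
        # Server lines begin with the address; cell lines begin with '>'.
        if fields and fields[0] == dead_ip:
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


def has_majority(reachable, listed):
    """Simplified check: do the reachable sites outnumber half of those listed?"""
    return 2 * reachable > listed


# Hypothetical cell with five database servers listed.
SAMPLE = (
    ">example.org        #Hypothetical cell\n"
    "192.0.2.1           #db1.example.org\n"
    "192.0.2.2           #db2.example.org\n"
    "192.0.2.3           #db3.example.org\n"
    "192.0.2.4           #db4.example.org\n"
    "192.0.2.5           #db5.example.org\n"
)

# Remove the decommissioned server's address (assumed here to be 192.0.2.5).
cleaned = drop_server(SAMPLE, "192.0.2.5")
```

The same edit has to be made in every copy of CellServDB, on the
servers as well as the clients, for both (a) and (b) to take effect.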

I apologize for misinterpreting your messages.  I don't know why stuff
like "vos examine <volume>" would hang *indefinitely*, though I can well
imagine it being hung up for two or three Rx hard timeout intervals of
about 60 seconds each.  Other accesses could well fail for the same
reasons.  Part of "vos examine" looks at the VLDB and part of it checks
with the volserver, so if your volume had an instance on the downed
server, you'd get timeouts trying to learn about that volume.
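
If you want to tell a slow command from one that is truly stuck, one
option is to run it under a deadline a little above the worst case I
guessed at above (three intervals of about 60 seconds).  A minimal
sketch in Python; run_with_deadline is a name I made up, and the volume
name in the comment is only a placeholder:

```python
import subprocess


def run_with_deadline(cmd, deadline_seconds):
    """Run cmd; return (finished, output).  finished is False on timeout."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=deadline_seconds,
        )
    except subprocess.TimeoutExpired:
        # The child is killed once the deadline passes.
        return False, ""
    return True, result.stdout


# Example call (placeholder volume name, deadline of 3 * 60 seconds):
#   finished, output = run_with_deadline(["vos", "examine", "root.cell"], 180)
```

A command that is still running past that deadline is doing something
other than waiting out the timeouts I described.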

I'm glad that all returned to service correctly once you resumed the AFS
functionality on your old server (presumably with the old IP address as
well).

		Craig