[OpenAFS] Volume access problems when DB server downed

Craig_Everhart@transarc.com
Thu, 7 Mar 2002 14:38:05 -0500 (EST)


Excerpts from mail: 7-Mar-02 Re: [OpenAFS] Volume access.. Erwin
Broschinski@id.eth (564*)

> No, I left the other 4 Servers untouched - apart from a weekly restart of bos.

You didn't change the IP address in the other servers' CellServDB files?
How did you expect Ubik, which is a server-to-server protocol, to keep
working?

> What happens, if any client or server directs a request to a server in its
> CellServDB and finds that one dead? I assumed, it would try the next in the
> list. But this was not observed in all cases.

There are two issues.

The client will indeed try the next one in its list, but there are
things that only the sync site can do, and the different servers direct
the client to contact the sync site in those cases, regardless of
whether the sync site is in the client's CellServDB.  I'm not sure what
the client will do when it is told to contact a server (the sync site)
that isn't in its CellServDB, but you should avoid letting this happen.
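To make the first issue concrete, here is a minimal Python sketch of the
behaviour described above. It is a toy model, not the actual client code:
the names call_db, Reply, ServerDown and the send callback are all
hypothetical, and raising an error when the sync site is missing from the
list is only this sketch's way of flagging the case, since what a real
client does there is not known here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


class ServerDown(Exception):
    """Raised by the (hypothetical) transport when a server does not answer."""


@dataclass
class Reply:
    ok: bool
    value: Optional[str] = None
    sync_site: Optional[str] = None  # set when the server says "ask the sync site"


def call_db(servers: Sequence[str], send: Callable[[str], Reply]) -> Optional[str]:
    """Try each CellServDB entry in order; follow one referral to the sync site."""
    for addr in servers:
        try:
            reply = send(addr)
        except ServerDown:
            continue  # dead server: move on to the next entry in the list
        if reply.ok:
            return reply.value
        if reply.sync_site is not None:
            if reply.sync_site not in servers:
                # The case the text warns about. What a real client does here
                # is not specified above, so this sketch only reports it.
                raise LookupError(f"sync site {reply.sync_site} is not in CellServDB")
            return send(reply.sync_site).value
    raise ConnectionError("no database server in CellServDB answered")
```

In this model a dead server costs only a retry against the next entry,
while a referral to an address the client has never heard of is the
situation to keep from arising.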

The other issue is that the Ubik servers know about each other by virtue
of all of them being listed in the CellServDB, and that a Ubik server
reads the CellServDB at startup and not subsequently.  Given that Ubik
servers talk to each other to communicate database versions and carry
out quorum elections and generally run the DB maintenance protocols,
each server needs to know about all of the other Ubik sites.  Thus it's
particularly important not only to update the CellServDB on all DB
servers when you change an IP address for one of them, but also to
restart all the Ubik servers (vlserver, ptserver, kaserver if you use
it, and the backup server, buserver) after changing the CellServDB.
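One way to catch a DB server whose CellServDB was missed is to compare
the copies before restarting anything. The Python sketch below assumes
you have already collected the text of each server's file (on a server
machine that is normally /usr/afs/etc/CellServDB) into a dictionary
keyed by host name; the function names, host names and cell name are
made up for illustration, and only the usual ">cellname" header followed
by "address  #hostname" lines is handled.

```python
from collections import Counter
from typing import Dict, FrozenSet, List


def db_servers(cellservdb_text: str, cell: str) -> FrozenSet[str]:
    """Return the addresses listed under '>cell' in one CellServDB file."""
    found = set()
    in_cell = False
    for raw in cellservdb_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop any trailing '#comment'
        if not line:
            continue
        if line.startswith(">"):
            # A new cell header: remember whether it is the cell we want.
            in_cell = line[1:].split()[:1] == [cell]
        elif in_cell:
            found.add(line.split()[0])
    return frozenset(found)


def out_of_step(copies: Dict[str, str], cell: str) -> List[str]:
    """Name the hosts whose server list differs from the most common one."""
    lists = {host: db_servers(text, cell) for host, text in copies.items()}
    if not lists:
        return []
    majority, _ = Counter(lists.values()).most_common(1)[0]
    return sorted(host for host, addrs in lists.items() if addrs != majority)
```

An empty result only says the files agree with each other. It says
nothing about whether the running Ubik servers have re-read them, so
the restart is still needed.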

There may be some measures in Ubik to try to accommodate administrators
who don't do this, but they are not likely to be completely successful.

		Craig