[OpenAFS] adding a database server to a cell

forrest whitcher fw@fwsystems.com
Thu, 11 Oct 2001 12:29:48 -0400


I'm stumped. I've checked the hostnames. On the first attempt I tried
to make this work on hosts that happened to be multi-homed. The FAQ
says that as of AFS 3.4 database servers may not be multi-homed;
is this still true of OpenAFS?

Anyhow, it still fails when working from a single-interface
system. Here is the order of steps, taken from the AFS Quick Beginnings
doc file:

the operating AFS database server is a.fwsystems.com

made copies of /usr/afs/etc/* on the new server (b)
started bosserver


bos create b.fwsystems.com upclientetc simple "/usr/afs/bin/upclient a.fwsystems.com /usr/afs/etc" -cell athena.fwsystems.com

bos addhost a b.fwsystems.com

verified that the new server was receiving updated etc files

bos listh b     
Cell name is athena.fwsystems.com
    Host 1 is a.fwsystems.com
    Host 2 is b.fwsystems.com
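
To compare the server list that each machine reports without eyeballing
it, here is a minimal Python sketch that parses `bos listhosts` output of
the shape shown above. The function name `parse_listhosts` is made up for
illustration, and the parser assumes the output looks exactly like the
sample; it is not part of any AFS tool.

```python
import re

def parse_listhosts(text):
    """Return (cell_name, [hosts]) from `bos listhosts` output.

    Assumes a "Cell name is ..." line followed by "Host N is ..." lines,
    as in the output quoted in this post.
    """
    cell = None
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"Cell name is (\S+)$", line)
        if m:
            cell = m.group(1)
            continue
        m = re.match(r"Host \d+ is (\S+)$", line)
        if m:
            hosts.append(m.group(1))
    return cell, hosts

# Sample copied from the `bos listh b` output above.
sample = """Cell name is athena.fwsystems.com
    Host 1 is a.fwsystems.com
    Host 2 is b.fwsystems.com
"""

cell, hosts = parse_listhosts(sample)
print(cell, hosts)
```

Running the same function over the output captured from a and from b and
comparing the two results would show whether both machines agree on the
host list.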


bos create b.fwsystems.com buserver simple /usr/afs/bin/buserver
gives: 
/usr/afs/bin/buserver: problems with host name Ubik init failed
/usr/afs/bin/buserver: problems with host name Ubik init failed
/usr/afs/bin/buserver: problems with host name Ubik init failed
/usr/afs/bin/buserver: problems with host name Ubik init failed
.... 13 total messages

On the console of the primary database server (a):

x.y.z.13 

x.y.z.13 

x.y.z.13 

x.y.z.13 
.... 13 total messages

continue on b:

bos create b.fwsystems.com ptserver simple /usr/afs/bin/ptserver
bos create b.fwsystems.com vlserver simple /usr/afs/bin/vlserver

(I'm not running kaserver; all authentication is via aklog against
the K5 KDC. If kaserver were running, that would be the
first step here. Is this perhaps the problem?)

on a, and b:

do 'bos restart host' 

And the 'correct' (previously working) protection and volume location servers then fail on the primary:

bos status a
Instance buserver, has core file, currently running normally.
Instance ptserver, temporarily disabled, stopped for too many errors, currently shutdown.
Instance vlserver, temporarily disabled, stopped for too many errors, currently shutdown.
Instance fs, has core file, currently running normally.
    Auxiliary status is: file server running.
Instance upserver, has core file, currently running normally.
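
For picking the stopped instances out of longer `bos status` output, a
small Python sketch along these lines might help. It assumes the
"Instance NAME, ..., currently shutdown." wording seen above, and the
function name `shutdown_instances` is hypothetical.

```python
import re

def shutdown_instances(status_text):
    """Return the names of instances that `bos status` reports as shutdown."""
    stopped = []
    for line in status_text.splitlines():
        m = re.match(r"\s*Instance (\S+?),", line)
        if m and "currently shutdown" in line:
            stopped.append(m.group(1))
    return stopped

# Sample copied from the `bos status a` output above.
status_sample = """Instance buserver, has core file, currently running normally.
Instance ptserver, temporarily disabled, stopped for too many errors, currently shutdown.
Instance vlserver, temporarily disabled, stopped for too many errors, currently shutdown.
Instance fs, has core file, currently running normally.
    Auxiliary status is: file server running.
Instance upserver, has core file, currently running normally.
"""

print(shutdown_instances(status_sample))
```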

The relevant logs:

PtLog:

ptserver: problems with host name Ubik init failed
mary address
Thu Oct 11 11:42:42 2001 Inconsistent Cell Info from server: Thu Oct 11 11:42:42 2001 Local CellServDB:
x.y.z.2 
Server 1: x.y.z.13 
Thu Oct 11 11:42:42 2001 Inconsistent Cell Info on server: x.y.z.13 

and VLLog:

Thu Oct 11 11:42:43 2001 Using x.y.z.2 as my primary address
Thu Oct 11 11:42:43 2001 Inconsistent Cell Info from server: Thu Oct 11 11:42:43 2001 Local CellServDB:
x.y.z.2 
Server 1: x.y.z.13 
Thu Oct 11 11:42:43 2001 Inconsistent Cell Info on server: x.y.z.13 
vlserver: Ubik init failed with code 5385
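
The "Inconsistent Cell Info" wording suggests that the two servers
disagree about which hosts are in the cell, so one check is to diff the
server lists in each machine's /usr/afs/etc/CellServDB. The following is
a rough Python sketch, assuming the usual layout of a ">cellname" line
followed by one "address #hostname" line per server. The function names
and the two file contents are invented to illustrate a mismatch; they are
not the real files from this cell.

```python
def parse_cellservdb(text):
    """Map each cell name to the set of server addresses listed under it."""
    cells = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop any trailing "#comment"
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                continue  # malformed cell line, skip it
            current = fields[0]
            cells[current] = set()
        elif current is not None:
            cells[current].add(line.split()[0])
    return cells

def cell_diff(text_a, text_b, cell):
    """Return (only_in_a, only_in_b) address sets for one cell."""
    a = parse_cellservdb(text_a).get(cell, set())
    b = parse_cellservdb(text_b).get(cell, set())
    return a - b, b - a

# Hypothetical file contents, using the masked addresses from the logs.
on_a = """>athena.fwsystems.com    #example cell
x.y.z.2     #a.fwsystems.com
x.y.z.13    #b.fwsystems.com
"""
on_b = """>athena.fwsystems.com    #example cell
x.y.z.13    #b.fwsystems.com
"""

only_a, only_b = cell_diff(on_a, on_b, "athena.fwsystems.com")
print("only on a:", sorted(only_a))
print("only on b:", sorted(only_b))
```

Two empty sets would mean the files list the same servers for the cell,
in which case the mismatch would have to come from somewhere else, such
as how each name resolves.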


So the new server is labelling itself 'Server 1:' (see PtLog and VLLog above)?

I'm sure I missed something in an early step. Looking at the one populated
database on the new server (bdb.DB0), it has learned the names of all of the
volume sets on the two fileservers, so it seems to be getting the
backup database OK; the prdb* and vldb* files are essentially empty.

Also, no /usr/afs/local/sysid file has been created?

Any pointers on what I've overlooked?

thanks,           forrest