[OpenAFS] fileserver doesn't start - multihomed file and database server

Andreas Hirczy ahi@itp.tugraz.at
Wed, 15 Mar 2006 17:05:05 +0100


I have a file server and a database server on the same machine: Debian GNU/Linux
sarge/stable with all current updates, kernel 2.6.15.6 and openafs 1.4.0-1
backported from Debian etch/unstable. To reuse the IP address of a former AFS
DB server I use "fake"; that address is listed in the "official" CellServDB and
I did not bother to change it there.

Two weeks ago I noticed that the salvager was running continuously; after a
restart of the server things went on OK.
Today, after a reboot, bosserver started the salvager again: the fileserver
stopped uncleanly, the salvager ran again, and so on.

I found the following messages in FileLog:

> Wed Mar 15 13:28:33 2006 File server starting
> Wed Mar 15 13:28:33 2006 afs_krb_get_lrealm failed, using itp.tugraz.at.
> Wed Mar 15 13:28:55 2006 VL_RegisterAddrs rpc failed; The IP address exists on a different server; repair it
> Wed Mar 15 13:28:55 2006 VL_RegisterAddrs rpc failed; See VLLog for details
> Wed Mar 15 13:28:55 2006 Fatal error in library initialization, exiting!!
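
For anyone who wants to check their own FileLog for the same failure, here is
a minimal sketch in Python. It assumes only the line layout visible in the
excerpt above (timestamp followed by a message) and an English/C locale for
the day and month names. The function names parse_filelog and
registration_failures are made up for this illustration and are not part of
OpenAFS.

```python
import re
from datetime import datetime

# Line layout taken from the FileLog excerpt above, e.g.
# "Wed Mar 15 13:28:33 2006 File server starting".
LINE_RE = re.compile(r"^(\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}) (.*)$")


def parse_filelog(lines):
    """Yield (timestamp, message) for every line that matches the layout."""
    for line in lines:
        # Drop mail quoting ("> ") and the trailing newline, if present.
        line = line.lstrip("> ").rstrip("\n")
        match = LINE_RE.match(line)
        if match:
            # %a and %b are locale dependent; this assumes English names.
            stamp = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
            yield stamp, match.group(2)


def registration_failures(lines):
    """Return the entries that report a failed VL_RegisterAddrs call."""
    return [
        (stamp, message)
        for stamp, message in parse_filelog(lines)
        if "VL_RegisterAddrs rpc failed" in message
    ]
```

Passing an open FileLog (or the quoted lines above as a list of strings) to
registration_failures returns one entry per failed registration attempt, which
makes it easier to see how often the fileserver ran into this at startup.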

VLLog does not show anything useful. I made sure that no other machine in
our network uses the addresses the server wants to use.

The fileserver started working again only after I killed fake, waited for the
next salvage run to finish, and killed the second salvage run that followed.
I have not tried to start "fake" again :)

The machine has been running with this configuration for about two months
now; these have been the only incidents so far.

Thanks in advance
-- 
Andreas Hirczy <ahi@itp.tugraz.at>                   http://itp.tugraz.at/~ahi/
Graz University of Technology                          phone: +43/316/873-81 90
Institute of Theoretical and Computational Physics       fax: +43/316/873-86 78
Petersgasse 16, A-8010 Graz                         mobile: +43/699/19 14 24 60