[OpenAFS] problems with openafs 1.2.5 on Linux 2.4.18 (debian woody)

Marcus Watts mdw@umich.edu
Wed, 10 Jul 2002 08:44:33 -0400


Dr A V Le Blanc <LeBlanc@mcc.ac.uk> writes:
> From: Dr A V Le Blanc <LeBlanc@mcc.ac.uk>
> To: openafs-info@openafs.org
> Message-ID: <20020710102255.GC13945@afs.mcc.ac.uk>
> Reply-To: Dr A V Le Blanc <LeBlanc@mcc.ac.uk>
> References: <20020709235902.B929E9C40@grand.central.org>
> In-Reply-To: <20020709235902.B929E9C40@grand.central.org>
> Subject: [OpenAFS] problems with openafs 1.2.5 on Linux 2.4.18 (debian woody)
> Date: Wed, 10 Jul 2002 11:22:55 +0100
> 
> We've bought a new PC to replace an aged, ailing Silicon Graphics
> dbserver and fileserver.  I've installed debian woody on the machine,
> and the kernel is 2.4.18.  It works perfectly as an AFS client,
> so I thought I should begin by installing the fileserver.  (Packages
> are all version 1.2.5-1).  Incidentally, when bosserver starts, it
> begins with the message:
> 
>      Wed Jul 10 11:02:50 2002: Server directory access is not okay
> 
> but all the directories appear to have fairly sensible permissions.
> Anyway, I've copied over the files from /usr/afs/etc to /etc/openafs/server,
> and everything starts as normal, with the exception of this message.
> Then I create the fs process with the command
> 
>      bos create scree fs fs '/usr/lib/openafs/fileserver -nojumbo'
>           '/usr/lib/openafs/volserver -nojumbo' /usr/lib/openafs/salvager
> 
> The FileLog reads:
> 
> Wed Jul 10 11:03:11 2002 File server starting
> Wed Jul 10 11:03:11 2002 afs_krb_get_lrealm failed, using mcc.ac.gb.
> Wed Jul 10 11:03:11 2002 /var/lib/openafs/sysid: doesn't exist
> Wed Jul 10 11:03:11 2002 Creating new SysID file
> Wed Jul 10 11:03:11 2002 VL_RegisterAddrs rpc failed; The ethernet address exist on a different server; repair it
> Wed Jul 10 11:03:11 2002 VL_RegisterAddrs rpc failed; See VLLog for details
> Wed Jul 10 11:03:11 2002 Fatal error in library initialization, exiting!!
> 
> Questions:
> 
> (1)  Why does afs_krb_get_lrealm fail?
> (2)  As far as I can see, sysid should be created in /var/lib/openafs/
>      and it's not happening.  Why?
> (3)  Why does VL_RegisterAddrs fail?  There is no other machine
>      running with the IP address.
> (4)  There is no VLLog.
> 
> Suggestions would be welcome!
> 
>      -- Owen
>      LeBlanc@mcc.ac.uk

(0)
These directories must be owned by root & at least mode 755 or at most 775
(each entry gives the Transarc path, then the path used when not
building with Transarc paths):
	AFSDIR_SERVER_AFS_DIRPATH
		/usr/afs	/usr/afs
	AFSDIR_SERVER_ETC_DIRPATH
		/usr/afs/etc	${prefix}/etc/openafs/server
	AFSDIR_SERVER_BIN_DIRPATH
		/usr/afs/bin	${prefix}/libexec/openafs
	AFSDIR_SERVER_LOGS_DIRPATH
		/usr/afs/logs	${prefix}/var/openafs/logs
These directories must be owned by root & at least mode 700 or at most 770:
	AFSDIR_SERVER_DB_DIRPATH
		/usr/afs/db		${prefix}/var/openafs/db
	AFSDIR_SERVER_LOCAL_DIRPATH
		/usr/afs/local	${prefix}/var/openafs/
These directories must be at least mode 700 or at most 770, but
need not be owned by root (?):
	AFSDIR_SERVER_BACKUP_DIRPATH
		/usr/afs/backup	${prefix}/var/openafs/backup
These files must be at least mode 600 and not more than 660:
	AFSDIR_SERVER_KEY_FILEPATH
		/usr/afs/etc/KeyFile	${prefix}/etc/openafs/server/KeyFile
These files must be at least mode 600 and not more than 664:
	AFSDIR_SERVER_ULIST_FILEPATH
		/usr/afs/etc/UserList	${prefix}/etc/openafs/server/UserList

So far as I can tell from reading the code, AFSDIR_SERVER_AFS_DIRPATH
is always /usr/afs, even when not using Transarc paths, so I think this
directory must always exist to satisfy bosserver.  Probably an easy
enough thing to fix, if anyone cares.
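
The mode rules above can be sketched as a small check.  This is only an
illustration in Python, not the bosserver code: it assumes "at least
mode X" means every permission bit of X is set and "at most mode Y"
means no bit outside Y is set, and the names mode_within and check_path
are made up for the example.

```python
import os
import stat


def mode_within(mode, at_least, at_most):
    """True if mode has every bit of at_least and no bit outside at_most."""
    perm = stat.S_IMODE(mode)  # drop the file-type bits, keep permissions
    return (perm & at_least) == at_least and (perm & ~at_most) == 0


def check_path(path, at_least, at_most, need_root=True):
    """Return a list of problems found for path (empty list means okay)."""
    st = os.stat(path)
    problems = []
    if need_root and st.st_uid != 0:
        problems.append("%s: not owned by root" % path)
    if not mode_within(st.st_mode, at_least, at_most):
        problems.append("%s: mode %o outside %o..%o"
                        % (path, stat.S_IMODE(st.st_mode), at_least, at_most))
    return problems
```

Under that reading, check_path("/usr/afs", 0o755, 0o775) returns an
empty list for a root-owned directory of mode 755, and
check_path(keyfile, 0o600, 0o660) flags a world-readable KeyFile.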

(1)
It should be harmless if afs_krb_get_lrealm fails.  So far as I can tell,
this only affects who is "superuser" on the file server.  To make it succeed,
create:
	AFSDIR_SERVER_ETC_DIRPATH
		/usr/afs/etc/krb.conf	${prefix}/etc/openafs/server/krb.conf
and make sure it reads something like:
	UMICH.EDU
	UMICH.EDU fear.ifs.umich.edu admin server
	UMICH.EDU surprise.ifs.umich.edu
	UMICH.EDU ruthless.ifs.umich.edu
(replacing the names & realms with your local kerberos realm & servers,
as appropriate).  Probably only the first line needs to exist;
I don't believe there's anything in AFS that is capable of doing the
equivalent of krb_get_krbhst or krb_get_admhst.  I think it's somewhat
bogus that it defaults to a lower case realm, but so be it.  The
capitalization of kerberos realm names didn't really get standardized
until after RFC 1510, ie, way after K4.  AFS cell names definitely have
to be lower-cased.
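
To illustrate the point that only the first line matters for the local
realm, here is a minimal Python sketch.  It is not how AFS reads the
file; the function name local_realm is hypothetical, and skipping
leading blank lines is an assumption of the sketch.

```python
def local_realm(krb_conf_text):
    """Return the realm named on the first non-blank line, or None."""
    for line in krb_conf_text.splitlines():
        fields = line.split()
        if fields:
            # The first line holds just the realm; later lines list servers.
            return fields[0]
    return None


SAMPLE = """UMICH.EDU
UMICH.EDU fear.ifs.umich.edu admin server
UMICH.EDU surprise.ifs.umich.edu
UMICH.EDU ruthless.ifs.umich.edu
"""
```

Calling local_realm(SAMPLE) picks out the realm from the first line and
ignores the server lines that follow.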

(2)
The sysid file isn't written out until the file server is
registered with VL (ie, VL_RegisterAddrs returns success.)
That's just the way it is.

(3)
VL_RegisterAddrs is indeed an RPC.  It will go out & talk to your VL
servers.  The error message you saw is returned when a VL server
returns VL_MULTIPADDR.  It means that the IP address or Uuid already
exists.  The fileserver doesn't know which is the case; that's why it's
pointing you to the DB server's VLLog.

(4)
VLLog is created by vlserver on your database servers, not on the
machine where you're trying to run a fileserver (which presumably isn't
yet running vlserver).  vlserver should report in VLLog what the
problem was each time it returns VL_MULTIPADDR.
Either it will print out a message that says "It would have replaced
the existing VLDB server entry:", which means the Uuid already existed,
or it will say "Yet another VLDB server entry exists:", which I think
means the IP address was already registered.  From the rest of your
description, the latter message is more likely.  In either case, the
two fixes appear to be to either use "vos changeaddr" to move an IP
address out of the way, or zap sysid & try again, if the uuid was in
use.  In your case since you haven't got a sysid file to zap, it sounds
like the former is your only choice.
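
The reasoning above amounts to looking for one of two messages in
VLLog and picking the matching fix.  A rough Python sketch follows; the
two message strings are the ones quoted above, the function name
diagnose_vllog is hypothetical, and the reading of the second message
is the same guess as in the text.

```python
UUID_MSG = "It would have replaced the existing VLDB server entry:"
ADDR_MSG = "Yet another VLDB server entry exists:"


def diagnose_vllog(text):
    """Return a suggested fix for each VL_MULTIPADDR report found in text."""
    suggestions = []
    for line in text.splitlines():
        if UUID_MSG in line:
            suggestions.append("uuid already exists: zap sysid & try again")
        elif ADDR_MSG in line:
            suggestions.append(
                "address already registered: use vos changeaddr")
    return suggestions
```

Feed it the contents of VLLog from a database server; an empty list
means neither message was found.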

The "uuid" is supposed to be a 16-byte globally unique random value.
It looks like they originally wanted to base this on the machine's
ethernet address, but couldn't find an easy way to get at it.  I'm not
sure why they didn't instead use the number returned by "gethostid(2)",
but they don't.
	[ hostid was always a sort of vague idea; on Suns it returns
	something related to the last 4 octets of the workstation
	ethernet address; on IBM AIX, it returns the IP address of the
	1st external interface.  On OpenBSD, it's deprecated, and on
	OpenBSD i386 machines, it returns 0 by default. ]
The format of the uuid is
	64-bit timestamp (and version), 2 bytes "clock sequence",
	6 bytes "node".  The node might be set to the IP address,
	or some other random value, followed by 0xaa, 0x77.
	{In older releases of AFS, it looks like the "random value"
	was the value of an uninitialized automatic variable...}
	I don't know why they didn't instead put "77aa" as the 1st
	2 bytes...

The sysid file contains the following (columns are byte offset, size
in bytes, and contents):
	0	4	88aabbcc	"magic", this is a sysid file
	4	4	1		sysid file version number
	8	16	uuid, with substructures as follows:
	 8	 8	 timestamp
	 16	 2	 sequence
	 18	 4	 host IP address or core junk
	 22	 2	 0xaa,0x77
	24	4	# of IP interfaces on machine
	28	4*n	array of IP addresses

There doesn't appear to be any easy way to list the Uuids known
to a cell.  "vos listaddrs" will list out the host names for
all IP addresses known to VLserver (ie, anything that was
ever a fileserver.)  When run on umich.edu, the list includes
v-busad.c-bus1.umnet.umich.edu; this is because we once had
an MVS based fileserver at 141.211.236.2, and the IP address is
apparently used by part of some sort of networking infrastructure
at the business school today.

It would be easy enough to add a flag to "vos listaddrs" to print
out the uuid (it gets it, it just doesn't bother to print it.)
It would be nice if there were a "-n" flag to kill the IP to hostname
lookup too.

				-Marcus Watts
				UM ITCS Umich Systems Group