[OpenAFS-devel] Sysid problems when upgrading FreeBSD server to 1.3.80

Tom Keiser <tkeiser@gmail.com>
Sat, 26 Mar 2005 13:19:54 -0500


On Sat, 26 Mar 2005 13:10:52 +0100 (MET), Harald Barth <haba@pdc.kth.se> wrote:
> 
> > > Nope, "ethernet" is wrong. Maybe "IP" or "network".
> > > And could we please see the offending evidence?
> >
> > The problem is for a large cell we don't want to send that much
> > information to a fileserver.
> > The full debug trace needs to be computed on the vlserver.
> 
> I don't understand the "that much information" problem. Maybe we are
> talking about different things? I think we need to write in the log
> what the fileserver tried to register and if it fails ask the vlserver
> what it thinks about the offending UUID and IP addrs. Then the
> sysadmin at that fileserver has a chance to figure out what is wrong.
> 

It's a trade-off.  My concern is that if the fileserver ends up in a
tight restart loop, it is now making up to 16 additional GetAddrsU RPCs
on every startup to do debugging that is supposed to happen inside the
vlserver anyway (and would, if it weren't for the printf mess).

> By the way, a lot of output from for example vos listvldb dealing with
> volumes and their locations is wrong, if the volumes are on an UUID
> and not on an IP addr the output from vos should say so.
> 

Unfortunately, vos has no knowledge of what's going on because there's
no bulk interface that uses uvldbentry's.

> > (...) I suppose it's
> > also possible older versions of vlserver closed stdout on startup, all
> > I know is recent versions of vlserver do have VLLog open on fd=1.
> 
> Unfortunately my vlservers are spartan. No lsof for example. However,
> my (eh Stacken's :-) vlservers are version 1.2.11.
> 
> > Usage of printf in the vlserver code is pretty specific to
> > SVL_RegisterAddrs -- everything else uses the VLog macro.  Perhaps
> > these calls should be switched over to VLog for consistency's sake?
> 
> Looking at the source: Oh no, this mix of VLog() and printf() is so
> broken. No wonder I did not see a thing in the log.
> 

Yeah, that is definitely in need of patching.

> > Does vos lista show any 127.0.0.0/8 addresses?  We might be looking at
> > a case where rx_getAllAddr isn't working properly on fbsd.  All the
> > vlserver needs to find are two srvidx's with ip's that match ones from
> > your bulkaddrs vector, and the game is over due to ambiguity.
> 
> Nope. (vos listaddrs -printuuid -cell stacken.kth.se -noresolve)
> But it might have tried to register one. I don't know.
> 
> BTW, is there any way to say to the vlserver that it should zap
> everything it knows about an UUID because it has not been used
> for decades?

Only one way I know of:

  1. vos listaddrs -noresolve -printuuid
  2. Pick an IP address from the offending UUID's entry.
  3. vos changeaddr <ip> -remove

There should really be a new ChangeAddrU RPC.  As it stands, your only
options are to overwrite a multihomed (mh) record with a single IP
address or to remove the mh record entirely.  We should really provide
something finer-grained.  Maybe one of these days I'll get around to
writing something...

> 
> > > +                   ("VL_RegisterAddrs rpc failed: The IP address(es) conflicted with the registered UUID\n"));
> >
> > This is definitely an improvement over the old error, but it only
> > describes one of the failure modes.  The two failure modes I'm aware
> > of are:
> > (1) the UUID is registered, but at least one address in FS_HostAddrs
> > is registered to another server
> > (2) the UUID is not registered, and the addrs in FS_HostAddrs are
> > registered to at least two servers
> 
> I don't understand how you get to situation (2), but this UUID stuff
> is obscure. So what should be printed in case (2) happens? Should
> the fileserver do something like "vos listaddrs" to make things clear?
> 

(2) can happen when you bring up a new multihomed fileserver, and some
of its IPs used to belong (and are still registered) to several other
servers.
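
To make the two failure modes concrete, here is a rough Python model of
the check.  It is only a sketch: the registry dict, the function name,
and the result strings are made up for illustration, and it encodes just
the two cases described in this thread, not the actual vlserver logic.

```python
from typing import Dict, List, Set

# Hypothetical model: registered server UUID -> set of registered IPs.
Registry = Dict[str, Set[str]]


def classify_registration(registry: Registry, uuid: str, addrs: List[str]) -> str:
    """Classify a registration attempt against the two failure modes."""
    wanted = set(addrs)
    # Other servers that already hold at least one of our addresses.
    others = {u for u, ips in registry.items() if u != uuid and ips & wanted}

    if uuid in registry:
        # Mode (1): our UUID is known, but an address belongs to someone else.
        return "mode-1" if others else "ok"
    # Mode (2): our UUID is unknown and the addresses span two or more servers.
    return "mode-2" if len(others) >= 2 else "ok"


if __name__ == "__main__":
    registry = {"uuid-a": {"10.0.0.1"}, "uuid-b": {"10.0.0.2"}}
    print(classify_registration(registry, "uuid-a", ["10.0.0.1", "10.0.0.2"]))
    print(classify_registration(registry, "uuid-c", ["10.0.0.1", "10.0.0.2"]))
```

In the second call a brand-new UUID shows up with addresses still held
by two old entries, which is the ambiguous case (2).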

Something similar to vos listaddrs would work.  A GetAddrsU call on
FS_HostUUID, and one for each addr in FS_HostAddrs should provide all
the information we need.
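
As a sketch of what that log output could look like, the Python below
does one lookup for the host UUID and one per host address against an
in-memory stand-in for the VLDB.  The registry dict, conflict_report,
and the line format are all hypothetical; a real version would issue
GetAddrsU calls rather than read a dict.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for the VLDB: server UUID -> registered IPs.
Registry = Dict[str, Set[str]]


def conflict_report(registry: Registry, host_uuid: str, host_addrs: List[str]) -> List[str]:
    """Build one log line for the host UUID and one per host address."""
    lines = []
    # Lookup 1: what does the registry hold for our own UUID?
    known = sorted(registry.get(host_uuid, set()))
    lines.append(f"uuid {host_uuid}: " + (", ".join(known) if known else "not registered"))
    # Lookups 2..N+1: which UUIDs currently own each of our addresses?
    for addr in host_addrs:
        owners = sorted(u for u, ips in registry.items() if addr in ips)
        lines.append(f"addr {addr}: " + (", ".join(owners) if owners else "not registered"))
    return lines


if __name__ == "__main__":
    registry = {"uuid-a": {"10.0.0.1"}, "uuid-b": {"10.0.0.2"}}
    for line in conflict_report(registry, "uuid-c", ["10.0.0.1", "10.0.0.2"]):
        print(line)
```

The number of lookups grows with the number of addresses the fileserver
tries to register, which is where the extra startup cost comes from.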

-- 
Tom