[OpenAFS-devel] Sysid problems when upgrading FreeBSD server to 1.3.80

Harald Barth haba@pdc.kth.se
Sat, 26 Mar 2005 13:10:52 +0100 (MET)


> > Nope, "ethernet" is wrong. Maybe "IP" or "network".
> > And could we please see the offending evidence?
> 
> The problem is for a large cell we don't want to send that much
> information to a fileserver.
> The full debug trace needs to be computed on the vlserver.

I don't understand the "that much information" problem. Maybe we are
talking about different things? I think the fileserver should write in
its log what it tried to register and, if that fails, ask the vlserver
what it knows about the offending UUID and IP addrs. Then the
sysadmin at that fileserver has a chance to figure out what is wrong.

By the way, a lot of the output from, for example, vos listvldb about
volumes and their locations is misleading: if the volumes are on a UUID
and not on an IP addr, the output from vos should say so.

> (...) I suppose it's
> also possible older versions of vlserver closed stdout on startup, all
> I know is recent versions of vlserver do have VLLog open on fd=1. 

Unfortunately my vlservers are spartan. No lsof for example. However,
my (eh Stacken's :-) vlservers are version 1.2.11.

> Usage of printf in the vlserver code is pretty specific to
> SVL_RegisterAddrs -- everything else uses the VLog macro.  Perhaps
> these calls should be switched over to VLog for consistency's sake?

Looking at the source: Oh no, this mix of VLog() and printf() is so
broken. No wonder I did not see a thing in the log.

> Does vos lista show any 127.0.0.0/8 addresses?  We might be looking at
> a case where rx_getAllAddr isn't working properly on fbsd.  All the
> vlserver needs to find are two srvidx's with ip's that match ones from
> your bulkaddrs vector, and the game is over due to ambiguity.

Nope. (vos listaddrs -printuuid -cell stacken.kth.se -noresolve)
But the fileserver might have tried to register one. I don't know.
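
To find out whether the fileserver tries to register a loopback address,
one could filter the address list before it is sent. A minimal sketch,
assuming the addresses are in host byte order as in FS_HostAddrs_HBO; the
function names are made up for illustration and are not in the tree:

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if addr_hbo (host byte order)
 * lies in 127.0.0.0/8, otherwise 0. */
static int is_loopback_hbo(uint32_t addr_hbo)
{
    return (addr_hbo >> 24) == 127;
}

/* Hypothetical helper: counts the loopback addresses in a list of
 * host-byte-order addresses. A non-zero result would support the
 * theory that interface enumeration returned 127.x.x.x. */
static int count_loopback(const uint32_t *addrs, int n)
{
    int i, hits = 0;

    for (i = 0; i < n; i++) {
        if (is_loopback_hbo(addrs[i]))
            hits++;
    }
    return hits;
}
```

Logging the result of count_loopback() next to the registration attempt
would answer the question without needing lsof or a debugger.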

BTW, is there any way to tell the vlserver that it should zap
everything it knows about a UUID because it has not been used
for decades?

> > +                   ("VL_RegisterAddrs rpc failed: The IP address(es) conflicted with the registered UUID\n"));
> 
> This is definitely an improvement over the old error, but it only
> describes one of the failure modes.  The two failure modes I'm aware
> of are:
> (1) the UUID is registered, but at least one address in FS_HostAddrs
> is registered to another server
> (2) the UUID is not registered, and the addrs in FS_HostAddrs are
> registered to at least two servers

I don't understand how you get to situation (2), but this UUID stuff
is obscure. So what should be printed in case (2) happens? Should
the fileserver do something like "vos listaddrs" to make things clear?

> >             ViceLog(0,
> > -                   ("VL_RegisterAddrs rpc failed; See VLLog for details\n"));
> 
> since there is extensive logging in SVL_RegisterAddrs, i'd prefer to
> see this line remain.

OK, we just have to find it ;-)

> > +                   ("UUID: %s\n",uuid));
> > +           for (n = 0; n < FS_HostAddr_cnt; n++) {
> > +               Vicelog(0,
> > +                       ("IP %d: %d.%d.%d.%d\n", n+1,
> > +                        (addr) & 0xff,
> > +                        (addr >> 8) & 0xff,
> > +                        (addr >> 16) & 0xff,
> > +                        (addr >> 24) & 0xff));
> 
> I think the previous four lines should be replaced by the following:
> 
> (FS_HostAddrs_HBO[n] >>24) & 0xff,
> (FS_HostAddrs_HBO[n] >>16) & 0xff,
> (FS_HostAddrs_HBO[n] >>8) & 0xff,
> (FS_HostAddrs_HBO[n]) & 0xff));

Yes of course. Resulting patch below.
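
For the record, the byte-order point in isolation: with the address in
host byte order, the most significant byte is the first octet. A small
standalone sketch (format_addr_hbo is a made-up name, not something in
the tree):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes the dotted-quad form of a host-byte-order
 * IPv4 address into buf. The highest byte is printed first. */
static void format_addr_hbo(uint32_t addr_hbo, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%u.%u.%u.%u",
             (unsigned)((addr_hbo >> 24) & 0xff),
             (unsigned)((addr_hbo >> 16) & 0xff),
             (unsigned)((addr_hbo >> 8) & 0xff),
             (unsigned)(addr_hbo & 0xff));
}
```

Shifting in the opposite order, as my first version did, prints the
octets reversed whenever the value is in host byte order.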

Harald.

--- src/viced/viced.c.~1.59.~   2004-09-08 23:35:54.000000000 +0200
+++ src/viced/viced.c   2005-03-26 13:04:58.000000000 +0100
@@ -1462,9 +1462,23 @@
     code = ubik_Call(VL_RegisterAddrs, cstruct, 0, &FS_HostUUID, 0, &addrs);
     if (code) {
        if (code == VL_MULTIPADDR) {
+           char uuid[1024];
+           int n;
+
+           afsUUID_to_string(&FS_HostUUID, uuid, sizeof(uuid));
            ViceLog(0,
-                   ("VL_RegisterAddrs rpc failed; The ethernet address exist on a different server; repair it\n"));
+                   ("VL_RegisterAddrs rpc failed; The IP address(es) conflicted when trying to register the fileserver UUID\n"));
            ViceLog(0,
+                   ("UUID: %s\n",uuid));
+           for (n = 0; n < FS_HostAddr_cnt; n++) {
+               ViceLog(0,
+                       ("IP %d: %d.%d.%d.%d\n", n+1, 
+                        (FS_HostAddrs_HBO[n] >>24) & 0xff,
+                        (FS_HostAddrs_HBO[n] >>16) & 0xff,
+                        (FS_HostAddrs_HBO[n] >>8) & 0xff,
+                        (FS_HostAddrs_HBO[n]) & 0xff));
+           }
+            ViceLog(0,
                    ("VL_RegisterAddrs rpc failed; See VLLog for details\n"));
            return code;
        } else if (code == RXGEN_OPCODE) {