[OpenAFS] Re: Weird LAN/WAN login problem

Andrew Deason adeason@sinenomine.net
Mon, 25 Feb 2013 18:22:45 -0600


On Mon, 25 Feb 2013 17:58:29 -0600
Andrew Deason <adeason@sinenomine.net> wrote:

> Your problem here is not likely to be solved by an IP mentioned in a
> configuration file; the problem is an IP that's in the VLDB, so you
> can't just change it manually.
> 
> There are some steps you can take to fix this, but just give me a
> minute to write them down.

So, what I think is likely the problem here is that conceptually you
have two 'servers' in the VLDB. One has the 'public' IP address and a
UUID (I'll call it serverA), and the other has the IP 192.168.125.5 and
no UUID (I'll call it serverB; I believe the missing UUID is why it does
not get listed by the above command). You can kind of see this if you
run 'vos listvldb -server <public.ip>' and 'vos listvldb -server
192.168.125.5'; you should get different results. (Use IP addresses
here, not host names, to be sure you're querying the server entry you
mean to.)

Assuming all of that up there makes sense, what you need to do to fix it
is to 'move' volumes on serverB to serverA. You're not actually moving
any data around, since those are the same physical server. You're just
telling the VLDB that the volume is on a different server.

So, look at the output from 'vos listvldb -server 192.168.125.5'; this
lists every volume that's on "serverB", and so should be every volume
for which you have this problem (check that output and see if it makes
sense to you). I assume every entry in that output only lists an 'RW
site' for each volume; if you have 'RO sites', you may need to move
those separately, and I'm not covering that here. Anyway, for each RW
volume on serverB, run this:

vos changeloc <public.ip> <partition> <volume_name>

So, if you see something like this:

vol.foo
    RWrite: 3757072894
    number of sites -> 1
       server 192.168.125.5 partition /vicepa RW Site

Run:

vos changeloc <public.ip> vicepa vol.foo

And if it was on e.g. partition vicepc instead, you'd run:

vos changeloc <public.ip> vicepc vol.foo
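
If there are a lot of volumes, the per-volume step above can be
generated instead of typed by hand. What follows is only a rough sketch,
not something I have run against your cell: 'gen_changeloc' is a made-up
helper name, and it assumes the 'vos listvldb' output looks like the
sample above (an unindented volume name, then an indented 'server <ip>
partition /vicepX RW Site' line showing the IP address rather than a
host name). It only prints the commands; it does not run them.

```shell
# Hypothetical helper: print (do not run) one 'vos changeloc' command for
# each RW site that 'vos listvldb' reports on the old address.
# Reads 'vos listvldb' output on stdin; assumes the layout shown above.
gen_changeloc() {
    old_ip=$1    # address of the stale entry, e.g. 192.168.125.5
    new_ip=$2    # the server's public address
    awk -v old="$old_ip" -v new="$new_ip" '
        # A one-word line with no indentation is taken to be a volume name.
        NF == 1 && $0 == $1 { vol = $1; next }
        # Site line: "server <ip> partition /vicepX RW Site"
        $1 == "server" && $2 == old && $3 == "partition" && $5 == "RW" {
            part = $4
            sub(/^\//, "", part)    # "/vicepa" -> "vicepa"
            if (vol != "") print "vos changeloc", new, part, vol
        }
    '
}
```

You would feed it the listing, e.g. 'vos listvldb -server 192.168.125.5 |
gen_changeloc 192.168.125.5 <public.ip>' (with the real address filled
in), read through what it prints, and only then run those lines.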

Once you're done, you can run:

vos changeaddr 192.168.125.5 -remove

To try to remove the old "serverB". If there are any entries that exist
on serverB that you haven't moved yet, that command will fail. But if
you've moved everything, that command should succeed, and you're
guaranteed that nothing is referencing that server entry.
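
If you would rather check first than rely on the command failing, you
can count how many site lines still point at the old address. This is
just a sketch under the same assumption as before (that 'vos listvldb'
prints 'server <ip> partition ...' lines with the IP address in them);
'count_sites_on' is a hypothetical helper name, not a vos subcommand.

```shell
# Hypothetical helper: count the site lines in 'vos listvldb' output
# (read from stdin) that still reference the given address.
count_sites_on() {
    awk -v old="$1" '
        $1 == "server" && $2 == old && $3 == "partition" { n++ }
        END { print n + 0 }    # prints 0 when nothing matched
    '
}
```

Something like 'vos listvldb -server 192.168.125.5 | count_sites_on
192.168.125.5' printing 0 would suggest nothing is left on "serverB".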

Okay, that should be all; make sense?


And so you're aware, and for anyone else that's about to mention them,
there are a couple of commands that do something like this
"automatically", called 'vos syncserv' and 'vos syncvldb'. I don't know
if they handle this situation correctly, though (that is, two VLDB
server entries for the same server), so I wasn't using them for this.

You should also know that in general, if you manage to screw up the
VLDB, one option is always to delete the whole VLDB and recreate it. The
'vos syncserv' and 'vos syncvldb' mentioned above are able to recreate
the database from scratch, if you want to.

-- 
Andrew Deason
adeason@sinenomine.net