[OpenAFS] Multihomed issues
Tue, 18 Jan 2011 01:56:55 +0100
Quoting Russ Allbery <email@example.com>:
> The file server is what tells the VLDB that it has those addresses, so I
> think the same solution should work there. The trick is that you have to
> create the NetInfo and NetRestrict files before the first time you start
> the file server. It should then not register them.
I had created a NetInfo file, but not a NetRestrict. I had done this
after installing the AFS server packages, but before running the file
server for the first time. Then I created the ptserver and vlserver
processes on the new server. Guess that's not enough.
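
For reference, a minimal sketch of what the two files might contain. Both take one dotted-decimal address per line. The directory is an assumption (on a Debian-style install the live files are usually expected in /var/lib/openafs/local), 203.0.113.10 is only a placeholder for the public address, and 192.168.26.10 is the private address of the new server from this thread. The sketch writes to a staging directory rather than the live one:

```shell
# Staging directory for this sketch; AFS_LOCAL_DIR is a hypothetical variable.
# The live location is assumed to be /var/lib/openafs/local on Debian.
AFS_LOCAL_DIR="${AFS_LOCAL_DIR:-$(mktemp -d)}"

# NetInfo: addresses the server is allowed to register, one per line.
# 203.0.113.10 is a placeholder, not an address from this cell.
printf '%s\n' 203.0.113.10 > "$AFS_LOCAL_DIR/NetInfo"

# NetRestrict: addresses that must never be registered, one per line.
printf '%s\n' 192.168.26.10 > "$AFS_LOCAL_DIR/NetRestrict"

# Show the result for review before copying anything into place.
cat "$AFS_LOCAL_DIR/NetInfo" "$AFS_LOCAL_DIR/NetRestrict"
```

As noted in the quoted text, the files need to be in place before the server processes start for the first time.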
> The same NetInfo and NetRestrict files that work for file servers also
> work for the vlserver and ptserver, although in those cases only Ubik
> should care; everything else uses either AFSDB/SRV DNS information or the
> addresses in CellServDB.
Looks like you're right about that.
>> In my case, this includes several private addresses that I don't want
>> any of the database servers to use. However, even if immediately
>> afterwards I remove these addresses from a new server's CellServDB and
>> restart it, it's too late: they're already in the VLDB and AFS is
>> already trying to send a new RO copy of root.cell to the new
>> server... using that server's private range IP address.
> I don't think you meant CellServDB here, or if you did, then something
> else is going on that I don't understand. CellServDB shouldn't ever
> contain the IP addresses of file servers (unless your VLDB servers are
> also file servers).
I did mean /etc/openafs/server/CellServDB, but, like the original
server, this new (second) server is both a file server and a VLDB
server.
> vos setaddr, you mean? vos delentry is what will fix the above; you need
> to delentry the replication site on 192.168.26.10, ...
No, vos delentry. But I'm happy to report that it did the trick
after I ran it again on the first server:
~# vos delentry -server 192.168.26.10
Deleting VLDB entries for server 192.168.26.10
Total VLDB entries deleted: 1; failed to delete: 0
This also had one further, unexpected result:
~# vos examine root.cell -noresolve
VLDB: no such entry
Oops! At this point, the AFS filesystem was unavailable. However, a
vos listvol showed that it was still there:
~# vos listvol localhost |grep root.cell
root.cell 536870915 RW 6 K On-line
root.cell.readonly 536870916 RO 6 K On-line
So, I ran "vos syncvldb localhost a" and "vos syncserv localhost a"
and got it back. After that I could access the directory tree again.
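
The recovery steps above can be sketched as a small wrapper, written with the explicit -server and -partition flags instead of positional arguments. The run helper and its RUN switch are hypothetical conveniences for this sketch, not vos features; by default the commands are only printed:

```shell
# Print each command instead of running it unless RUN=1 is set.
# RUN is a hypothetical switch for this sketch.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Rebuild VLDB entries from what is on disk, then reconcile the
# server's volume headers with the VLDB, then check root.cell.
resync_vldb() {
    server=$1
    partition=$2
    run vos syncvldb -server "$server" -partition "$partition"
    run vos syncserv -server "$server" -partition "$partition"
    run vos examine root.cell -noresolve
}

resync_vldb localhost a
```

As far as I can tell from the documentation, vos delentry -server removes every VLDB entry that mentions that server, not just the one site, which would explain why root.cell disappeared. To drop a single replication site, vos remsite looks like the narrower tool.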
I might add that vos syncvldb also gave a list of warnings about
orphaned volumes. After following that up with vos syncserv, I found
that I could run "vos delentry -server 192.168.24.10" (the internal
interface of the first AFS server) and it would report 14 entries
deleted. Repeating vos delentry would result in zero deletions, but 14
could be deleted again after repeating the previous vos syncvldb and
vos syncserv commands. For a while I wondered what to do about this,
but when I checked it again later, these problems seemed to have
disappeared on their own. Now everything looks to be just fine.
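
To confirm that no private addresses are still registered, something like the following filter could be run over the output of vos listaddrs -noresolve. This is only a sketch: private_addrs is a hypothetical helper name, and it assumes the addresses appear one per line in dotted-decimal form.

```shell
# Keep only lines that consist of an RFC 1918 private IPv4 address
# (10/8, 172.16/12, 192.168/16). Hypothetical helper for this sketch.
private_addrs() {
    grep -E '^[[:space:]]*(10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)[0-9.]+[[:space:]]*$'
}

# Self-contained demonstration with sample input; the public address
# 203.0.113.10 is a placeholder and should be filtered out.
printf '%s\n' 203.0.113.10 192.168.26.10 192.168.24.10 | private_addrs
```

On a live cell the usage would be "vos listaddrs -noresolve | private_addrs"; no output would mean that no private address is registered in the VLDB.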