[OpenAFS] Multihomed issues

Jaap Winius jwinius@umrk.nl
Tue, 18 Jan 2011 01:56:55 +0100

Quoting Russ Allbery <rra@stanford.edu>:

> The file server is what tells the VLDB that it has those addresses, so I
> think the same solution should work there.  The trick is that you have to
> create the NetInfo and NetRestrict files before the first time you start
> the file server.  It should then not register them.

I had created a NetInfo file, but not a NetRestrict file. I did this
after installing the AFS server packages, but before running the file
server for the first time. Then I created the ptserver and vlserver
processes on the new server. I guess that was not enough.
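
For anyone reading this in the archive, here is a minimal sketch of creating both files before the file server's first start, as Russ suggests. The directory and both addresses are placeholders rather than values from this thread, and the sketch assumes the usual one-address-per-line layout for both files. Whether having both files in place is enough to keep the private address out of the VLDB is not something this thread confirms.

```shell
#!/bin/sh
# Sketch only: write NetInfo and NetRestrict before the first file server start.
# The directory and the addresses are hypothetical placeholders.
write_net_files() {
    local_dir=$1      # the server's "local" directory (location varies by packaging)
    public_addr=$2    # address the server should register
    private_addr=$3   # address that should never be registered

    mkdir -p "$local_dir" || return 1
    # One dotted-decimal address per line in each file.
    printf '%s\n' "$public_addr"  > "$local_dir/NetInfo"
    printf '%s\n' "$private_addr" > "$local_dir/NetRestrict"
}

# Example call (path and addresses are assumptions):
# write_net_files /var/lib/openafs/local 192.0.2.10 10.0.0.10
```

The function only writes the files; it has to run before the file server is started for the first time, since that is when the addresses get registered.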

> The same NetInfo and NetRestrict files that work for file servers also
> work for the vlserver and ptserver, although in those cases only Ubik
> should care; everything else uses either AFSDB/SRV DNS information or the
> addresses in CellServDB.

Looks like you're right about that.

>> In my case, this includes several private addresses that I don't want
>> any of the database servers to use. However, even if immediately
>> afterwards I remove these addresses from a new server's CellServDB and
>> restart it, it's too late: they're already in the VLDB and AFS is
>> already trying to send a new RO copy of root.cell to the new
>> server... using that server's private range IP address.
> I don't think you meant CellServDB here, or if you did, then something
> else is going on that I don't understand.  CellServDB shouldn't ever
> contain the IP addresses of file servers (unless your VLDB servers are
> also file servers).

I did mean /etc/openafs/server/CellServDB, but, like the original
server, this new (second) server is both a file server and a VLDB
server.

> vos setaddr, you mean?  vos delentry is what will fix the above; you need
> to delentry the replication site on, ...

No, I meant vos delentry. I'm happy to report that it did the trick
once I ran it again on the first server:

    ~# vos delentry -server
    Deleting VLDB entries for server
    Total VLDB entries deleted: 1; failed to delete: 0
    ~# _

This also had one further, unexpected result:

    ~# vos examine root.cell -noresolve
    VLDB: no such entry
    ~# _

Oops! At this point, the AFS filesystem was unavailable. However, a  
vos listvol showed that it was still there:

    ~# vos listvol localhost |grep root.cell
    root.cell                         536870915 RW          6 K On-line
    root.cell.readonly                536870916 RO          6 K On-line

So, I ran "vos syncvldb localhost a" and "vos syncserv localhost a"  
and got it back. After that I could access the directory tree again.
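
The recovery sequence above can be sketched as a small script. It is an illustration under assumptions, not a tested procedure: the function name and the VOS variable are made up for this example, and by default the commands are only printed, so nothing touches the VLDB unless VOS=vos is set.

```shell
#!/bin/sh
# Sketch of the recovery steps described above.
# VOS defaults to "echo vos", which prints each command instead of running it.
VOS=${VOS:-echo vos}

resync_vldb() {
    server=$1     # file server to scan, e.g. localhost
    partition=$2  # partition name, e.g. a

    # Create or correct VLDB entries for the volumes found on the server.
    $VOS syncvldb -server "$server" -partition "$partition" || return 1
    # Check the VLDB entries that mention this server against its volumes.
    $VOS syncserv -server "$server" -partition "$partition" || return 1
    # Confirm that the root.cell entry is back.
    $VOS examine -id root.cell -noresolve
}

resync_vldb localhost a
```

Running the script as is shows the three vos commands in the order they would be issued, which makes it easy to review them before running anything against a live cell.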

I might add that vos syncvldb also gave a list of warnings about  
orphaned volumes. After following that up with vos syncserv, I found  
that I could run "vos delentry -server" (the internal  
interface of the first AFS server) and it would report 14 entries  
deleted. Repeating vos delentry would result in zero deletions, but 14  
could be deleted again after repeating the previous vos syncvldb and  
vos syncserv commands. For a while I wondered what to do about this,  
but when I checked it again later, these problems seemed to have  
disappeared on their own. Now everything looks to be just fine.