[OpenAFS] Re: Ubik trouble

Andrew Deason adeason@sinenomine.net
Mon, 13 Jan 2014 23:22:58 -0600

On Mon, 13 Jan 2014 12:32:12 -0500
Jeffrey Hutzelman <jhutz@cmu.edu> wrote:

> A worse situation arises when server A makes an RPC to server B, but the
> best route from server B back to the original source address goes via a
> different interface than the request came in on.  In this situation, the
> kernel will assign the wrong source address to server B's outgoing
> reply, which may cause Rx on server A to drop it on the floor.

But we ignore the source address when the multihoming bit is set in the
epoch. I think just about everything in that post from that point on is
false, unless I'm wrong about the below:

> It's not as noticeable for fileservers, because in the name of
> supporting multihoming, fileservers and cache managers flip a switch
> that makes Rx ignore the source address on incoming packets in certain
> cases (and depending on which version you're running).

But all processes (that use rxkad) set the multihoming bit. Unless you
are talking about something else? I don't even see where a process would
manually set or clear the multihoming bit, unless it manually set the rx
epoch, and nobody does that. The 'switch' is always flipped (or always
not flipped, I assume, if you go back far enough).

Anyone can check this easily by looking at 'rxdebug <server> 7003
-allconn' for any cell with multiple dbservers. You'll see connections
to the other dbservers over 7003, and the multihoming bit, 0x80000000,
is set in the epoch. (Use 7002 for the ptserver, and so on.)
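
If you'd rather not eyeball the hex, here is a minimal sketch of the
bit test in Python. It assumes you have already copied the epoch out of
the rxdebug output as a hex string; the helper name and the sample
values are made up for illustration and are not part of any OpenAFS
tool.

```python
# Value of the multihoming bit as quoted above (high bit of the
# 32-bit Rx epoch).
MULTIHOMING_BIT = 0x80000000


def has_multihoming_bit(epoch_hex: str) -> bool:
    """Return True if the high bit of a 32-bit epoch is set.

    epoch_hex is the epoch written in hex, with or without a leading
    "0x". This is a hypothetical helper, not an OpenAFS API.
    """
    epoch = int(epoch_hex, 16) & 0xFFFFFFFF
    return bool(epoch & MULTIHOMING_BIT)


if __name__ == "__main__":
    # Made-up epoch values, one with the high bit set and one without.
    for sample in ("d2d4c0b2", "52d4c0b2"):
        print(sample, has_multihoming_bit(sample))
```

Any epoch whose first hex digit is 8 or higher has the bit set, so for
a quick check the leading digit is all you need to look at.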

Andrew Deason