[OpenAFS] 1.6.0pre2 Filelog CB: WhoAreYou failed for host

Gémes Géza geza@kzsdabas.hu
Wed, 23 Mar 2011 20:29:28 +0100

On 2011-03-23 19:07, Dale Pontius wrote:
> On 03/21/2011 03:45 PM, Gémes Géza wrote:
>> On 2011-03-21 15:24, Jeffrey Altman wrote:
>>> On 3/21/2011 2:42 AM, Gémes Géza wrote:
>>>> Hi,
>>>> I know this topic has been discussed before, but the conclusion was
>>>> that it is caused by NAT.
>>> It is caused by firewalls, routers, network port translators and network
>>> address translators (or any other similar device) that impose a fixed
>>> timeout on the length of time that inbound UDP packets can be received
>>> in response to outbound packets.
>>> If the timeout period is less than the cache manager probe period, it is
>>> likely that this error will be seen.
>>>> This is impossible in my case, as openafs servers are firewalled from
>>>> the outside world.
>>>> The fileserver has 3 ethernet interfaces:
>>>> 1: connected to the clients, two IP addresses, one active (the other
>>>> in the NetRestrict file)
>>>> 2: connected to a SAN, no IP addresses
>>>> 3: connected to other cluster members, IP address in the
>>>> NetRestrict file
>>>> vos listaddrs gives nothing but the right IP address for the vol and
>>>> fileserver.
>>>> The FileLog is full of entries like:
>>>> CB: WhoAreYou failed for host FILESERVER
>>> I would check the firewall rules on the local machine.
>>>> Besides that all the clients (1.6.0pre2 on linux, 1.5.78 and 1.6.0pre3
>>>> on windows) are working as expected.
>>>> Except one (1.6.0pre2 on linux) which has two interfaces (one connected
>>>> to the fileserver's network and the other in its NetRestrict file).
>>> Same here.  Check the firewall rules on that machine.
>> None of the computers in question have any firewalls (except some
>> Windows XP SP3 default firewalls, but there port 7001/UDP is open)
>> On any of the linux computers the iptables -L gives:
>> Chain INPUT (policy ACCEPT)
>> target     prot opt source               destination
>> Chain FORWARD (policy ACCEPT)
>> target     prot opt source               destination
>> Chain OUTPUT (policy ACCEPT)
>> target     prot opt source               destination
>> Besides that, everything is connected at layer 2; there are no routers
>> in between. The switches are HP Procurve and Openvswitch (Xen Cloud
>> Platform).
>> Cheers
>> Geza
> I dealt with this several years ago, and a friend helped me out.  My
> "networking situation" got better, and I haven't needed to do business
> this way since, at least not at work.
> Your problem isn't with the firewall proper, it's with the masquerading
> (NAT) logic.  The masquerading logic has a keepalive timer for UDP
> associations - so it's a piece of state that's needed even if stateful
> firewall logic isn't in place.  I opened up a timeout and fixed the
> problem.  Now to remember where the heck that was...  I'm browsing
> around in /proc/sys/net/ipv4 and not finding anything at the moment. 
> There are 3 udp entries, but none look like a timeout.  Don't see
> anything down in conf/eth0, either.  Come to think of it, that may
> have been far enough back that I was running kernel-2.4 at the time,
> and things have changed.
> Still don't see anything, but I just want to get across the idea that
> a NAT timeout will exist even without a regular firewall, and that
> once upon a time it could be tweaked.  It probably still can, if one
> knows the magic incantation.
> Dale Pontius

Unfortunately my setup doesn't involve any kind of NAT either. All the
boxes (server and client) are on the same subnet, connected via switches.
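
For anyone who wants to rule out the connection-tracking timeout Dale
describes, here is a minimal sketch of how it could be checked on a Linux
box. It assumes the conntrack sysctls appear under /proc/sys/net/netfilter
(newer kernels) or /proc/sys/net/ipv4/netfilter (older ones); if the
conntrack module is not loaded, none of these files exist and nothing is
reported. The probe interval below is only a placeholder, not the actual
cache manager value, so substitute whatever applies to your clients.

```python
from pathlib import Path

# Candidate sysctl files. Which of them exist depends on the kernel version
# and on whether connection tracking is loaded at all.
CANDIDATE_PATHS = [
    "/proc/sys/net/netfilter/nf_conntrack_udp_timeout",
    "/proc/sys/net/netfilter/nf_conntrack_udp_timeout_stream",
    "/proc/sys/net/ipv4/netfilter/ip_conntrack_udp_timeout",
    "/proc/sys/net/ipv4/netfilter/ip_conntrack_udp_timeout_stream",
]


def read_udp_timeouts(paths=CANDIDATE_PATHS):
    """Return {path: timeout_in_seconds} for every readable sysctl file."""
    found = {}
    for path in paths:
        try:
            found[path] = int(Path(path).read_text().strip())
        except (OSError, ValueError):
            # Missing file, no permission, or unexpected content: skip it.
            continue
    return found


def shorter_than_probe(timeouts, probe_interval_s):
    """Keep only the timeouts that expire before the next probe is due."""
    return {p: t for p, t in timeouts.items() if t < probe_interval_s}


if __name__ == "__main__":
    # Placeholder value (an assumption, not taken from OpenAFS).
    ASSUMED_PROBE_INTERVAL_S = 600

    timeouts = read_udp_timeouts()
    if not timeouts:
        print("no conntrack UDP timeout files found")
    for path, seconds in shorter_than_probe(
        timeouts, ASSUMED_PROBE_INTERVAL_S
    ).items():
        print(f"{path}: {seconds}s is shorter than the assumed probe interval")
```

If the script reports no files at all, connection tracking is most likely
not active on that machine, which would be consistent with the empty
iptables -L output quoted above.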