[OpenAFS] 1.6.0pre2 Filelog CB: WhoAreYou failed for host

Dale Pontius pontius@btv.ibm.com
Wed, 23 Mar 2011 14:07:36 -0400

On 03/21/2011 03:45 PM, Gémes Géza wrote:
> On 2011-03-21 15:24, Jeffrey Altman wrote:
>> On 3/21/2011 2:42 AM, Gémes Géza wrote:
>>> Hi,
>>> I know this topic has been discussed before, but the conclusion was that
>>> it is caused by NAT.
>> It is caused by firewalls, routers, network port translators and network
>> address translators (or any other similar device) that impose a fixed
>> timeout on the length of time that inbound UDP packets can be received
>> in response to outbound packets.
>> If the timeout period is less than the cache manager probe period, it is
>> likely that this error will be seen.
>>> This is impossible in my case, as openafs servers are firewalled from
>>> the outside world.
>>> The fileserver has 3 ethernet interfaces:
>>> 1: connected to the clients, two IP addresses one active (other in the
>>> NetRestrict file)
>>> 2: connected to a SAN, no IP addresses
>>> 3: connected to other cluster members, IP address in the NetRestrict file
>>> vos listaddrs shows only the correct IP address for the vol and
>>> fileserver.
>>> The FileLog is full of entries like:
>>> CB: WhoAreYou failed for host FILESERVER
>> I would check the firewall rules on the local machine.
>>> Besides that all the clients (1.6.0pre2 on linux, 1.5.78 and 1.6.0pre=
>>> on windows) are working as expected.
>>> Except one (1.6.0pre2 on linux) which has two interfaces (one connected
>>> to the fileserver's network and the other in its NetRestrict file).
>> Same here.  Check the firewall rules on that machine.
> None of the computers in question have any firewalls (except some
> Windows XP SP3 default firewalls, but there port 7001/UDP is open)
> On any of the linux computers the iptables -L gives:
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
> Besides that, everything is connected at layer 2; there are no routers
> in between. The switches are HP Procurve and Openvswitch (Xen Cloud Platform).
> Cheers
> Geza
I dealt with this several years ago, and a friend helped me out.  My
"networking situation" got better, and I haven't needed to do business
this way since, at least not at work.

Your problem isn't with the firewall proper, it's with the masquerading
(NAT) logic.  The masquerading logic has a keepalive timer for UDP
associations - so it's a piece of state that's needed even if stateful
firewall logic isn't in place.  I lengthened that timeout and it fixed the
problem.  Now to remember where the heck that was...  I'm browsing
around in /proc/sys/net/ipv4 and not finding anything at the moment.
There are 3 udp entries, but none look like a timeout.  Don't see
anything down in conf/eth0, either.  Come to think of it, that may have
been far enough back that I was running kernel-2.4 at the time, and
things have changed.
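
For anyone hunting for the same knob, here is a rough sketch of how one
could look for UDP connection-tracking timeouts under /proc.  The paths and
file names below are assumptions, not something confirmed in this thread:
on recent kernels the entries are expected under /proc/sys/net/netfilter
(nf_conntrack_udp_timeout and nf_conntrack_udp_timeout_stream), and only
when the conntrack module is loaded; the ip_conntrack_* names under
/proc/sys/net/ipv4/netfilter are my guess at the older layout.  The helper
name read_udp_timeouts is made up for illustration.

```python
from pathlib import Path

# ASSUMPTION: candidate locations for the UDP conntrack timeouts.
# The first is the layout expected on recent kernels (only present when
# the conntrack module is loaded); the second is a guess at the older one.
CANDIDATE_DIRS = (
    "/proc/sys/net/netfilter",
    "/proc/sys/net/ipv4/netfilter",
)
CANDIDATE_NAMES = (
    "nf_conntrack_udp_timeout",
    "nf_conntrack_udp_timeout_stream",
    "ip_conntrack_udp_timeout",
    "ip_conntrack_udp_timeout_stream",
)


def read_udp_timeouts(dirs=CANDIDATE_DIRS, names=CANDIDATE_NAMES):
    """Return {path: seconds} for every UDP timeout file that can be read."""
    found = {}
    for directory in dirs:
        for name in names:
            path = Path(directory) / name
            try:
                found[str(path)] = int(path.read_text().strip())
            except (OSError, ValueError):
                # Missing, unreadable, or not a plain integer: skip it.
                continue
    return found


if __name__ == "__main__":
    timeouts = read_udp_timeouts()
    if not timeouts:
        print("no UDP conntrack timeout entries found (module not loaded?)")
    for path, seconds in sorted(timeouts.items()):
        print(f"{path} = {seconds}s")
```

If nothing is found, that by itself does not prove there is no UDP state
timeout anywhere on the path; it only means this machine does not expose
one at the locations guessed above.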

Still don't see anything, but I just want to get across the idea that a
NAT timeout will exist even without a regular firewall, and that once
upon a time it could be tweaked.  It probably still can, if one knows
the magic incantation.
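
To make the condition Jeffrey describes in the quoted text concrete (a UDP
timeout shorter than the cache manager probe period), here is a minimal
sketch.  The function names and the numbers in the usage lines are
hypothetical; the real probe period depends on the client and is not stated
in this thread, so it is left as a parameter.

```python
def callbacks_at_risk(udp_timeout_s, probe_interval_s):
    """Return True if the UDP association can expire between two probes.

    Both arguments are in seconds.  The device drops its UDP state after
    udp_timeout_s of silence, so if the client only sends traffic every
    probe_interval_s, the server's callback packets may no longer get in.
    """
    if udp_timeout_s <= 0 or probe_interval_s <= 0:
        raise ValueError("both values must be positive, in seconds")
    return udp_timeout_s < probe_interval_s


def minimum_timeout(probe_interval_s, margin_s=60):
    """Smallest timeout (seconds) that outlasts one probe period plus margin."""
    if probe_interval_s <= 0 or margin_s < 0:
        raise ValueError("interval must be positive and margin non-negative")
    return probe_interval_s + margin_s


if __name__ == "__main__":
    # HYPOTHETICAL values, for illustration only.
    timeout_s, probe_s = 30, 600
    if callbacks_at_risk(timeout_s, probe_s):
        print(f"timeout {timeout_s}s is shorter than probe period {probe_s}s;"
              f" consider at least {minimum_timeout(probe_s)}s")
```

On a Linux box doing connection tracking, the tweak itself would presumably
be a sysctl along the lines of net.netfilter.nf_conntrack_udp_timeout, but
treat that as an assumption to verify on the machine in question.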

Dale Pontius
Senior Engineer
IBM Corporation
Phone: (802) 769-6850
Tie-Line: 446-6850
email: pontius@us.ibm.com
