[OpenAFS] Repeated lost contact with servers - only at one location

Stephan Wiesand stephan.wiesand@desy.de
Tue, 15 Dec 2015 19:10:53 +0100


On Dec 15, 2015, at 14:15 , Jan Iven wrote:

> On 12/15/2015 11:53 AM, Orel Gueta wrote:
>> Hi Ben,
>>
>> Thanks for the tip, however if I do "fs checkservers -cell cern.ch",
>> I get the same result. Perhaps because /etc/openafs/ThisCell is set
>> to CERN.CH?
>
> yes.
>
>> Either way, regardless if I specify the cell or not, I see a few
>> servers down, at cern.ch and in other places.
>
> I would suspect some network thing. Perhaps some stateful firewall is
> timing out too fast for slow servers (or a high-latency network) to
> reply in time.

Possibly, but I think Ubuntu still defaults to iptables disabled. It
could be a NAT issue too, but that should break ping as well. Maybe it's
something with the MTU or fragmented UDP packets. The "-rxmaxfrags" and
"-rxmaxmtu" arguments to afsd may be worth playing with.
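
To get a rough idea of the path MTU before touching afsd, something
like the following could help. It is only a sketch under a few
assumptions: a Linux iputils "ping" (where "-M do" forbids
fragmentation), IPv4, and ICMP behaving like the UDP traffic AFS uses,
which is not guaranteed. The function names are made up for
illustration.

```python
import subprocess


def largest_passing(probe, lo, hi):
    """Largest size in [lo, hi] for which probe(size) is true, else None.

    Assumes probe is monotonic: small sizes pass, large sizes fail.
    """
    if not probe(lo):
        return None
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always shrinks
        if probe(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def ping_df(host, size):
    """Send one ICMP echo with 'size' payload bytes and Don't Fragment set."""
    # Assumption: Linux iputils ping (-M do, -s, -c, -W).
    cmd = ["ping", "-c", "1", "-W", "2", "-M", "do", "-s", str(size), host]
    done = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL)
    return done.returncode == 0


def main(host="afs263.cern.ch"):
    payload = largest_passing(lambda s: ping_df(host, s), 500, 1472)
    if payload is None:
        print("even small unfragmented packets do not get through")
    else:
        # 20 bytes IPv4 header + 8 bytes ICMP header
        print("approximate path MTU:", payload + 28)
```

Calling main() prints an estimate; a value clearly below 1500 would be
a hint that a smaller "-rxmaxmtu" is worth trying.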

I've also seen servers on a "friendly" site blacklisting my clients in
the past just due to an "ls -R" and some latency... but that was a while
ago.

Regards,
	Stephan

> You might want to use "rxdebug afs263.cern.ch 7000 -version" as a
> simple ping-like test (which nevertheless uses the AFS protocol, unlike
> real "ping"). If that also fails, you have simplified the test case.
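
If several servers are reported down, the rxdebug test quoted above can
be looped over all of them. A minimal sketch, assuming "rxdebug" is on
the PATH; the function names and the 30-second timeout are my own
choices, and the output is simply printed for a human to read rather
than parsed.

```python
import subprocess


def rxdebug_cmd(host, port=7000):
    """Build the command line suggested in the thread."""
    return ["rxdebug", host, str(port), "-version"]


def try_server(host, port=7000, timeout=30):
    """Run rxdebug once; return (status, output).

    status is "done", "timeout", or "missing" (rxdebug not installed).
    The timeout value is an arbitrary assumption.
    """
    try:
        done = subprocess.run(rxdebug_cmd(host, port), capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout", ""
    except FileNotFoundError:
        return "missing", ""
    return "done", done.stdout + done.stderr


def main(hosts=("afs263.cern.ch",)):
    for host in hosts:
        status, output = try_server(host)
        print(host, status)
        if output:
            print(output.strip())
```

Calling main() with the hosts that "fs checkservers" lists shows which
ones answer on the AFS port and which ones only answer to ping.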
> You could then use "wireshark" to get a network-level packet trace
> (you would expect to see missing packets, i.e. the client repeatedly
> sending something to UDP/7000 but not getting an answer).
> And perhaps see whether your new Ubuntu comes with a newer firewall,
> and try to configure that in "logging" mode, then check whether it
> happily ditches those reply packets from the server.
>
> By the way, the packet-eating device might also be your local home
> router. Perhaps old Ubuntu configured it to open some ports via UPnP,
> and the new release no longer does this.
>
> Cheers
> jan
>
>> Orel
>>
>> On 14 December 2015 at 23:38, Benjamin Kaduk <kaduk@mit.edu> wrote:
>>
>>    On Mon, 14 Dec 2015, Orel Gueta wrote:
>>
>>    > - fs checkservers reports a few servers down (like
>>    > afs263.cern.ch), but I can ping them.
>>
>>    A quick note -- fs checkservers only checks for the local cell by
>>    default -- try "fs checkservers -cell cern.ch" to check a foreign
>>    cell.
>>
>>
>>    -Ben