[OpenAFS] Repeated lost contact with servers - only at one location

Jan Iven jan.iven@cern.ch
Tue, 15 Dec 2015 14:15:19 +0100


On 12/15/2015 11:53 AM, Orel Gueta wrote:
> Hi Ben,
>
> Thanks for the tip. However, if I run "fs checkservers -cell cern.ch",
> I get the same result. Perhaps because /etc/openafs/ThisCell is set
> to CERN.CH?

yes.

> Either way, whether or not I specify the cell, I see a few servers
> down, both at cern.ch and elsewhere.

I would suspect something on the network. Perhaps a stateful firewall is
timing out too quickly for slow servers (or replies over a high-latency
network) to get back in time. You might want to use "rxdebug afs263.cern.ch
7000 -version" as a simple ping-like test (which, unlike a real "ping",
uses the AFS protocol). If that also fails, you have simplified the test
case.
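A rough sketch of that test, repeated so that intermittent drops show up
rather than a single lucky reply: a POSIX sh function wrapping the rxdebug
command above. Assumptions, not OpenAFS facts: that a successful reply
prints a line containing "AFS version", and the function name, defaults
and the RXDEBUG override variable are all hypothetical.

```shell
#!/bin/sh
# Sketch: probe an AFS fileserver repeatedly with
# "rxdebug <host> 7000 -version" and count unanswered probes.
# Assumption: a good reply prints a line containing "AFS version".
# RXDEBUG is a hypothetical override for the command (default: rxdebug).
# A missing rxdebug binary is also counted as a failed probe.

probe_fileserver() {
    host=$1
    count=${2:-10}     # number of probes
    interval=${3:-5}   # seconds to wait between probes
    fails=0
    i=1
    while [ "$i" -le "$count" ]; do
        if ${RXDEBUG:-rxdebug} "$host" 7000 -version 2>&1 | grep -q "AFS version"; then
            echo "$(date '+%H:%M:%S') probe $i: reply"
        else
            echo "$(date '+%H:%M:%S') probe $i: NO REPLY"
            fails=$((fails + 1))
        fi
        i=$((i + 1))
        # Only sleep if another probe follows.
        if [ "$i" -le "$count" ]; then sleep "$interval"; fi
    done
    echo "$host: $fails of $count probes unanswered"
    # Return success only if every probe got a reply.
    [ "$fails" -eq 0 ]
}

# Usage (hypothetical file name):
#   . ./probe.sh; probe_fileserver afs263.cern.ch 20 5
```

Running it alongside "fs checkservers" should show whether the servers
reported down are the same ones that stop answering rxdebug.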
You could then use "wireshark" to get a network-level packet trace (you
would expect to see missing packets, i.e. the client repeatedly sending
something to UDP/7000 without getting an answer).
And perhaps check whether your new Ubuntu release comes with a newer
firewall; if so, try configuring it in "logging" mode, then check whether
it is happily dropping those reply packets from the server.

By the way, the packet-eating device might also be your local home
router. Perhaps the old Ubuntu release configured it to open some ports
via UPnP, and the new release no longer does this.

Cheers
jan

> Orel
>
> On 14 December 2015 at 23:38, Benjamin Kaduk <kaduk@mit.edu> wrote:
>
>     On Mon, 14 Dec 2015, Orel Gueta wrote:
>
>     > - fs checkservers reports a few servers down (like afs263.cern.ch),
>     > but I can ping them.
>
>     A quick note -- fs checkservers only checks the local cell by
>     default -- try "fs checkservers -cell cern.ch" to check a
>     foreign cell.
>
>
>     -Ben
>
>