[OpenAFS] Repeated lost contact with servers - only at one location
Jan Iven
jan.iven@cern.ch
Tue, 15 Dec 2015 14:15:19 +0100
On 12/15/2015 11:53 AM, Orel Gueta wrote:
> Hi Ben,
>
> Thanks for the tip, however if I do "fs checkservers -cell cern.ch",
> I get the same result. Perhaps because /etc/openafs/ThisCell is set
> to CERN.CH?
yes.
> Either way, regardless if I specify the cell or not, I see a few servers
> down, at cern.ch and in other places.
I would suspect a network problem. Perhaps some stateful firewall is
timing out too fast for slow servers (or a high-latency network) to
reply in time. You might want to use "rxdebug afs263.cern.ch 7000
-version" as a simple ping-like test (which nevertheless uses the AFS
protocol, unlike real "ping"). If that also fails, you have simplified
the test case.
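Since lost replies may be intermittent, a single probe might not show
anything. Below is a minimal sketch of repeating the probe and counting
failures. The helper name "probe_loop" is made up for illustration, and
the sketch assumes that rxdebug exits with a non-zero status when no
answer arrives, so check that on your installation first.

```shell
#!/bin/sh
# probe_loop COUNT CMD [ARGS...]
# Hypothetical helper: runs CMD COUNT times, discards its output, and
# prints how many runs succeeded or failed based on the exit status.
# Assumption: the probe command exits non-zero when it gets no reply.
probe_loop() {
    count=$1
    shift
    ok=0
    fail=0
    i=0
    while [ "$i" -lt "$count" ]; do
        if "$@" >/dev/null 2>&1; then
            ok=$((ok + 1))
        else
            fail=$((fail + 1))
        fi
        i=$((i + 1))
    done
    echo "ok=$ok fail=$fail"
}

# Example, using the host and port from the test above:
# probe_loop 20 rxdebug afs263.cern.ch 7000 -version
```

A mix of successes and failures would point at packets being dropped
somewhere on the path rather than at a server that is really down.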
You could then use "wireshark" to get a network-level packet trace (you
would expect to see missing packets, i.e. the client repeatedly sending
something to UDP/7000 but not getting an answer).
And perhaps see whether your new Ubuntu comes with a newer firewall, and
try to configure that in "logging" mode, then check whether it happily
ditches those reply packets from the server.
By the way, the packet-eating device might also be your local home
router. Perhaps the old Ubuntu configured it to open some ports via UPnP,
and the new release no longer does this.
Cheers
jan
> Orel
>
> On 14 December 2015 at 23:38, Benjamin Kaduk <kaduk@mit.edu> wrote:
>
> On Mon, 14 Dec 2015, Orel Gueta wrote:
>
> > - fs checkservers reports a few servers down (like afs263.cern.ch), but I
> > can ping them.
>
> A quick note -- fs checkservers only checks for the local cell by
> default -- try "fs checkservers -cell cern.ch" to check a foreign
> cell.
>
>
> -Ben
>
>