[OpenAFS] Re: "afs: Lost contact with file server" on the same machine?

Adam Megacz megacz@hcoop.net
Sun, 07 Jun 2009 11:22:00 -0700


For the benefit of the mailing list archives, I'd just like to mention
that upgrading from 1.4.6 to 1.4.10 seems to have helped quite a bit,
but the problem remains.  It just happens less frequently.

  - a

Adam Megacz <megacz@hcoop.net> writes:
> Hello,
>
> We've got a situation where clients seem to be encountering "afs: Lost
> contact with file server" fairly frequently (at least once a week).
> This happens both on a client machine that is on the same Ethernet
> switch as the fileserver (no NAT involved) and on the OpenAFS client
> running on the server machine itself, which loses contact with the
> fileserver process on the very same machine (so it's unlikely to
> actually be the network).
>
> Sending "kill -TSTP" to the fileserver to increase the logging level
> hasn't revealed anything interesting happening at the time that
> contact is lost.
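
For anyone finding this in the archives, here is a minimal sketch of the
signal handling mentioned above. Each SIGTSTP raises the fileserver's
debug level one step, and SIGHUP resets it to 0. The helper names
(raise_debug, reset_debug) are made up for illustration, and the use of
pidof to find the process is an assumption about the local setup.

```shell
# Hypothetical helpers; not part of OpenAFS.

# Raise the fileserver debug level by sending SIGTSTP COUNT times.
# Usage: raise_debug PID [COUNT]
raise_debug() {
    pid=$1
    count=${2:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        # Each SIGTSTP bumps the debug level one step.
        kill -TSTP "$pid" || return 1
        i=$((i + 1))
    done
}

# Reset the fileserver debug level to 0 with SIGHUP.
# Usage: reset_debug PID
reset_debug() {
    kill -HUP "$1"
}

# Example (assumes pidof is available and one fileserver is running):
#   raise_debug "$(pidof fileserver)" 2
#   ... wait for the next "Lost contact" event, then read FileLog ...
#   reset_debug "$(pidof fileserver)"
```

Higher levels make FileLog grow quickly, so resetting afterwards is
probably worth the extra step.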
>
> Is there any way to get more detailed information about the reason why
> the client decided that it had lost contact?  For example, whether the
> failure was due to a timeout, an ICMP unreachable, or
> no-route-to-host, etc?
>
> All machines in question are running OpenAFS 1.4.6 (client and
> server), using the Debian packages.
>
> The fileserver is running with these arguments:
>
>   -p 23 -busyat 600 -rxpck 400 -s 1200 -l 1200 -cb 65535 -b 240 -vc 1200
>
> Thanks for any suggestions...
>
>   - a
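
On the client side, a rough sketch of how one might at least line up the
timestamps: pull the cache manager's "Lost contact" and "back up"
messages out of the kernel log, then compare them against FileLog and
the Rx statistics. This does not reveal why contact was lost. The
function name, the log path, and the hostname below are assumptions for
illustration only, and the "is back up" pattern is my recollection of
the message text.

```shell
# Hypothetical helper; not part of OpenAFS.
# Print the cache manager's lost-contact and recovery messages
# found in the given log file.
# Usage: afs_contact_events LOGFILE
afs_contact_events() {
    grep -E 'afs: (Lost contact with file server|file server .* is back up)' "$1"
}

# Example (the kernel log location is an assumption and varies by system):
#   afs_contact_events /var/log/kern.log
#
# Other commands that may be worth running around the time of an event
# (fileserver.example.org is a placeholder hostname):
#   fs checkservers                                    # probe fileservers now
#   rxdebug fileserver.example.org 7000 -rxstats -noconns   # server Rx stats
#   rxdebug localhost 7001 -peers -long                # cache manager peers
```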
