[OpenAFS] "afs: Lost contact with file server" on the same machine?

Adam Megacz megacz@hcoop.net
Thu, 09 Apr 2009 12:41:49 -0700


We've got a situation where clients encounter "afs: Lost contact with
file server" fairly frequently (at least once a week). It happens both
on a client machine that is on the same Ethernet switch as the
fileserver (no NAT involved) and on the OpenAFS client running on the
server machine itself, which loses contact with the fileserver process
on the very same machine (so it's unlikely to actually be a network
problem).

Sending "kill -TSTP" to the fileserver to increase the logging level
hasn't revealed anything interesting happening at the time that
contact is lost.
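
Roughly, the procedure looks like the sketch below. The function names
are made up for illustration, and it assumes the process is named
"fileserver", that pidof is available, and that SIGHUP puts the logging
level back to its default.

```shell
# Sketch only: helper names are hypothetical, not part of OpenAFS.

# Raise the fileserver's logging level by one step by sending SIGTSTP
# to the given PID.
raise_fs_debug() {
    kill -TSTP "$1"
}

# Assumption: SIGHUP resets the logging level to its default.
reset_fs_debug() {
    kill -HUP "$1"
}

# Typical use (as root on the server):
#   pid=$(pidof fileserver)
#   raise_fs_debug "$pid"
#   ... wait for the next "Lost contact" event, then read FileLog ...
#   reset_fs_debug "$pid"
```

The signal is sent to the fileserver process itself, not to bosserver.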

Is there any way to get more detailed information about why the client
decided that it had lost contact?  For example, whether the failure was
due to a timeout, an ICMP unreachable, no-route-to-host, etc.?

All machines in question are running OpenAFS 1.4.6 (client and
server), using the Debian packages.

The fileserver is running with these arguments:

  -p 23 -busyat 600 -rxpck 400 -s 1200 -l 1200 -cb 65535 -b 240 -vc 1200

Thanks for any suggestions...

  - a