[OpenAFS] Re: "afs: Lost contact with file server" on the same
Mon, 8 Jun 2009 01:06:04 -0400
I'm sorry nobody answered you the first time. Some questions:
- Does the "lost contact with server" occur on all clients at the
same time? Or is it scattered which one loses contact?
- For how long does the "lost contact" occur? Is it seconds or
minutes or longer?
- Simple, stupid question: Have you confirmed your hardware is OK and
not causing hiccups in the system?
- Have you tried using rxdebug to see if the fileserver is getting
caught up on something? Try running it when one of the clients claims
it's lost contact with the server.
From a client (preferably one that isn't having issues but IS on the
same network), point rxdebug at one of the client machines with
"-port 7001" (you can also run this from your fileserver). Then point
rxdebug at the fileserver with no -port option (it defaults to port
7000).
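If you'd rather script the two checks than type them, here is a rough sketch in Python. It assumes rxdebug is on your PATH and takes the target machine via -servers; the host names in the usage note are made up, so substitute your own.

```python
import subprocess

def rxdebug_argv(host, port=None):
    """Build the rxdebug command line for one target machine."""
    argv = ["rxdebug", "-servers", host]
    if port is not None:
        # Without -port, rxdebug talks to port 7000 (the fileserver).
        argv += ["-port", str(port)]
    return argv

def run_rxdebug(host, port=None, timeout=30):
    """Run rxdebug and return whatever it printed on stdout."""
    result = subprocess.run(
        rxdebug_argv(host, port),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout
```

For example, run_rxdebug("client1.example.com", 7001) would query the cache manager on a (hypothetical) client, and run_rxdebug("fs1.example.com") would query the fileserver on the default port.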
When talking to the fileserver, look for multiple lines from the same
machine (which should all be talking to port 7001, the callback
port). Unless you have tons of users on that machine, several lines
from one machine may indicate a backlog.
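Eyeballing the output for repeated machines gets tedious, so here is a small sketch that counts them for you. It assumes each connection shows up as a line of the form "Connection from host <addr>, port <n>, ..."; if your rxdebug prints something different, adjust the pattern. The function names and the threshold are my own invention, not anything from OpenAFS.

```python
import re
from collections import Counter

# Assumed shape of an rxdebug connection line; change this if your
# version formats connections differently.
CONN_RE = re.compile(r"Connection from host ([^,\s]+), port (\d+)")

def connections_per_host(rxdebug_output):
    """Count connection lines per (host, port) pair."""
    counts = Counter()
    for line in rxdebug_output.splitlines():
        match = CONN_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts

def busy_hosts(rxdebug_output, threshold=3):
    """Return the (host, port) pairs seen at least `threshold` times."""
    counts = connections_per_host(rxdebug_output)
    return sorted(pair for pair, n in counts.items() if n >= threshold)
```

Save the fileserver's rxdebug output to a file, read it in, and pass the text to busy_hosts(). What counts as "too many" depends on how many users sit on each machine, so treat the threshold as a starting guess.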
When talking to the client, look for any machine with lots of open
calls to some place other than 7000 (which is the fileserver), or lots
of threads waiting for a response.
This is just a start. I'm sure that now that I've responded, 25 other
people will crawl out of the woodwork with other advice and/or try to
imply I'm a total moron. I'm sure between us we can figure out what's