[OpenAFS] Re: "afs: Lost contact with file server" on the same machine?

Russ Allbery rra@stanford.edu
Sun, 14 Jun 2009 14:23:51 -0700


Adam Megacz <megacz@hcoop.net> writes:
> Esther Filderman <mizmoose@gmail.com> writes:

>>  - Does the "lost contact with server" occur on all clients at the
>> same time?  Or is it scattered which one loses contact?

> It is definitely scattered; we've seen situations where one client
> "lost contact" while another seemed to be having no troubles.

>>  - For how long does the "lost contact" occur?  Is it seconds or
>> minutes or longer?

> Around 10-15 minutes, or until the next "fs checks", whichever comes
> first.  Some users know to run "fs checks" to make this go away, but
> most don't.  Others are seeing unsupervised cron/at jobs fail as a
> result of this.

This sounds identical to a problem we had with our web servers.  It was
mostly caused by CGI scripts whose tokens had expired continuing to try
to access AFS until the file server started throttling Rx connections.
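
As a rough illustration only (this is not what we run), a cron job or
CGI wrapper could check for a live token before touching AFS and bail
out early instead of hammering the file server.  The sketch below
assumes the "tokens" command is on the PATH and that it marks a valid
token with "[Expires ...]" and an expired one with "[>> Expired <<]";
the function names are hypothetical.

```python
import subprocess


def has_unexpired_token(tokens_output: str) -> bool:
    """Return True if the text lists at least one token with an expiry date.

    Assumption: a valid token line contains "tokens for" and "[Expires",
    while an expired one shows "[>> Expired <<]" instead.
    """
    for line in tokens_output.splitlines():
        if "tokens for" in line and "[Expires" in line:
            return True
    return False


def afs_token_ok() -> bool:
    """Run `tokens` and report whether a usable token appears to be held."""
    try:
        result = subprocess.run(
            ["tokens"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        # No AFS client tools installed; treat as "no token".
        return False
    return has_unexpired_token(result.stdout)
```

A job would call afs_token_ok() first and exit with an error if it
returns False, rather than retrying AFS access with a dead token.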

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>