[OpenAFS] File Server appears to stop responding

Derrick Brashear shadow@gmail.com
Sun, 28 Sep 2008 19:10:16 -0400


On Fri, Sep 26, 2008 at 3:17 PM, Randy Kemp <rkemp@srhs.net> wrote:
> I've been experiencing a problem where fileserver appears to simply stop
> responding to requests.  I currently only have one AFS server (OpenAFS
> version 1.4.7).
>
> Currently the primary host accessing AFS is an application server for 104
> thin-clients.  This problem seems to arise when a large number of users
> (30+) try to log in to this application server at about the same time.  The
> users' home directories are on AFS.  Once it stops responding, no host is
> able to access it, not even the local client.
>
> The AFS server has 5 interfaces on different networks.  The application
> server has 3 interfaces on 3 of the 5 networks.  However, the CellServDB on
> the application server only references the address for one of the interfaces
> on the AFS server because that's the route I want the traffic to take.
>
> This setup seemed to be working fine for about a month before the problem
> started occurring.  In the FileLog I'm now frequently seeing "CallPreamble:
> Couldn't get CPS. Too many lockers" even when everything appears to be
> working correctly.  When it does occur, restarting the OpenAFS daemons does
> not fix the problem.

Nor will it. The message is telling you that you have misbehaving clients,
and that those clients are not being serviced. You don't want them to be:
if they were, you would have problems much more often.

The routing failure messages at the end may suggest "the real issue".
Do you have clients which are able to send to the fileserver, but are
unreachable from it?

Also, you have far too few threads.

Add -p 128 to the fileserver arguments to raise the thread count, and your
life will get much better.
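
For reference, a rough sketch of one way to get -p 128 onto the fileserver
command line when the fs instance is managed by bosserver. The host name
fs1.example.com and the /usr/afs/bin paths are assumptions (Transarc-style
layout); substitute whatever your installation uses, and treat this as an
outline rather than a tested procedure.

```shell
# Hypothetical server name; replace with your fileserver's host name.
SERVER=fs1.example.com

# Show the command lines the fs instance currently runs with.
bos status "$SERVER" fs -long -localauth

# Stop and remove the existing fs instance (volumes are untouched).
bos stop "$SERVER" fs -wait -localauth
bos delete "$SERVER" fs -localauth

# Recreate it with -p 128 added to the fileserver command.
# Carry over any other fileserver arguments shown by the status output above.
bos create "$SERVER" fs fs \
    "/usr/afs/bin/fileserver -p 128" \
    /usr/afs/bin/volserver \
    /usr/afs/bin/salvager \
    -localauth

# Confirm the new arguments took effect.
bos status "$SERVER" fs -long -localauth
```

The fileserver is down between the stop and the create, so do this in a
quiet period.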