[OpenAFS] File Server appears to stop responding

Derrick Brashear shadow@gmail.com
Mon, 13 Oct 2008 09:58:34 -0400

On Mon, Oct 13, 2008 at 9:52 AM, Randy Kemp <rkemp@srhs.net> wrote:
> I had users again today to test with.  The problem with fileserver
> ceasing to respond and generating the "CallPreamble: Couldn't get CPS.
> Too many lockers" error occurred again.

Did it stop responding to the app server only, or to both machines? I
assume only to the app server.

> I'm now running fileserver with the following parameters, "-p 128 -b 512
> -l 3072 -s 3072 -vc 3072 -cb 65536 -busyat 1536 -rxpck 1024 -nojumbo".
> At the time that the problem started there were two physical clients
> connected, one was a standalone workstation and the other was the
> aforementioned application server with approximately 40 users logged
> in.  All requests from said application server are now coming from a
> single address to a single interface on the AFS server.

> It turns out that restarting AFS vs. rebooting the server does not make
> the problem go away as I previously thought.  It was purely coincidental
> that I had also restarted the application server last time.  What I have
> now discovered is that even rebooting the AFS server does not resolve
> the problem (errors start immediately upon startup) and it now appears
> that the problem is only resolved by restarting the client on the
> application server.  The application server is running OpenAFS client
> version 1.4.7 on Ubuntu Linux with kernel version 2.6.24.

The next time the fileserver won't talk to the application server, run
"echo t > /proc/sysrq-trigger" as root on the application server. Then
collect the system log (e.g. /var/log/messages) with the backtrace in
it, and let us see that.
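
Something like the following could do the capture in one step. It is only a
rough sketch: the function names, the list of candidate log files, the 5000-line
tail, and the output path are all assumptions for illustration. On Ubuntu the
kernel messages usually land in /var/log/kern.log or /var/log/syslog rather than
/var/log/messages, so the script tries each in turn.

```shell
#!/bin/sh
# Sketch: dump all task backtraces via SysRq-t and save the resulting log lines.
# Function names, candidate log paths, and the output path are illustrative.

# Print the first readable file among the arguments; return 1 if none is found.
first_existing_log() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# Trigger the task dump, then copy the tail of the system log to the file in $1.
capture_task_dump() {
    out=$1
    log=$(first_existing_log /var/log/messages /var/log/kern.log /var/log/syslog) || {
        echo "no readable system log found" >&2
        return 1
    }
    echo t > /proc/sysrq-trigger || return 1   # requires root
    sleep 2                                    # give the syslog daemon time to write
    tail -n 5000 "$log" > "$out"               # 5000 lines is an arbitrary guess
}

# Nothing runs unless explicitly requested:
#   sh sysrq-dump.sh run /tmp/afs-client-sysrq.txt
if [ "${1:-}" = "run" ]; then
    capture_task_dump "${2:-/tmp/afs-client-sysrq.txt}"
fi
```

With around 40 users logged in, the task dump may be long, so check that the
saved file actually contains the backtraces before sending it.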