[OpenAFS] File Server appears to stop responding
Mon, 13 Oct 2008 09:58:34 -0400
On Mon, Oct 13, 2008 at 9:52 AM, Randy Kemp <firstname.lastname@example.org> wrote:
> I had users again today to test with. The problem with fileserver
> ceasing to respond and generating the "CallPreamble: Couldn't get CPS.
> Too many lockers" error occurred again.
Does it stop responding to the app server only, or to both machines?
I assume, actually, only to the app server.
> I'm now running fileserver with the following parameters, "-p 128 -b 512
> -l 3072 -s 3072 -vc 3072 -cb 65536 -busyat 1536 -rxpck 1024 -nojumbo".
> At the time that the problem started there were two physical clients
> connected, one was a standalone workstation and the other was the
> aforementioned application server with approximately 40 users logged
> in. All requests from said application server are now coming from a
> single address to a single interface on the AFS server.
> It turns out that restarting AFS vs. rebooting the server does not make
> the problem go away as I previously thought. It was purely coincidental
> that I had also restarted the application server last time. What I have
> now discovered is that even rebooting the AFS server does not resolve
> the problem (errors start immediately upon startup) and it now appears
> that the problem is only resolved by restarting the client on the
> application server. The application server is running OpenAFS client
> version 1.4.7 on Ubuntu Linux with kernel version 2.6.24.
As root on the application server, while the fileserver won't talk to
it, run:
echo t > /proc/sysrq-trigger
Then collect the system log (e.g. /var/log/messages) with the
backtrace in it, and let us see that.
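
If it helps, here is a rough sketch of one way to grab just the part of the log written after the trigger, so the backtrace is easy to send along. The LOG and OUT paths, the 5-second wait, and the helper names count_lines, lines_after and capture are all assumptions for illustration; on some systems the kernel messages land in a different file (for example /var/log/kern.log or /var/log/syslog), so adjust LOG to match.

```shell
#!/bin/sh
# Sketch only: save the log lines written after a SysRq "t" task dump.
# LOG and OUT are assumed defaults; override them in the environment.
LOG="${LOG:-/var/log/messages}"
OUT="${OUT:-/tmp/sysrq-t-backtrace.txt}"

# Print the number of lines currently in file $1 (0 if it does not exist).
count_lines() {
    if [ -f "$1" ]; then
        wc -l < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# Print every line of file $1 that comes after line number $2.
lines_after() {
    tail -n +"$(( $2 + 1 ))" "$1"
}

# Trigger the dump and save whatever the log gained afterwards.
# Must run as root, since it writes to /proc/sysrq-trigger.
capture() {
    before=$(count_lines "$LOG")
    echo t > /proc/sysrq-trigger   # ask the kernel to log all task states
    sleep 5                        # assumed delay for syslog to flush
    lines_after "$LOG" "$before" > "$OUT"
    echo "backtrace saved to $OUT"
}

# Do nothing unless explicitly asked, so the file is safe to source.
if [ "${1:-}" = "capture" ]; then
    capture
fi
```

Saved under some name such as collect-backtrace.sh (hypothetical), this would be run as root with "sh collect-backtrace.sh capture" at the moment the fileserver stops answering the application server. If the log was rotated between the two steps the output will be incomplete, in which case sending the whole log file is the safer choice.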