[OpenAFS] File Server appears to stop responding

Randy Kemp rkemp@srhs.net
Mon, 13 Oct 2008 10:14:00 -0400


Derrick Brashear wrote:
> On Mon, Oct 13, 2008 at 9:52 AM, Randy Kemp <rkemp@srhs.net> wrote:
>   
>> I had users again today to test with.  The problem with fileserver
>> ceasing to respond and generating the "CallPreamble: Couldn't get CPS.
>> Too many lockers" error occurred again.
>>     
> To the app server, or to both machines? I assume, actually, only to
> the app server.
>   
The last time it occurred (when I sent the first message) it stopped 
responding to all connected clients.  I did not think to test that this time.
>> I'm now running fileserver with the following parameters, "-p 128 -b 512
>> -l 3072 -s 3072 -vc 3072 -cb 65536 -busyat 1536 -rxpck 1024 -nojumbo".
>> At the time that the problem started there were two physical clients
>> connected, one was a standalone workstation and the other was the
>> aforementioned application server with approximately 40 users logged
>> in.  All requests from said application server are now coming from a
>> single address to a single interface on the AFS server.
>>     
>> It turns out that restarting AFS vs. rebooting the server does not make
>> the problem go away as I previously thought.  It was purely coincidental
>> that I had also restarted the application server last time.  What I have
>> now discovered is that even rebooting the AFS server does not resolve
>> the problem (errors start immediately upon startup) and it now appears
>> that the problem is only resolved by restarting the client on the
>> application server.  The application server is running OpenAFS client
>> version 1.4.7 on Ubuntu Linux with kernel version 2.6.24.
>>     
>
> echo t > /proc/sysrq-trigger on the application server, as root, when
> the fileserver won't talk to it, collect the system log from e.g.
> /var/log/messages with the backtrace in it, and let us see that.
>   
I have about 90 users that will try to log in to the app server later 
today so I'm sure it will do it again.  I'll do this and test it from 
other clients if/when it happens.
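
For reference, the capture Derrick describes could be scripted roughly as 
below.  This is only a sketch: the function name capture_task_dump, the 
output path, and the 5000-line tail are my own assumptions, and the log 
file defaults to /var/log/messages only because that is the example given 
above (the kernel log may live elsewhere on a given system).  It must be 
run as root on the application server while the fileserver is not 
answering it.

```shell
#!/bin/sh
# Sketch only: trigger a SysRq task dump and save the system log holding it.
# Hypothetical helper; arguments are:
#   $1  output file for the saved log excerpt
#   $2  SysRq trigger file (default: /proc/sysrq-trigger)
#   $3  system log to copy from (default: /var/log/messages, an assumption)
capture_task_dump() {
    out="$1"
    trigger="${2:-/proc/sysrq-trigger}"
    log="${3:-/var/log/messages}"

    # Writing 't' asks the kernel to dump the state of every task
    # to the kernel log.
    echo t > "$trigger" || return 1

    # Give the logging daemon a moment to write the dump to disk.
    sleep 1

    # Keep the most recent part of the log, which should contain the dump.
    tail -n 5000 "$log" > "$out"
}
```

It would be called as something like 
"capture_task_dump /root/afs-hang.log" (again, a made-up path), and the 
resulting file is what would get sent to the list.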

-- 
Randy Kemp