[OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

Derrick Brashear shadow@gmail.com
Fri, 27 Jul 2007 08:01:14 -0400


Use gdb's generate-core-file, or gcore, or pstack if you have it, on one of
the hung fileserver processes and get a backtrace.
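
A rough sketch of what that could look like from a shell on an affected
server. The helper names (bt_cmd, core_cmd) and the /tmp paths are made up
for illustration, and this assumes a gdb new enough to accept -batch and -ex,
plus a gcore that takes -o; adjust for whatever your distribution ships. The
functions only print the commands so you can check them before running
anything against a live fileserver.

```shell
#!/bin/sh
# Sketch only: helper names and output paths are hypothetical.
# Assumes gdb (with -batch/-ex support) and gcore are installed.

# Print the gdb invocation that dumps a backtrace of every thread in PID $1.
bt_cmd() {
    echo "gdb -p $1 -batch -ex 'thread apply all bt'"
}

# Print the gcore invocation that writes a core file named $2.<pid> for PID $1.
core_cmd() {
    echo "gcore -o $2 $1"
}

# Example use as root while the server is hung (not executed here):
#   pid=$(pidof fileserver)
#   eval "$(bt_cmd "$pid")" > /tmp/fileserver.bt 2>&1
#   eval "$(core_cmd "$pid" /tmp/fileserver.core)"
```

If pstack is available, "pstack <pid>" gives a similar per-thread stack
listing without needing gdb directly.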

On 7/27/07, Matthew Cocker <cockerm@gmail.com> wrote:
>
> Hi
>
> We are running about 20 Red Hat AS3 based OpenAFS 1.4.2 fileservers. For
> the last three days, between 4pm and 10pm, we have had 4-6 fileservers
> stop serving files, with Nagios monitoring warning of > 200 blocked
> connections. I have turned on debug for the fileserver process and have a
> log file, but nothing seemed bad to me (not that I would know). The servers
> are basically idle during these disruptions, with CPU and disk showing very
> low usage, but they have to be restarted to get access to files back.
>
> We added the -L flag to the fileserver process today to see if this helps
> but we are wondering if we can do anything else to find the cause and/or
> prevent these disruptions.
>
> We have checked and there are no admin scripts running at these times.
>
>
> BTW, it would not be so bad if the clients would fail over to other readonly
> volumes, but they do not seem to. The fileservers affected seem to have the
> user root readonly volume on them, but when the servers go into this state
> all clients that have this server as the highest in the priority list just
> lock up and need to be restarted. Also, despite having 10 readonly volumes to
> pick from, the clients tend to hit only a couple.
>
>
> Cheers
>
> Matt
>
