[OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

Fri, 27 Jul 2007 20:54:12 +1200

Hi

We are running about 20 Red Hat AS3-based OpenAFS 1.4.2 fileservers. For the
last three days, between 4pm and 10pm, 4-6 of the fileservers have stopped
serving files, with Nagios monitoring warning of > 200 blocked connections. I
have turned on debugging for the fileserver process and have a log file, but
nothing in it looked bad to me (not that I would know). The servers are
basically idle during these disruptions, with very low CPU and disk usage, but
they have to be restarted before the files become accessible again.

We added the -L flag to the fileserver process today to see if this helps,
but we are wondering whether we can do anything else to find the cause and/or
prevent these disruptions.

We have checked and there are no admin scripts running at these times.

BTW, it would not be so bad if the clients failed over to the other read-only
volumes, but they do not seem to. The affected fileservers seem to host the
user root read-only volume, and when the servers go into this state, all
clients that have this server highest in their priority list just lock up and
need to be restarted. Also, despite having 10 read-only volumes to pick from,
the clients tend to hit only a couple.

Cheers

Matt