[OpenAFS] Suspect AFS bottlenecks on a web server

Nate Gordon nlgordon@gmail.com
Wed, 18 Nov 2009 09:49:01 -0600


On Tue, Nov 17, 2009 at 6:25 PM, Jason Edgecombe <jason@rampaginggeek.com> wrote:

> Derrick Brashear wrote:
>
>> On Tue, Nov 17, 2009 at 5:09 PM, Jason Edgecombe
>> <jason@rampaginggeek.com> wrote:
>>
>>
>>> Hi Everyone,
>>>
>>> Our webserver has been brought to a crawl many times over the last few
>>> weeks. I suspect it's an AFS bottleneck somewhere. I appreciate any help
>>> I can get.
>>>
>>> The web server runs solaris 9 w/openafs 1.4.1.
>>>
>>>
>>
>> is that correct?
>>
>> that's not even worth debugging. lots of things have been fixed since
>> then, this could be something new or one of a dozen things already
>> fixed.
>>
> Yes, 1.4.1 is correct.
> I'm wondering if increasing the number of daemons would help. The afsd man
> page mentions that more than 5 or six daemons isn't helpful. I suspect that
> the number of apache daemons (75) is overwhelming the number of afsd
> threads/daemons (5).
>

As someone who also runs AFS as the backend to a webserver, I can understand
your problems.  Mine stem more specifically from PHP on AFS: PHP as a
language insists on performing lots and lots of trivial stat operations.  I
have theorized that there are some global locking issues in the internals
of the kernel module that cause problems on multithreaded systems under
high load.  Unfortunately I'm more of a web geek than a kernel programmer,
so I have had limited success in tracking down and fixing the problem.  I
also don't think more daemons will be terribly useful.  My understanding is
that they aren't used in local cache operations, and are only used for
remote operations when things are falling behind.  I'm currently running 6
daemons for 500 apache threads.
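
One rough way to test the stat theory is to time stat calls against a
directory in AFS at a few different thread counts and see whether the
per-call latency climbs as concurrency goes up.  The sketch below is only
that, a sketch: the function names and the "." target are illustrative
placeholders, not anything from OpenAFS, and it measures the whole path
through the kernel rather than isolating any particular lock.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def time_stats(paths, repeats=100):
    """Return the mean seconds per os.stat() call over the given paths."""
    start = time.perf_counter()
    calls = 0
    for _ in range(repeats):
        for path in paths:
            os.stat(path)
            calls += 1
    if calls == 0:
        # Nothing to stat (empty path list), so report zero latency.
        return 0.0
    return (time.perf_counter() - start) / calls


def stat_latency_by_threads(paths, thread_counts=(1, 4, 16), repeats=100):
    """Map each thread count to the mean per-call stat latency across workers."""
    results = {}
    for n in thread_counts:
        # Each of the n workers runs the same stat loop concurrently.
        with ThreadPoolExecutor(max_workers=n) as pool:
            futures = [pool.submit(time_stats, paths, repeats) for _ in range(n)]
            per_worker = [f.result() for f in futures]
        results[n] = sum(per_worker) / len(per_worker)
    return results


if __name__ == "__main__":
    # Hypothetical target: replace "." with a directory that lives in AFS.
    target = "."
    paths = [os.path.join(target, name) for name in os.listdir(target)]
    for n, latency in stat_latency_by_threads(paths).items():
        print(f"{n:3d} threads: {latency * 1e6:.1f} us per stat")
```

If the per-stat time on AFS grows much faster with thread count than it
does on a local disk, that would at least be consistent with contention
in the client, though it would not prove where the contention is.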

I would also echo Derrick's comment on the age of the version you are
using.  I have noticed some significant improvements as the 1.4 branch has
gone on.

-- 
-Nathan Gordon

If the database server goes down and there is no code to hear it, does it
really go down?
<esc>:wq<CR>
