[OpenAFS] Suspect AFS bottlenecks on a web server
Derrick Brashear
shadow@gmail.com
Wed, 18 Nov 2009 19:46:14 -0500
deploy 1.4.10, and then it's worth poking at.
Derrick
On Nov 18, 2009, at 6:51 PM, Jason Edgecombe <jason@rampaginggeek.com>
wrote:
> Nate Gordon wrote:
>> On Tue, Nov 17, 2009 at 6:25 PM, Jason Edgecombe
>> <jason@rampaginggeek.com> wrote:
>>
>>
>>> Derrick Brashear wrote:
>>>
>>>
>>>> On Tue, Nov 17, 2009 at 5:09 PM, Jason Edgecombe
>>>> <jason@rampaginggeek.com> wrote:
>>>>
>>>>
>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> Our webserver has been brought to a crawl many times over the last
>>>>> few weeks. I suspect it's an AFS bottleneck somewhere. I appreciate
>>>>> any help I can get.
>>>>>
>>>>> The web server runs Solaris 9 with OpenAFS 1.4.1.
>>>>>
>>>>>
>>>>>
>>>> is that correct?
>>>>
>>>> that's not even worth debugging. lots of things have been fixed since
>>>> then; this could be something new or one of a dozen things already
>>>> fixed.
>>>>
>>>>
>>> Yes, 1.4.1 is correct.
>>> I'm wondering if increasing the number of daemons would help. The afsd
>>> man page mentions that more than five or six daemons isn't helpful. I
>>> suspect that the number of Apache daemons (75) is overwhelming the
>>> number of afsd threads/daemons (5).
>>>
>>>
>>>
>>
>> As someone who also runs AFS as the backend to a webserver, I can
>> understand your problems. My problems stem more specifically from PHP on
>> AFS: PHP feels it is necessary to perform lots and lots of trivial stat
>> operations. I have theorized that there are some global locking issues
>> in the internals of the kernel module that cause problems on
>> multithreaded systems under high load. Unfortunately, I'm a web geek
>> rather than a kernel programmer, so I have had limited success in
>> tracking down and fixing the problem. I also don't think more daemons
>> will be terribly useful. My understanding is that they aren't used in
>> local cache operations, and are only used for remote operations when
>> things are getting behind. I'm currently running 6 daemons for 500
>> Apache threads.
>>
>> I would also echo Derrick's comment on the age of the version you are
>> using. I have noticed some significant improvements as the 1.4 branch
>> has gone on.
>>
>>
> Thanks for the info about the daemons. We have lots of sites running
> Joomla and PHP. I noticed a 5% vcache miss rate compared to a 1%
> dcache miss rate on our web server. That corroborates your statement
> about stat calls.
>
> Derrick, I have 1.4.10 with the
> STABLE14-background-fsync-consistency-issues patch already compiled and
> ready to deploy. Would that be new enough to consider debugging?
>
> I'm planning on upgrading our web server to 1.4.10 in December.
>
> Jason
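
For reference, the vcache and dcache miss rates quoted above are just misses divided by total lookups for each cache. A minimal Python sketch of that arithmetic follows. The counter values are hypothetical, picked only to reproduce the 5% and 1% figures from the thread, and the `miss_rate` helper is an illustrative name, not part of any OpenAFS tool.

```python
def miss_rate(hits: int, misses: int) -> float:
    """Return misses as a fraction of all lookups (0.0 when there were none)."""
    total = hits + misses
    return misses / total if total else 0.0


# Hypothetical counter values (assumption): chosen only so the results match
# the 5% vcache and 1% dcache miss rates mentioned in the thread.
vcache = {"hits": 9500, "misses": 500}
dcache = {"hits": 9900, "misses": 100}

vcache_rate = miss_rate(vcache["hits"], vcache["misses"])
dcache_rate = miss_rate(dcache["hits"], dcache["misses"])

print(f"vcache miss rate: {vcache_rate:.1%}")
print(f"dcache miss rate: {dcache_rate:.1%}")
```

A vcache miss rate several times the dcache miss rate is consistent with a workload dominated by metadata lookups (stat calls) rather than by file data reads, which is the pattern described for PHP above.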