[OpenAFS] Suspect AFS bottlenecks on a web server

Jason Edgecombe jason@rampaginggeek.com
Wed, 18 Nov 2009 20:00:30 -0500


Thanks, will do.

Derrick Brashear wrote:
> deploy 1.4.10 and that's worth poking
>
> Derrick
>
>
> On Nov 18, 2009, at 6:51 PM, Jason Edgecombe <jason@rampaginggeek.com> wrote:
>
>> Nate Gordon wrote:
>>> On Tue, Nov 17, 2009 at 6:25 PM, Jason Edgecombe <jason@rampaginggeek.com> wrote:
>>>
>>>
>>>> Derrick Brashear wrote:
>>>>
>>>>
>>>>> On Tue, Nov 17, 2009 at 5:09 PM, Jason Edgecombe
>>>>> <jason@rampaginggeek.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Hi Everyone,
>>>>>>
>>>>>> Our webserver has been brought to a crawl many times over the last few
>>>>>> weeks. I suspect it's an AFS bottleneck somewhere. I appreciate any help
>>>>>> I can get.
>>>>>>
>>>>>> The web server runs Solaris 9 with OpenAFS 1.4.1.
>>>>>>
>>>>>>
>>>>>>
>>>>> Is that correct?
>>>>>
>>>>> That's not even worth debugging. Lots of things have been fixed since
>>>>> then; this could be something new or one of a dozen things already
>>>>> fixed.
>>>>>
>>>>>
>>>> Yes, 1.4.1 is correct.
>>>> I'm wondering if increasing the number of daemons would help. The afsd man
>>>> page mentions that more than five or six daemons isn't helpful. I suspect
>>>> that the number of Apache daemons (75) is overwhelming the number of afsd
>>>> threads/daemons (5).
>>>>
>>>>
>>>
>>> As someone who also runs AFS as the backend to a webserver, I can understand
>>> your problems.  My problems stem more specifically from PHP on AFS: PHP the
>>> language feels it is necessary to perform lots and lots of trivial stat
>>> operations.  I have theorized that there are some global locking issues
>>> floating around the internals of the kernel module that cause problems on
>>> multithreaded systems under high load.  Unfortunately, I'm a web geek and
>>> less of a kernel programmer, so I have had limited success in tracking down
>>> and fixing the problem.  Unfortunately, I don't think daemons will be
>>> terribly useful.  My understanding is that they aren't used in local cache
>>> operations, and are only used for remote operations when things are getting
>>> behind.  I'm currently running 6 daemons for 500 Apache threads.
>>>
>>> I would also echo Derrick's comment on the age of the version you are
>>> using.  I have noticed some significant improvements as the 1.4 branch has
>>> gone on.
>>>
>>>
>> Thanks for the info about the daemons. We have lots of sites running 
>> Joomla and PHP. I noticed a 5% vcache miss rate compared to a 1% 
>> dcache miss rate on our web server. That corroborates your statement 
>> about stat calls.
>>
>> Derrick, I have 1.4.10 with the 
>> STABLE14-background-fsync-consistency-issues patch already compiled 
>> and ready to deploy. Would that be new enough to consider debugging?
>>
>> I'm planning on upgrading our web server to 1.4.10 in December.
>>
>> Jason
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
>