[OpenAFS] Suspect AFS bottlenecks on a web server

Derrick Brashear shadow@gmail.com
Wed, 18 Nov 2009 19:46:14 -0500


Deploy 1.4.10 and that's worth poking at.

Derrick


On Nov 18, 2009, at 6:51 PM, Jason Edgecombe <jason@rampaginggeek.com> wrote:

> Nate Gordon wrote:
>> On Tue, Nov 17, 2009 at 6:25 PM, Jason Edgecombe <jason@rampaginggeek.com> wrote:
>>
>>
>>> Derrick Brashear wrote:
>>>
>>>
>>>> On Tue, Nov 17, 2009 at 5:09 PM, Jason Edgecombe
>>>> <jason@rampaginggeek.com> wrote:
>>>>
>>>>
>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> Our web server has been brought to a crawl many times over the last few
>>>>> weeks. I suspect it's an AFS bottleneck somewhere. I appreciate any help
>>>>> I can get.
>>>>>
>>>>> The web server runs Solaris 9 with OpenAFS 1.4.1.
>>>>>
>>>>>
>>>>>
>>>> Is that correct?
>>>>
>>>> That's not even worth debugging. Lots of things have been fixed since
>>>> then; this could be something new or one of a dozen things already
>>>> fixed.
>>>>
>>>>
>>> Yes, 1.4.1 is correct.
>>> I'm wondering if increasing the number of daemons would help. The afsd
>>> man page mentions that running more than five or six daemons isn't
>>> helpful. I suspect that the number of Apache daemons (75) is
>>> overwhelming the number of afsd threads/daemons (5).
>>>
>>
>> As someone who also runs AFS as the backend to a web server, I can
>> understand your problems. Mine stem more specifically from PHP on AFS,
>> since PHP the language feels it necessary to perform lots and lots of
>> trivial stat operations. I have theorized that there are some global
>> locking issues floating around the internals of the kernel module that
>> cause problems on multithreaded systems under high load. Unfortunately,
>> I'm a web geek and less of a kernel programmer, so I have had limited
>> success in tracking down and fixing the problem. I also don't think
>> more daemons will be terribly useful. My understanding is that they
>> aren't used for local cache operations and are only used for remote
>> operations when things are getting behind. I'm currently running 6
>> daemons for 500 Apache threads.
>>
>> I would also echo Derrick's comment on the age of the version you are
>> using. I have noticed some significant improvements as the 1.4 branch
>> has gone on.
>>
>>
> Thanks for the info about the daemons. We have lots of sites running
> Joomla and PHP. I noticed a 5% vcache miss rate compared to a 1% dcache
> miss rate on our web server. That corroborates your statement about stat
> calls, since stat() is answered from the vcache (cached file status)
> rather than the dcache (cached file data).
>
> Derrick, I have 1.4.10 with the STABLE14-background-fsync-consistency-issues
> patch already compiled and ready to deploy. Would that be new enough to
> consider debugging?
>
> I'm planning on upgrading our web server to 1.4.10 in December.
>
> Jason