[OpenAFS] Suspect AFS bottlenecks on a web server

Jason Edgecombe jason@rampaginggeek.com
Wed, 18 Nov 2009 18:51:27 -0500


Nate Gordon wrote:
> On Tue, Nov 17, 2009 at 6:25 PM, Jason Edgecombe <jason@rampaginggeek.com> wrote:
>
>> Derrick Brashear wrote:
>>
>>> On Tue, Nov 17, 2009 at 5:09 PM, Jason Edgecombe
>>> <jason@rampaginggeek.com> wrote:
>>>
>>>
>>>> Hi Everyone,
>>>>
>>>> Our webserver has been brought to a crawl many times over the last few
>>>> weeks. I suspect it's an AFS bottleneck somewhere. I appreciate any help
>>>> I can get.
>>>>
>>>> The web server runs Solaris 9 with OpenAFS 1.4.1.
>>>>
>>>>
>>> is that correct?
>>>
>>> that's not even worth debugging. lots of things have been fixed since
>>> then, this could be something new or one of a dozen things already
>>> fixed.
>>>
>> Yes, 1.4.1 is correct.
>> I'm wondering if increasing the number of daemons would help. The afsd man
>> page mentions that more than five or six daemons isn't helpful. I suspect that
>> the number of Apache daemons (75) is overwhelming the number of afsd
>> threads/daemons (5).
>>
>
> As someone who also runs AFS as the backend to a webserver, I can understand
> your problems.  My problems stem more specifically from PHP on AFS: PHP
> performs lots and lots of trivial stat operations.  I have theorized that
> there are some global locking issues in the internals of the kernel module
> that cause problems on multithreaded systems under high load.  I'm a web geek
> and less of a kernel programmer, so I have had limited success in tracking
> down and fixing the problem.  Unfortunately, I don't think more daemons will
> be terribly useful.  My understanding is that they aren't used in local cache
> operations, and are only used for remote operations when things are falling
> behind.  I'm currently running 6 daemons for 500 Apache threads.
>
> I would also echo Derrick's comment on the age of the version you are
> using.  I have noticed some significant improvements as the 1.4 branch has
> gone on.
>
Thanks for the info about the daemons. We have lots of sites running 
Joomla and PHP. I noticed a 5% vcache miss rate compared to a 1% dcache 
miss rate on our web server. That corroborates your statement about stat 
calls.
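
For reference, a minimal sketch of how miss rates like these can be worked 
out from raw hit and miss counters. The function name and the counter values 
are hypothetical, picked only so the results match the 5% and 1% figures 
above; they are not real measurements from our server.

```python
def miss_rate(hits: int, misses: int) -> float:
    """Return misses as a fraction of all lookups (0.0 if there were none)."""
    total = hits + misses
    return misses / total if total else 0.0


# Hypothetical counter values, chosen only to reproduce the quoted percentages.
vcache = miss_rate(hits=9500, misses=500)
dcache = miss_rate(hits=9900, misses=100)

print(f"vcache miss rate: {vcache:.1%}")
print(f"dcache miss rate: {dcache:.1%}")
```

A vcache miss rate several times the dcache miss rate suggests that file 
status lookups, not file data reads, are what miss the cache most often.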

Derrick, I have 1.4.10 with the 
STABLE14-background-fsync-consistency-issues patch already compiled and 
ready to deploy. Would that be new enough to consider debugging?

I'm planning on upgrading our web server to 1.4.10 in December.

Jason