[OpenAFS] Suspect AFS bottlenecks on a web server
Jason Edgecombe
jason@rampaginggeek.com
Wed, 18 Nov 2009 18:51:27 -0500
Nate Gordon wrote:
> On Tue, Nov 17, 2009 at 6:25 PM, Jason Edgecombe <jason@rampaginggeek.com> wrote:
>
>
>> Derrick Brashear wrote:
>>
>>
>>> On Tue, Nov 17, 2009 at 5:09 PM, Jason Edgecombe
>>> <jason@rampaginggeek.com> wrote:
>>>
>>>
>>>
>>>> Hi Everyone,
>>>>
>>>> Our webserver has been brought to a crawl many times over the last few
>>>> weeks. I suspect it's an AFS bottleneck somewhere. I appreciate any help
>>>> I can get.
>>>>
>>>> The web server runs Solaris 9 with OpenAFS 1.4.1.
>>>>
>>>>
>>>>
>>> Is that correct?
>>>
>>> That's not even worth debugging. Lots of things have been fixed since
>>> then; this could be something new or one of a dozen things already
>>> fixed.
>>>
>>>
>> Yes, 1.4.1 is correct.
>> I'm wondering if increasing the number of daemons would help. The afsd man
>> page mentions that more than five or six daemons isn't helpful. I suspect
>> that the number of Apache daemons (75) is overwhelming the number of afsd
>> threads/daemons (5).
>>
>>
>
> As someone who also runs AFS as the backend to a webserver, I can understand
> your problems. My problems stem more specifically from PHP on AFS: PHP
> performs lots and lots of trivial stat operations. I have theorized that
> there are some global locking issues in the internals of the kernel module
> that cause problems on multithreaded systems under high load. Unfortunately
> I'm more of a web geek than a kernel programmer, so I have had limited
> success in tracking down and fixing the problem. I also don't think more
> daemons will be terribly useful. My understanding is that they aren't used
> in local cache operations, and are only used for remote operations when
> things are getting behind. I'm currently running 6 daemons for 500 Apache
> threads.
>
> I would also echo Derrick's comment on the age of the version you are
> using. I have noticed some significant improvements as the 1.4 branch has
> gone on.
>
>
Thanks for the info about the daemons. We have lots of sites running
Joomla and PHP. I noticed a 5% vcache miss rate compared to a 1% dcache
miss rate on our web server. That corroborates your statement about stat
calls.
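A rough way to check whether stat calls are the bottleneck is to time them at
different concurrency levels. The Python sketch below is only an illustration:
the function names, the thread counts, and the temporary-file target are all
assumptions, and Python threads only approximate separate Apache processes.
Pointing it at a file under /afs would exercise the AFS client instead of the
local disk.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def stat_many(path, count):
    """Call os.stat on `path` `count` times and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(count):
        os.stat(path)
    return time.perf_counter() - start


def stat_rate(path, workers, count_per_worker=1000):
    """Return the aggregate stat calls per second using `workers` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker hammers the same path; list() waits for all of them.
        list(pool.map(lambda _: stat_many(path, count_per_worker),
                      range(workers)))
    elapsed = time.perf_counter() - start
    return (workers * count_per_worker) / elapsed


if __name__ == "__main__":
    # Hypothetical target: a local temp file. Replace it with a path in AFS
    # to measure the cache manager rather than the local filesystem.
    with tempfile.NamedTemporaryFile() as tmp:
        for workers in (1, 5, 25, 75):
            rate = stat_rate(tmp.name, workers)
            print(workers, "threads:", round(rate), "stats/sec")
```

If the rate per thread drops sharply as the thread count rises on an AFS path
but not on a local one, that would be consistent with the locking theory,
though it would not prove it.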
Derrick, I have 1.4.10 with the
STABLE14-background-fsync-consistency-issues patch already compiled and
ready to deploy. Would that be new enough to consider debugging?

I'm planning to upgrade our web server to 1.4.10 in December.

Jason