[OpenAFS] Openafs performance with Apache/PHP

Nate Gordon nlgordon@gmail.com
Fri, 27 Jul 2007 14:34:17 -0500

So I'm finally getting around to querying about some issues I've been
having for a long time with openafs 1.4.4 on rhel (3|4|5).  In short,
the performance of my webservers serving up PHP code is less than
stellar in some cases.  Those cases being defined as the requests per
second getting above 30.  Here is an example/summary of the issue I'm
seeing:

I have a php page, that php page includes some other php files and
makes two database queries.  If I point apache bench at that page on
one of my test servers I get results as follows:

 1 concurrent user:  16 requests per second     60ms per request
 2 concurrent users: 27 requests per second     73ms per request
 5 concurrent users: 32 requests per second    157ms per request
10 concurrent users: 10 requests per second    934ms per request
20 concurrent users:  5 requests per second   3666ms per request
30 concurrent users:  3 requests per second  10899ms per request

* requests per second being defined as the aggregate requests per
second across all the concurrent users.  Thus 3 users at 3 requests
per second each yields 9 requests per second aggregate.
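
As a sanity check on the table above, the aggregate rate can be
recomputed from the concurrency level and the per-request time.  This
is only a sketch: the function name is made up for illustration, and it
assumes every user issues requests back to back with no think time,
which is how apache bench behaves.

```python
# (concurrent users, mean ms per request) as reported in the table above
openafs_runs = [(1, 60), (2, 73), (5, 157), (10, 934), (20, 3666), (30, 10899)]


def aggregate_rps(users, ms_per_request):
    """Each user completes 1000/ms requests per second; sum over all users."""
    return users * 1000.0 / ms_per_request


for users, ms in openafs_runs:
    print(f"{users:>2} users: {aggregate_rps(users, ms):5.1f} requests/sec")
```

The computed values agree with the reported requests per second to
within one request per second, so the two columns of the table are
consistent with each other.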

So as you can see performance drops off to an unusable point.  What I
also noted was that while watching vmstat the number of context
switches soars through the roof.  At the 10-user mark the switch rate
hovers around 200,000 per second.  At 20 users we go to ~250,000 and
at 30 users we hit ~270,000 per second.  The machine I'm testing on is
a Dell dual Xeon 2.8GHz hyperthreaded machine with 2GB of RAM.  The
machine isn't swapping during any of this and general memory usage
remains fairly constant.  Afsd is configured with:
AFSD_ARGS="-fakestat-all -chunksize 20 -volumes 400 -dcache 50000
-files 200000 -stat 150000 -daemons 20"
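
For anyone wanting to reproduce the context switch figures without
watching vmstat by hand, something like the following could be used.
It is a rough sketch, not what was used for the numbers above: it
assumes a Linux box where /proc/stat carries a cumulative "ctxt"
counter, and both function names are made up for illustration.

```python
import time


def read_ctxt(stat_text):
    """Return the cumulative context-switch count from /proc/stat contents."""
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no ctxt line found")


def switches_per_second(interval=1.0, path="/proc/stat"):
    """Sample the counter twice and return the rate over the interval."""
    with open(path) as f:
        before = read_ctxt(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = read_ctxt(f.read())
    return (after - before) / interval
```

Calling switches_per_second() in a loop while apache bench is running
should give figures comparable to the "cs" column of vmstat.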

I've tried varying the various options to this to try to improve
performance, but things only seem to get worse.  Cache size on this
machine is currently 100MB and the total content being served is <

For completeness I've also tested this setup on a similar machine
running rhel 5 with arla-current and have come up with some
interesting results:

1 user: 22 requests per second 43ms per request
2 users: 32 requests per second 62ms per request
5 users: 36 requests per second 138ms per request
10 users: 36 requests per second 274ms per request
20 users: 37 requests per second 542ms per request
30 users: 36 requests per second 821ms per request

This shows me exactly what I would expect to happen on the server.
Aggregate requests per second reach a peak and level off there, and
past that point adding users only increases the individual request
time.

Before anyone gets offended by anything I might have said, I want to
make it absolutely clear that I am not trying to say that either
openafs or arla is better/worse than the other.  I needed something to
compare to for the purposes of this discussion.  I realize that the
two are very very differently architected in their interactions with
the kernel.  I would like to stick with openafs on my servers since
that has official support from my systems group.

So then I start to wonder about those differences in architecture.  Is
there a point in openafs where it is essentially being single
threaded?  Possibly in cache reading/validation?  Varying the number
of daemons I'm running doesn't appear to affect the performance
greatly until I drop down to something like 1 daemon.

A little bit more about the PHP code I'm running and about PHP in
general.  When you do a require_once in PHP to include a file it
essentially stats every directory component between / and the full
path to the file it is trying to include.  It does this on every
include and does not cache the results.  It also does this in an
attempt to locate the file if the include was done as a relative path.
For example, with:
include_path = ".:/usr/local/lib/php/:/afs/cell/folder/where/my/includes/are/"

This would generate stat system calls in approximately this pattern:
Check first item in include_path for include ("."):
/afs/cell/folder/virtualhost/directory/relative <-- Fails since node
doesn't exist
Check second item in include_path for include ("/usr/local/lib/php/"):
/usr/local/lib/php/relative <-- Fails again for same reason
Check third item in include_path ("/afs/cell/folder/where/my/includes/are"):
<-- Actual fopen

Everything except the final fopen is an lstat64 call, as indicated by strace.
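
The lookup pattern described above can be modelled roughly as follows.
This is a sketch of the behaviour as described in this message, not
PHP's actual implementation; the function names are made up, and the
script directory is the placeholder path from the example above.

```python
import posixpath


def ancestor_chain(path):
    """Return every path component from / down to path, in order."""
    chain = []
    current = ""
    for part in path.split("/"):
        if part:
            current += "/" + part
            chain.append(current)
    return chain


def candidate_paths(include_path, script_dir, relative_name):
    """List the paths tried, in include_path order, for a relative include."""
    candidates = []
    for entry in include_path.split(":"):
        base = script_dir if entry == "." else entry
        candidates.append(posixpath.normpath(posixpath.join(base, relative_name)))
    return candidates


include_path = ".:/usr/local/lib/php/:/afs/cell/folder/where/my/includes/are/"
script_dir = "/afs/cell/folder/virtualhost/directory"

tried = candidate_paths(include_path, script_dir, "relative")
for path in tried:
    print("probe:", path)
for component in ancestor_chain(tried[-1]):
    print("lstat:", component)
```

Under this model the path that finally resolves has 8 components, so
with about 20 includes per page that is roughly 160 lstat calls per
page hit before counting the failed probes.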

As you can see this is pretty expensive to do per page hit, especially
when this is done repeatedly for every included file.  The page in
question includes about 20 or so files.  This is one problem.  I've
also put together the same page by pulling in all the code that would
have been included to generate a self-contained PHP file.  The
performance tops out around 50 requests per second and still has the
same dropoff pattern as you increase the number of simultaneous users
hitting the page when using openafs.  So the problem isn't purely in
the excessive number of stat system calls.  If anything it shows that
there is a problem somewhere else as well.  Possibly that it requires
that the openafs daemon threads be switched in to answer lots of
questions too often or that the threads have to talk to each other for
every query to the afs file system.

This topic of discussion might be more suited to the dev list, but I
thought I would start here to see if anyone else was serving php files
out of AFS and could reproduce my problems in their environment.

If any more information is needed I would be glad to provide it.

Thanks in advance,

-Nathan Gordon

If the database server goes down and there is no code to hear it, does
it really go down?