[OpenAFS] Reporting on some recent benchmark results

Garrett Wollman wollman@csail.mit.edu
Mon, 4 Apr 2011 17:18:54 -0400


Over the past few days I have performed several benchmarks comparing
the performance of various OpenAFS server and client configurations.
Here's the introduction:

	It is a frequent complaint of CSAIL users that AFS is
	"slow". Given the availability of a spare (not yet deployed)
	AFS server, we were interested in quantifying this slowness,
	and comparing various AFS server and client options. While we
	found statistically significant differences among various
	parameter choices, we found only one choice that made an
	operationally significant difference: most of the performance
	issues with AFS are the result of encrypting data passing over
	the network. Inexplicably, the tenfold difference in
	performance we document accounts for only a ten percent
	difference in CPU utilization. With encryption disabled, AFS
	is competitive with NFSv3.

See the full report at
<http://people.csail.mit.edu/wollman/afs-performance.html>.  Raw
results and test data are available at
<http://people.csail.mit.edu/wollman/>.

Some unrelated comments...

I suspect that the current Rx request dispatcher does too good a job
at distributing related requests to its various threads.  Under LWP
this does not matter, because all the threads get scheduled on a
single CPU, but with pthreads it is likely that one client's requests
will be spread out among all available cores.  This is likely to cause
ping-ponging of the cache lines representing shared data structures
among the CPUs, which is inefficient.  (However, as I describe in the
report, this is likely to be in the noise compared to the overhead of
fcrypt.)  Rx is also likely to cause many unnecessary and expensive
inter-processor interrupts, because it frequently broadcasts on the
condition variable that most service threads wait on.
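
To make the dispatch concern concrete, here is a toy sketch.  It is
not Rx code and does not reflect how Rx actually chooses a thread;
the worker count, the client label, and all function names are
made up for illustration.  It only contrasts a dispatcher that
ignores where a request came from with one that hashes the client
identifier, and counts how many distinct workers end up touching one
client's requests.

```python
import zlib
from collections import defaultdict

NUM_WORKERS = 8  # hypothetical number of service threads, one per core


def round_robin_dispatch(requests, num_workers=NUM_WORKERS):
    """Hand each request to the next worker in turn, ignoring its origin."""
    return [(i % num_workers, client) for i, client in enumerate(requests)]


def affinity_dispatch(requests, num_workers=NUM_WORKERS):
    """Pick the worker by hashing the client identifier, so one client's
    requests always land on the same worker."""
    return [(zlib.crc32(client.encode()) % num_workers, client)
            for client in requests]


def workers_per_client(assignments):
    """Return {client: set of workers that handled its requests}."""
    seen = defaultdict(set)
    for worker, client in assignments:
        seen[client].add(worker)
    return dict(seen)


if __name__ == "__main__":
    # Sixteen back-to-back requests from a single (hypothetical) client.
    requests = ["client-a"] * 16
    spread = workers_per_client(round_robin_dispatch(requests))
    pinned = workers_per_client(affinity_dispatch(requests))
    print("round robin:", len(spread["client-a"]), "workers")
    print("affinity:   ", len(pinned["client-a"]), "worker(s)")
```

With round-robin dispatch the single client's state is touched by
every worker in turn, which is the pattern that would bounce cache
lines between cores; with hashing it stays on one worker.  The
obvious cost of hashing is that one busy client can no longer use
more than one core, so this is a trade-off and not a recommendation.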

A single client is only able to use about 30% of the server's CPU --
20% when fcrypt is disabled.  Yet the server's disk is 90% idle, so
disk waits are clearly not implicated.  I haven't tried enabling the
tracing options in the fileserver to look at the actual request flow.

-GAWollman