[OpenAFS] Optimization for Throughput

Daniel Clark dclark@pobox.com
Tue, 10 Oct 2006 13:46:29 -0400

On 10/10/06, Joseph Kulisics <kulisics@chem.ucla.edu> wrote:
> If there's a performance tuning guide, a FAQ, or a message thread on the
> subject, please, let me know. I looked around the web a lot, but I didn't
> find any guide. (Maybe I just haven't found the right search words.)

There was something on this at this year's "AFS & Kerberos Best
Practices Workshop", entitled "Tuning the OpenAFS UNIX client cache
manager" [1].

If you search the mailing list archives you will also come up with some
hits. A good keyword to use is "chunksize".

I did some testing a few months ago, and the only way I was able to
get acceptable performance on GigE (slightly better than our low-end
Network Appliance filers / NFSv3) was with OpenAFS 1.4.1 with these
cache options:

    OPTIONS="-verbose -nosettime -memcache -chunksize 18 -stat 2800 -daemons 5 -volumes 128"


This was slightly faster, at the cost of much more memory use:

    OPTIONS="-verbose -nosettime -memcache -chunksize 20 -stat 2800 -daemons 5 -volumes 128"


The memcache, unlike the disk cache, divides the cache into a fixed
number of equal-size chunks. Because -chunksize is a power-of-two
exponent, a chunksize of 20 means 1 MB chunks, so a 64 MB memory cache
has only 64 of them, and OpenAFS would quickly start giving "too many
files open" type errors. (For details search the archives; I may not
have the specifics completely right here, but I know there was some
issue like this.)
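To make the chunk arithmetic concrete, here is a small Python sketch. It
assumes that -chunksize N means chunks of 2^N bytes (how afsd interprets
the flag) and that the memcache splits its whole size into equal chunks,
as described above. The 64 MB cache size is only the example figure from
this paragraph, and the function names are hypothetical.

```python
# Sketch: how afsd's -chunksize exponent interacts with a memory cache.
# Assumption: -chunksize N gives chunks of 2**N bytes, and the memcache
# carves its entire size into equal-size chunks.


def chunk_bytes(chunksize_exp: int) -> int:
    """Size of one cache chunk in bytes for a given -chunksize value."""
    return 1 << chunksize_exp


def memcache_chunk_count(cache_kb: int, chunksize_exp: int) -> int:
    """Number of equal-size chunks a memory cache of cache_kb KB can hold."""
    return (cache_kb * 1024) // chunk_bytes(chunksize_exp)


if __name__ == "__main__":
    cache_kb = 64 * 1024  # hypothetical 64 MB memory cache
    for exp in (18, 20):
        print(f"-chunksize {exp}: {chunk_bytes(exp) // 1024} KB chunks, "
              f"{memcache_chunk_count(cache_kb, exp)} chunks in 64 MB")
```

For a 64 MB memory cache this reports 256 chunks of 256 KB with
-chunksize 18, but only 64 chunks of 1024 KB with -chunksize 20.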

However, since then I've been told by people who seem to be "in the
know" that the memcache code is unloved and tends to fall over or
corrupt things, especially under heavy multiuser load (my tests were all
as a single user and not over long periods of time).

IMHO, for many I-need-speed use cases you really need to bypass local
disk, because in the disk cache case (a) you are going to be slowed down
by a factor of at least 2, since the data must first go to the AFS cache
disk and then to wherever the application puts it (which could be an
in-memory structure or another network filesystem), and (b) a single
local disk can be much slower than a fast RAID array over GigE/10GigE
(possibly hundreds of spindles at >15k RPM vs. one spindle at <6000 RPM
in a worst-case scenario). There has been some work in the past on a
cache-bypass mode for OpenAFS or something like that, but I don't think
it ever got integrated into a mainline release.

Therefore, what I'm working on now (and I think some other people are
already doing this) is running the disk cache code against an in-RAM
filesystem. This requires changes to the init scripts and OS-specific
commands to set up the in-memory filesystem, and I'm not sure it will
work everywhere. I have tested it on AIX with in-memory JFS, and that
seems to work just as well as memcache; I assume it will be trivial on
GNU/Linux; and I haven't looked at Solaris at all yet.
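As a starting point for the GNU/Linux case, here is a hypothetical Python
helper that only builds the strings an init script would need; it runs
nothing. Everything in it is an assumption rather than something tested
here: that the disk cache code works on Linux tmpfs, that the cache
directory is /usr/vice/cache, and that 90% of the tmpfs is a sensible
cache size. It relies on the cacheinfo format "mountpoint:cachedir:size
in 1 KB blocks". The tmpfs would have to be mounted before afsd starts,
and afsd would then run without -memcache.

```python
# Hypothetical sketch: commands for an AFS disk cache on a Linux tmpfs.
# Assumptions (not tested): tmpfs works as an afsd disk cache,
# /usr/vice/cache is the cache directory, and 90% headroom is adequate.

CACHE_DIR = "/usr/vice/cache"  # assumed cache directory
AFS_MOUNT = "/afs"             # AFS mount point used in cacheinfo


def tmpfs_cache_plan(tmpfs_mb: int, headroom_pct: int = 90) -> dict:
    """Return a tmpfs mount command and a matching cacheinfo line.

    The cacheinfo size field is in 1 KB blocks; it is kept below the
    tmpfs size so the cache cannot completely fill the filesystem.
    """
    cache_kb = tmpfs_mb * 1024 * headroom_pct // 100
    return {
        "mount": f"mount -t tmpfs -o size={tmpfs_mb}m tmpfs {CACHE_DIR}",
        "cacheinfo": f"{AFS_MOUNT}:{CACHE_DIR}:{cache_kb}",
    }


if __name__ == "__main__":
    plan = tmpfs_cache_plan(512)  # hypothetical 512 MB tmpfs
    print(plan["mount"])
    print(plan["cacheinfo"])
```

Keeping the cache size strictly smaller than the tmpfs is a deliberate
choice: a RAM-backed filesystem that fills up fails just like a full
disk, so some slack is left for the cache manager's own bookkeeping.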

If anyone is already doing this, examples of working init scripts
would of course be appreciated :-)

[1] Tuning the OpenAFS UNIX client cache manager

Daniel Joseph Barnhart Clark