[OpenAFS-devel] Faster reads on Linux
Simon Wilkinson
sxw@inf.ed.ac.uk
Wed, 15 Jul 2009 00:10:25 +0100
I've been doing some work recently on improving the read performance
of the Linux client. We're in the process of moving an RPM delivery
system away from squid caches to an httpd backed by the AFS disk
cache - in doing so, we noted the very poor performance of this
solution compared to squid. The work I've done has more than doubled
performance in this application, and shows a substantial benefit in
many other cases - see http://homepages.inf.ed.ac.uk/sxw/2d-read.png
for iozone's view.
The patch series at /afs/inf.ed.ac.uk/user/s/sxw/Public/faster-reads/
(against 1.4.11) is presented for comment. Essentially this series
splits the changes into four distinct chunks:
Firstly, minimise calls to crref(), and give cache hits a fast path at
the start of readpages. We take advantage of the fact that reads will
only occur in page-sized chunks and that, provided chunksize >
pagesize, a read will never cross a chunk boundary, to significantly
reduce the amount of work that we need to do in preparing for a read.
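As a rough illustration (this is a sketch of the idea, not the patch
itself - afs_dcache_valid() and afs_read_from_backing_cache() are
invented names standing in for the real checks), the fast path amounts
to checking for a valid cached chunk before doing any credential work:

    #include <linux/fs.h>
    #include <linux/pagemap.h>
    /* (OpenAFS internal headers assumed for vcache/VTOAFS) */

    /* Hypothetical sketch of the readpage fast path: only take a
     * credential reference (crref) when we actually have to go to
     * the network. */
    static int afs_linux_readpage_fastpath(struct file *fp,
                                           struct page *pp)
    {
        struct inode *ip = fp->f_dentry->d_inode;
        struct vcache *avc = VTOAFS(ip);

        /* Reads arrive in page-sized units; with chunksize > pagesize
         * a single page never spans two chunks, so one chunk lookup
         * suffices. */
        if (!afs_dcache_valid(avc, page_offset(pp)))
            return -ENOENT;    /* fall back to the slow path */

        /* Cache hit: serve the page straight from the backing cache
         * without touching credentials at all. */
        return afs_read_from_backing_cache(avc, pp);
    }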
Secondly, change from kmap()ing our page and using ->read() to load it
to using the backing filesystem's own readpage() call. Because we
can't swap pages between filesystems, this requires us to allocate a
page to read the cached data into, and then to perform a page copy
between the backing cache's page and the filesystem page. Nonetheless,
this is significantly faster than the memory management tricks we were
playing with read().
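In outline (again a sketch rather than the patch itself - 'cachefp',
an open file on the backing cache, and the abbreviated error handling
are illustrative), reading via the backing filesystem looks something
like:

    #include <linux/fs.h>
    #include <linux/pagemap.h>
    #include <linux/highmem.h>

    /* Sketch: pull a page in via the cache filesystem's own
     * readpage() (through read_mapping_page()), then copy it into
     * the AFS page. */
    static int afs_fill_page_from_cache(struct file *cachefp,
                                        struct page *pp)
    {
        struct address_space *cachemapping =
            cachefp->f_dentry->d_inode->i_mapping;
        struct page *cp;

        /* read_mapping_page() invokes the backing filesystem's
         * readpage() and waits until the page is uptodate. */
        cp = read_mapping_page(cachemapping, pp->index, NULL);
        if (IS_ERR(cp))
            return PTR_ERR(cp);

        /* Pages can't be swapped between filesystems, so copy the
         * data across and mark our page uptodate. */
        copy_highpage(pp, cp);
        SetPageUptodate(pp);
        unlock_page(pp);
        page_cache_release(cp);
        return 0;
    }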
Thirdly, introduce readahead. Typically AFS has had readahead disabled
(as we use prefetch to get the next chunk from the fileserver,
anyway). However, disabling readahead means that there is no
opportunity to prefill the page cache with the 'next' page that the
process wants. The process therefore blocks whilst that page is
fetched in from disk, which seriously degrades performance in
situations where you are streaming data off disk and out onto the
network. This patch enables readahead, but does so in the foreground.
This means that the calling process will block not only whilst its
required data is read, but whilst the whole readahead chunk is pulled
from disk. Obviously not ideal - but it still manages to be faster
than the vanilla cache manager when streaming files.
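A minimal sketch of the foreground variant, under the assumption that
it simply drags the rest of the chunk's pages into the backing cache's
page cache, so that subsequent per-page reads hit memory rather than
disk (the helper and loop bounds here are illustrative):

    #include <linux/fs.h>
    #include <linux/pagemap.h>

    /* Sketch of foreground readahead: after filling the requested
     * page, synchronously read the rest of the chunk through the
     * backing cache.  Each read_mapping_page() call blocks until its
     * page is uptodate - hence the caller stalls for the whole
     * chunk. */
    static void afs_foreground_readahead(struct file *cachefp,
                                         pgoff_t start,
                                         unsigned int npages)
    {
        struct address_space *cachemapping =
            cachefp->f_dentry->d_inode->i_mapping;
        unsigned int i;

        for (i = 1; i < npages; i++) {
            struct page *cp;

            cp = read_mapping_page(cachemapping, start + i, NULL);
            if (IS_ERR(cp))
                break;    /* give up on the rest of the chunk */
            /* We only needed the side effect of the read: the page
             * is now in memory for the next readpage to find. */
            page_cache_release(cp);
        }
    }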
Finally, do the readahead in the background. This actually ends up
being harder than it looks, as Linux won't share the necessary
information with us. Whilst very new kernels do have the ability to
watch for page unlocks, we can't use it, nor can we create our own
worker queues in which to do the waiting. So, we take a two-pronged
approach. The AFS module gains a new kernel thread, which examines all
of the pages which are being read from disk. As each page becomes
unlocked (indicating that the read has completed), the thread
transfers it into a queue for a separate worker task. This worker task
lives in the kernel's work queue, and takes care of copying the data
from the page that has just been read into the AFS page, and unlocking
that page (to mark it as ready for use). We do this in the work queue,
as it means we can be copying one page per processor on the system,
giving us some degree of parallelism. All of this is implemented using
'real' Linux locks, so we don't serialise around the GLOCK at all.
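Very roughly, the two halves have the following shape (a sketch only -
the structure and function names are invented, and the list handling
is simplified):

    #include <linux/kthread.h>
    #include <linux/workqueue.h>
    #include <linux/pagemap.h>
    #include <linux/highmem.h>
    #include <linux/spinlock.h>
    #include <linux/list.h>
    #include <linux/slab.h>

    /* One pending read: a backing-cache page still under I/O, plus
     * the AFS page that its data will be copied into. */
    struct afs_pagecopy_task {
        struct list_head link;
        struct page *cachepage;   /* page being read by the cache fs */
        struct page *afspage;     /* AFS page waiting for the data */
        struct work_struct work;
    };

    static LIST_HEAD(afs_pending_reads);
    static DEFINE_SPINLOCK(afs_pending_lock);

    /* Work queue half: runs once the disk read has finished.  The
     * work queue can run one of these per CPU, which is where the
     * parallelism comes from. */
    static void afs_pagecopy_worker(struct work_struct *work)
    {
        struct afs_pagecopy_task *t =
            container_of(work, struct afs_pagecopy_task, work);

        copy_highpage(t->afspage, t->cachepage);
        SetPageUptodate(t->afspage);
        unlock_page(t->afspage);    /* mark it as ready for use */
        page_cache_release(t->cachepage);
        kfree(t);
    }

    /* Kernel thread half: poll the pending list and hand any page
     * whose read has completed (it is no longer locked) over to the
     * work queue.  Plain spinlocks throughout - the GLOCK is never
     * taken. */
    static int afs_pagecopy_thread(void *unused)
    {
        while (!kthread_should_stop()) {
            struct afs_pagecopy_task *t, *next;

            spin_lock(&afs_pending_lock);
            list_for_each_entry_safe(t, next, &afs_pending_reads,
                                     link) {
                if (PageLocked(t->cachepage))
                    continue;    /* read still in flight */
                list_del(&t->link);
                INIT_WORK(&t->work, afs_pagecopy_worker);
                schedule_work(&t->work);
            }
            spin_unlock(&afs_pending_lock);

            schedule_timeout_interruptible(1);
        }
        return 0;
    }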
Comments, questions, flames?
S.