[OpenAFS] Re: bonnie++ on OpenAFS

Marc Dionne marc.c.dionne@gmail.com
Mon, 22 Nov 2010 19:36:28 -0500

On Mon, Nov 22, 2010 at 6:56 PM, Achim Gsell <achim.gsell@psi.ch> wrote:
> On Nov 23, 2010, at 12:15 AM, Simon Wilkinson wrote:
>> On 22 Nov 2010, at 23:06, Achim Gsell wrote:
>>> 3.) But if I first open 8 files and - after this is done - start
>>> writing to these files sequentially, the problem occurs. The
>>> difference to 1.) and 2.) is, that I have these 8 open files while
>>> the test is running. This simulates the "putc-test" of bonnie++
>>> more or less:
>> AFS is a write-on-close filesystem, so holding all of these files
>> open means that it is trying really hard not to flush any data back
>> to the fileserver. However, at some point the cache fills, and it
>> has to start writing data back. In 1.4, we make some really bad
>> choices about which data to write back, and so we end up thrashing
>> the cache. With Marc Dionne's work in 1.5, we at least have the
>> ability to make better choices, but nobody has really looked in
>> detail at what happens when the cache fills, as the best solution
>> is to avoid it happening in the first place!
> Sounds reasonable. But I have the same problem with a 9GB disk-cache,
> a 1GB disk-cache, 1GB mem-cache and a 256kB mem-cache: I can write
> 6 GB pretty fast then performance drops to < 3MB/s ...
> So long
> Achim

Same question as Simon: what's your memory size, and what's your
dirty background ratio (cat /proc/sys/vm/dirty_background_ratio)?
Quite a bit of writing can occur before the VM decides to initiate
writeback, so issues with the cache can show up later than one would
think if there's a lot of memory and/or the dirty ratio is set high.
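To get a rough feel for how much can be written before background
writeback starts, here is a minimal Python sketch. It assumes a Linux
box and multiplies MemTotal by the background ratio. That is only an
upper-bound approximation: the kernel applies the ratio to "dirtyable"
memory rather than total memory, and the ratio is ignored if
vm.dirty_background_bytes is set to a non-zero value. The function
names are made up for illustration.

```python
import re


def parse_mem_total_kb(meminfo_text):
    """Return the MemTotal value (in kB) from the text of /proc/meminfo."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found in meminfo text")
    return int(match.group(1))


def background_threshold_bytes(mem_total_kb, ratio_percent):
    """Approximate dirty bytes allowed before background writeback.

    Assumption: uses total memory, so this overestimates what the
    kernel really allows (it uses dirtyable memory instead).
    """
    return mem_total_kb * 1024 * ratio_percent // 100


def estimate_from_proc():
    """Read the live values from /proc (Linux only) and return the estimate."""
    with open("/proc/meminfo") as f:
        mem_total_kb = parse_mem_total_kb(f.read())
    with open("/proc/sys/vm/dirty_background_ratio") as f:
        ratio_percent = int(f.read().strip())
    return background_threshold_bytes(mem_total_kb, ratio_percent)
```

Calling estimate_from_proc() on the client gives a number to compare
against the point where throughput drops. If the estimate is in the GB
range, it is plausible that a lot of data is written before the cache
manager is asked to store anything.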

The cache manager has 2 basic problems in this situation:
1 - It only tries to write back data for the file it's currently
writing.  Data from the earlier files is occupying most of the cache
but won't be evicted.  So it ends up spinning within a small section
of cache.
2 - It can repeatedly flush out the same data to the server.  I don't
understand exactly what's occurring in this particular case, but
looking at packet traces (I reproduced it here), I see the client
sending a series of overlapping ranges (0-4k, 0-8k... 0-1MB) to the
server.  So on average at that point it is writing each 4k block 128
times to the server...
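
The 128 figure can be checked with a few lines of Python. This is a
sketch that assumes each flush resends the file from offset 0 and
grows by exactly one 4k block per flush, which is my reading of the
0-4k, 0-8k... 0-1MB pattern in the trace; the function name is just
for illustration.

```python
def average_writes_per_block(block_size, final_size):
    """Average number of times each block is sent to the server.

    Assumes flush k sends the range 0..k*block_size, so it carries
    k blocks, and the last flush covers final_size bytes.
    """
    n_blocks = final_size // block_size
    # Flush 1 sends 1 block, flush 2 sends 2 blocks, and so on.
    total_blocks_sent = sum(range(1, n_blocks + 1))
    return total_blocks_sent / n_blocks


def total_bytes_sent(block_size, final_size):
    """Total bytes put on the wire under the same assumption."""
    n_blocks = final_size // block_size
    return block_size * sum(range(1, n_blocks + 1))
```

With block_size=4096 and final_size=1024*1024 there are 256 flushes,
the average comes out at 128.5 writes per block, and roughly 128 MB
goes over the wire to store 1 MB of data.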