[OpenAFS] Re: bonnie++ on OpenAFS
Tue, 23 Nov 2010 08:41:21 +0000
> Yep, this is what's happening in the trace Achim provided, too.
> Every 4k we write the chunk. I'm not sure how that's possible unless
> something is closing the file a lot, or the cache is full of stuff we can't kick
Actually, it's entirely possible. Here's how it all goes wrong...
When the cache is full, every call to write results in us attempting
to empty the cache. On Linux, the page cache means that write is only
called once for each 4k page. However, our attempts to empty the cache
are a little pathetic. We just attempt to store all of the chunks of
the file currently being written back to the fileserver. If it's a new
file there is only one such chunk - the one that we are currently
writing. Because chunks are much larger than pages, and a dirty chunk
is flushed to the server in its entirety, we end up repeatedly writing
the same data. The process goes something like this:
*) Write page at 0k, dirties first chunk of file.
*) Discover cache is full, flush first chunk (0->1024k) to the fileserver
*) Write page at 4k, dirties first chunk of file
*) Cache is still full, flush first chunk to the fileserver
*) Write page at 8k, dirties first chunk of file
... and so on.
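The write amplification this causes can be estimated with a rough sketch. The 1024k chunk size, 4k page size, and the permanently full cache are assumptions for illustration; this is not OpenAFS code:

```python
CHUNK_SIZE = 1024 * 1024   # assumed cache chunk size (0->1024k above)
PAGE_SIZE = 4 * 1024       # 4k page writes arriving from the page cache

def bytes_stored(total_bytes):
    """Count bytes sent to the fileserver when every 4k page write
    triggers a flush of the entire dirty region of the chunk."""
    sent = 0
    for offset in range(0, total_bytes, PAGE_SIZE):
        dirty_high = offset + PAGE_SIZE
        # the whole dirty chunk is flushed, not just the new page
        sent += dirty_high
    return sent

useful = 1024 * 1024                  # write 1 MB of new data
print(bytes_stored(useful) // useful) # -> 128, i.e. ~128x amplification
```

With 256 page writes per chunk, the chunk prefix is re-stored 256 times, so roughly half a chunk is sent per page on average.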
The problem is that we don't make good decisions when we decide to
flush the cache. However, any change to preferentially flush
less-active items would be a behaviour change - in particular, on a
multi-user system it would mean that one user could break
write-on-close for other users simply by filling the cache.
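The multi-user hazard can be sketched with a toy LRU cache. All names and structure here are hypothetical illustrations of the policy question, not OpenAFS internals:

```python
from collections import OrderedDict

class ToyCache:
    """Toy chunk cache that evicts the least-recently-used entry.
    Evicting a dirty chunk forces a store to the fileserver before
    the owning file is closed - defeating write-on-close."""
    def __init__(self, capacity_chunks):
        self.capacity = capacity_chunks
        self.chunks = OrderedDict()   # (user, file, idx) -> dirty flag
        self.early_stores = []        # chunks stored before close

    def write(self, user, file, idx):
        key = (user, file, idx)
        self.chunks.pop(key, None)
        self.chunks[key] = True       # dirty, now most recently used
        while len(self.chunks) > self.capacity:
            victim, dirty = self.chunks.popitem(last=False)  # LRU victim
            if dirty:
                self.early_stores.append(victim)

cache = ToyCache(capacity_chunks=2)
cache.write("alice", "report.txt", 0)  # alice's still-open file
cache.write("bob", "big1", 0)          # bob fills the cache...
cache.write("bob", "big2", 0)          # ...and alice's chunk is flushed early
print(cache.early_stores)              # -> [('alice', 'report.txt', 0)]
```

Bob never touches alice's file, yet his writes force her dirty chunk to the fileserver before she closes it, which is exactly the behaviour change the paragraph above warns about.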