[OpenAFS] Re: bonnie++ on OpenAFS

Tue, 23 Nov 2010 08:41:21 +0000

> Yep, this is what's happening in the trace Achim provided, too. Every 4k
> we write the chunk. I'm not sure how that's possible unless something is
> closing the file a lot, or the cache is full of stuff we can't kick out.

Actually, it's entirely possible. Here's how it all goes wrong...

When the cache is full, every call to write results in us attempting
to empty the cache. On Linux, the page cache means that we only call
write once for each 4k page. However, our attempts to empty the cache
are a little pathetic: we just attempt to store all of the chunks of
the file currently being written back to the fileserver. If it's a new
file, there is only one such chunk - the one that we are currently
writing. Chunks are much larger than pages, and when a chunk is dirty
we flush the whole thing to the server, which is why we see repeated
writes of the same data. The process goes something like this:

*) Write page at 0k, dirties first chunk of file.
*) Discover cache is full, flush first chunk (0->1024k) to the file server.
*) Write page at 4k, dirties first chunk of file.
*) Cache is still full, flush first chunk to file server.
*) Write page at 8k, dirties first chunk of file.

... and so on.
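
To put a rough number on the steps above, here is a toy model in Python.
It is not OpenAFS code: the function name, the constants and the
cache_full flag are all made up for illustration. It assumes 4k pages
and 1024k chunks as in the example, and that each flush stores only the
part of the chunk written so far. If the whole 0->1024k extent is sent
every time, the real figure would be higher.

```python
PAGE_SIZE = 4 * 1024        # one write call per page (assumed 4k)
CHUNK_SIZE = 1024 * 1024    # cache chunk size used in the example above


def simulate_sequential_write(file_size, cache_full=True):
    """Toy model of writing a new file sequentially, one page at a time.

    Returns (page_writes, flushes, bytes_stored), where bytes_stored is
    the total amount of data sent to the file server.
    """
    page_writes = 0
    flushes = 0
    bytes_stored = 0

    for offset in range(0, file_size, PAGE_SIZE):
        page_writes += 1
        end = min(offset + PAGE_SIZE, file_size)
        if cache_full:
            # Only the chunk being written is dirty, so that is the one
            # that gets stored, from its start up to the data written so far.
            chunk_start = (offset // CHUNK_SIZE) * CHUNK_SIZE
            bytes_stored += end - chunk_start
            flushes += 1

    if not cache_full:
        # Assumed baseline: nothing is stored until close, then each
        # chunk goes to the server exactly once.
        flushes = -(-file_size // CHUNK_SIZE)   # ceiling division
        bytes_stored = file_size

    return page_writes, flushes, bytes_stored


if __name__ == "__main__":
    for full in (False, True):
        writes, flushes, stored = simulate_sequential_write(CHUNK_SIZE, full)
        print(f"cache_full={full}: {writes} page writes, {flushes} flushes, "
              f"{stored / CHUNK_SIZE:.1f}x the file size stored")
```

Under those assumptions, writing a single 1024k chunk with a full cache
takes 256 flushes and stores 128.5 times the size of the file, against
one flush and 1x when the cache has room.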

The problem is that we don't make good decisions when we decide to  
flush the cache. However, any change to flush items which are less  
active will be a behaviour change - in particular, on a multi-user  
system it would mean that one user could break write-on-close for  
other users simply by filling the cache.

Cheers,

Simon.