[OpenAFS] Re: bonnie++ on OpenAFS

Achim Gsell achim.gsell@psi.ch
Mon, 22 Nov 2010 23:12:57 +0100


On Nov 22, 2010, at 8:15 PM, Andrew Deason wrote:

> On Mon, 22 Nov 2010 20:01:31 +0100
> Achim Gsell <achim.gsell@psi.ch> wrote:
> 
>>> If the latter (or regardless), I'd try increasing the size
>>> of the cache; since you're seeing traffic across the network, I'd
>>> suspect you're thrashing.
>> 
>> Thrashing? Mmh. I can write eight 1 GB files in parallel with dd without problems
>> ...
> 
> Are you sure the access pattern for that is the same as the bonnie++
> test that's running?

OK, here is a simple shell script reproducing the problem:

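#!/bin/bash

# Open seven output files (named 1..7 in the current directory) and
# keep them open on file descriptors 4..10 for the whole run.
exec 4> 1
exec 5> 2
exec 6> 3
exec 7> 4
exec 8> 5
exec 9> 6
exec 10> 7

# Write 1 GB of zeros to each open descriptor, one dd after another.
dd if=/dev/zero bs=1024k count=1024 1>&4
dd if=/dev/zero bs=1024k count=1024 1>&5
dd if=/dev/zero bs=1024k count=1024 1>&6
dd if=/dev/zero bs=1024k count=1024 1>&7
dd if=/dev/zero bs=1024k count=1024 1>&8
dd if=/dev/zero bs=1024k count=1024 1>&9
dd if=/dev/zero bs=1024k count=1024 1>&10
#EOF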


So I guess the troublemakers are the open file descriptors ...
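
A control run might help to check that guess: write the same seven
files, but let dd open and close each file itself, so that no
descriptor stays open across the run. This is only a sketch and not
something I have measured; the function name write_files and its two
parameters (block count and target directory) are made up for
illustration.

```shell
#!/bin/bash

# write_files COUNT DIR  (hypothetical helper, not part of the original test)
# Write COUNT blocks of 1024k zeros to the files 1..7 in DIR. Each file is
# opened by dd itself and closed again before the next file is started.
write_files() {
    local count=$1 dir=$2 i
    for i in 1 2 3 4 5 6 7; do
        dd if=/dev/zero of="$dir/$i" bs=1024k count="$count" 2>/dev/null || return 1
    done
}
```

Called as "write_files 1024 ." inside an AFS directory, this writes
the same amount of data as the script above. If the network traffic
looks different between the two runs, that would point at the open
descriptors.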

> I mean, if it's writing to random places in these files, for example,
> your working set is somewhere closer to 8G, and you have a cache that is
> 1G, well...

bonnie++ writes sequentially to each file; there is no random pattern.

> 
>> I will store a network dump in my public AFS directory and tell you as
>> soon as I have it - may take some time ...
> 
> Before you do this, take a look at 'xstat_cm_test <client> -collID 2
> -onceonly' (again before/after to be on the safe side), specifically at
> the hits vs misses, and the numbers for FetchData, FetchStatus, and
> StoreData, which will give some information about how you're using the
> cache and how much you're hitting the server.
> 
> (If you want to show the data here, putting it in public AFS or a
> pastebin may be nice to the other list denizens)
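
The before/after comparison suggested here could be scripted roughly
as follows. This is a sketch under assumptions: the function name
xstat_delta and the XSTAT override variable are hypothetical, and the
only xstat_cm_test options used are the ones quoted above. The output
is just a plain diff of the two snapshots, so the counters for
FetchData, FetchStatus and StoreData still have to be read by hand.

```shell
#!/bin/bash

# xstat_delta CLIENT CMD [ARGS...]  (hypothetical helper)
# Take a cache manager statistics snapshot, run CMD, take a second
# snapshot and print the lines that changed.
# XSTAT is an assumed override for the statistics command.
xstat_delta() {
    local client=$1; shift
    local xstat=${XSTAT:-xstat_cm_test}
    local before after
    before=$(mktemp) || return 1
    after=$(mktemp) || return 1
    "$xstat" "$client" -collID 2 -onceonly > "$before"
    "$@"                                    # the workload under test
    "$xstat" "$client" -collID 2 -onceonly > "$after"
    # diff exits with 1 when the snapshots differ; that is the expected case.
    diff "$before" "$after" || true
    rm -f "$before" "$after"
}
```

For example, "xstat_delta <client> ./dd2afs.sh" would wrap one run of
the reproduction script, where dd2afs.sh is only a placeholder name
for wherever the script was saved.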

/afs/psi.ch/user/g/gsell/public/dd2afs.tcpdump

Achim