[OpenAFS] Re: bonnie++ on OpenAFS
Achim Gsell
achim.gsell@psi.ch
Mon, 22 Nov 2010 23:12:57 +0100
On Nov 22, 2010, at 8:15 PM, Andrew Deason wrote:
> On Mon, 22 Nov 2010 20:01:31 +0100
> Achim Gsell <achim.gsell@psi.ch> wrote:
>
>>> If the latter (or regardless), I'd try increasing the size
>>> of the cache; since you're seeing traffic across the network, I'd
>>> suspect you're thrashing.
>>
>> Thrashing? Hmm. I can write 8 x 1 GB in parallel with dd without problems
>> ...
>
> Are you sure the access pattern for that is the same as the bonnie++
> test that's running?
OK, here is a simple shell script reproducing the problem:
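#!/bin/bash
# Open seven files (named 1..7 in the current directory) for writing and
# keep all of them open on file descriptors 4..10.
exec 4> 1
exec 5> 2
exec 6> 3
exec 7> 4
exec 8> 5
exec 9> 6
exec 10> 7
# Write 1 GB of zeros to each open descriptor, one file after the other.
dd if=/dev/zero bs=1024k count=1024 1>&4
dd if=/dev/zero bs=1024k count=1024 1>&5
dd if=/dev/zero bs=1024k count=1024 1>&6
dd if=/dev/zero bs=1024k count=1024 1>&7
dd if=/dev/zero bs=1024k count=1024 1>&8
dd if=/dev/zero bs=1024k count=1024 1>&9
dd if=/dev/zero bs=1024k count=1024 1>&10
#EOF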
So I guess the open file descriptors are the troublemakers ...
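To make it easier to vary the number of open files and the amount written, the same access pattern could be wrapped in a function. This is only a sketch: the name hold_open_and_write and its three parameters are made up for illustration, and it uses descriptors 3..9 (rather than 4..10) so that it stays within single-digit descriptors.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original report): open several files,
# keep every descriptor open, then write to the files one after the other.
hold_open_and_write() {
    dir=$1      # target directory, e.g. a directory in AFS
    nfiles=$2   # number of files held open at the same time (1..7)
    mb=$3       # amount written to each file, in units of 1024k

    # Only descriptors 3..9 are used, so at most seven files are possible.
    if [ "$nfiles" -lt 1 ] || [ "$nfiles" -gt 7 ]; then
        echo "nfiles must be between 1 and 7" >&2
        return 1
    fi

    # Open all files first; file i is attached to descriptor i+2.
    i=1
    while [ "$i" -le "$nfiles" ]; do
        fd=$((i + 2))
        eval "exec $fd> \"\$dir/\$i\""
        i=$((i + 1))
    done

    # Write sequentially to each file while all descriptors stay open.
    i=1
    while [ "$i" -le "$nfiles" ]; do
        fd=$((i + 2))
        eval "dd if=/dev/zero bs=1024k count=\$mb 1>&$fd"
        i=$((i + 1))
    done

    # Close the descriptors again.
    i=1
    while [ "$i" -le "$nfiles" ]; do
        fd=$((i + 2))
        eval "exec $fd>&-"
        i=$((i + 1))
    done
}
```

Calling it as "hold_open_and_write . 7 1024" from a directory in AFS should correspond to the script above; smaller values for the second and third argument might help to find out at which point the problem starts to show.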
> I mean, if it's writing to random places in these files, for example,
> your working set is somewhere closer to 8G, and you have a cache that is
> 1G, well...
bonnie++ writes sequentially to each file; there is no random pattern.
>
>> I will store a network dump in my public AFS directory and tell you as
>> soon as I have it - may take some time ...
>
> Before you do this, take a look at 'xstat_cm_test <client> -collID 2
> -onceonly' (again before/after to be on the safe side), specifically at
> the hits vs misses, and the numbers for FetchData, FetchStatus, and
> StoreData, which will give some information about how you're using the
> cache and how much you're hitting the server.
>
> (If you want to show the data here, putting it in public AFS or a
> pastebin may be nice to the other list denizens)
/afs/psi.ch/user/g/gsell/public/dd2afs.tcpdump
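For the before/after statistics suggested above, a small wrapper along these lines might be handy. It is only a sketch: the function name before_after, its arguments and the output file names are invented for illustration, and it simply stores whatever the statistics command prints, without assuming anything about the output format.

```shell
#!/bin/bash
# Hypothetical wrapper (an assumption, not an OpenAFS tool): run a statistics
# command before and after a workload and keep both outputs plus their diff.
before_after() {
    stats_cmd=$1   # command line that prints the statistics
    workload=$2    # command line of the test to run in between
    outdir=$3      # directory for before.txt, after.txt and diff.txt

    sh -c "$stats_cmd" > "$outdir/before.txt" 2>&1
    sh -c "$workload"
    sh -c "$stats_cmd" > "$outdir/after.txt" 2>&1

    # diff exits non-zero when the files differ, which is the expected case.
    diff "$outdir/before.txt" "$outdir/after.txt" > "$outdir/diff.txt" || true
}
```

With the xstat_cm_test invocation quoted above as the first argument (client name filled in) and the reproducing script as the second, the resulting files could then be put next to the tcpdump.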
Achim