[OpenAFS-devel] performance study

Harald Barth haba@pdc.kth.se
Tue, 15 Feb 2005 18:14:17 +0100 (MET)


>   1. despite the verbal claim to request an ACK on the last packet sent 
> in a chain, this does not always happen. In Murphy's terms it never 
> happens when it would do any good. In my tests, once the windows had 
> reasonably opened the last packet in a chain had the flags field set to 
> 0. Therefore the ACK required to release the next batch was sent only 
> after a timeout (of about 0.3 second). The protocol continues to work, 
> but slowly.

Hm.
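
If I read that right, the fix boils down to making the sender explicitly
ask for an ACK on the last packet of each batch instead of sending it
with flags == 0. Something along these lines (a sketch of the idea only,
not a patch against the rx source; the flag name and the packet structure
are assumptions on my part):

/* Sketch of the idea only, not rx code: when a batch of packets goes
 * out, make sure the final one explicitly requests an ACK, so the
 * receiver does not sit on the window until the ~0.3 s timeout
 * described above.  RX_REQUEST_ACK and the packet structure here are
 * assumptions on my part. */
struct sketch_packet {
    unsigned char flags;            /* per-packet header flags */
};

#ifndef RX_REQUEST_ACK
#define RX_REQUEST_ACK 0x01         /* "please ACK this packet" (assumed) */
#endif

static void
request_ack_on_last(struct sketch_packet *pkts, int npkts)
{
    if (npkts > 0)
        pkts[npkts - 1].flags |= RX_REQUEST_ACK;  /* instead of flags == 0 */
}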

> Fixing this brought the speed from 20-30 MB/s to >110 MB/s (memory to 
> memory, LAN). 

Hm, we typically get 20-40 MB/s with LAN latency. Writing is much better
(up to twice as fast) than reading (numbers from Arla's afsfsperf). Do you
see any difference between reads and writes?

> What was funny is that Hartmut ran my program without any 
> tweaks and got 114 MB/s immediately. 

<not-so-serious> 
Hartmut has some special "computrones" in Garching; what works for him
does not always work for the rest of us. Maybe he has a different
speed of light or some other source of low latency.
</not-so-serious>

I'd like to test your rx tweaks, as I just recently got new AFS servers
to run them on. Patch for 1.3.78 or current? Pretty please?

I think rx tries to do two things with packet sizes:

1. Fit more than one rx packet into a network packet (a jumbogram), which
   may help when there are many small packets to send. That can be
   prevented with the -nojumbo flag.

2. Expand the network packet size to something bigger than the MTU, up to
   a maximum of 4 IP fragments (#define RX_MAX_FRAGS 4 in rx_globals.h).
   I have observed that RX_MAX_FRAGS = 1 has the following effect: WAN
   performance over bad links becomes bearable, while performance
   otherwise is hardly affected because modern machines have CPU cycles
   to spare (see the rough sketch below). An SS10, however, performs
   _very_ badly with RX_MAX_FRAGS = 1, but who runs SS10s as AFS servers
   nowadays?
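
To make point 2 concrete, here is a rough back-of-the-envelope sketch
(plain C, not rx code; the 1500-byte MTU and the IPv4/UDP header sizes
are assumptions of mine) of the largest UDP datagram rx could send for a
given fragment limit:

/* Back-of-the-envelope sketch (not rx code): estimate the largest UDP
 * datagram rx can send when it is allowed to span several IP fragments.
 * Assumptions: 1500-byte Ethernet MTU, 20-byte IPv4 header without
 * options, 8-byte UDP header (carried only in the first fragment).
 * RX_MAX_FRAGS mirrors the define in rx_globals.h. */
#include <stdio.h>

#define RX_MAX_FRAGS 4    /* as in rx_globals.h; 1 keeps rx within one MTU */

int main(void)
{
    const int mtu = 1500;    /* assumed link MTU */
    const int ip_hdr = 20;   /* IPv4 header, no options */
    const int udp_hdr = 8;   /* UDP header, first fragment only */
    int frags;

    for (frags = 1; frags <= RX_MAX_FRAGS; frags++) {
        int max_payload = frags * (mtu - ip_hdr) - udp_hdr;
        printf("%d fragment(s): max UDP payload about %d bytes\n",
               frags, max_payload);
    }
    return 0;
}

With those assumptions that works out to about 1472 bytes for a single
fragment and about 5912 bytes at RX_MAX_FRAGS = 4.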

Harald.