[OpenAFS] iperf vs rxperf in high latency network

xguan@reliancememory.com
Wed, 7 Aug 2019 18:35:44 -0700


Hello,

Can someone kindly explain again the possible reasons why Rx is so painfully
slow for a high latency (~230ms) link? 

From a user perspective, I wonder if there is any *quick Rx code hack* that
could help close the throughput gap (iperf2 = 30 Mb/s vs. rxperf = 800 Kb/s)
for the specific case below.

We are considering the possibility of using two hosts ~230ms RTT apart as
server and client. I used iperf2 and rxperf to test throughput between
the two. No other connections compete with the test, so this is different
from a low-latency, thread or UDP buffer exhaustion scenario.

iperf2's UDP test shows a bandwidth of ~30Mb/s without packet loss, though
some packets are re-ordered at the receiver side. Below 5 Mb/s, the
receiver sees no packet re-ordering. Above 30 Mb/s, the receiver sees
packet loss. The results are consistent across multiple runs within 24
hours. The UDP buffer size used by iperf is 208 KB. The write length is
set to 1300 (-l 1300), which is below the path MTU.

Interestingly, a quick skim through the iperf2 source code suggests that an
iperf sender does not wait for the receiver's ack. It simply keeps calling
write(mSettings->mSock, mBuf, mSettings->mBufLen) and times the writes to
compute the throughput. Only at the end does it check whether the receiver
reports packet loss.

rxperf, on the other hand, only achieves ~800 Kb/s. What makes it worse is
that the throughput does not seem to depend on the window size (-W 32~255)
or the UDP buffer size (-u, default ~512*1024). I also recompiled rxperf
with #define RXPERF_BUFSIZE (1024 * 1024 * 64) instead of the original
(512 * 1024), but saw no throughput improvement from going above -u 512K.
Occasionally some packets are retransmitted. If I reduce -W or -u to very
small values, I do see a penalty.

The kernel's rmem_max and wmem_max have been set to 32M on both hosts for
the socket buffer size. The Rx max MTU is set with "-m 1344" (i.e., 1400
path MTU - 20 IP header - 8 UDP header - 28 Rx header). rxperf is compiled
from the 1.8.3 source code.

I noticed some discussions before at:
https://lists.openafs.org/pipermail/openafs-info/2010-December/035143.html
https://lists.openafs.org/pipermail/openafs-info/2013-June/039661.html
and most recently at
https://openafs-workshop.org/2019/schedule/faster-wan-volume-operations-with-dpf/
(Very nice work. We look forward to the code being submitted and merged to
master.)

The theory goes that if I have a 32-packet recv/send window (ack count) with
1344-byte packets and RTT = 230 ms, I should expect a theoretical upper
bound of 32 x 1344 x 8 / 0.23 / 1000000 = 1.5 Mb/s. If the AFS-implemented
Rx window size (32) is really the limiting factor, then the throughput
should increase when I raise the window size (-W) above 32 and configure a
sufficiently large kernel socket buffer.

Neither prediction matched what I observed. I wonder if some light could be
shed on:

1. What else may be the limiting factor in my case?
2. Is there a quick way to increase the recv/send window from 32 to 255 in
the Rx code without breaking other parts of AFS?
3. Is there any quick (maybe dirty) way to leverage the iperf2 observation
and relax the wait for acks as long as the received packets arrive in order
and are not lost (that is, get me up to 5 Mb/s...)?

Thank you in advance.
==========================
Ximeng (Simon) Guan, Ph.D.
Director of Device Technology
Reliance Memory
==========================

iperf2 test
===========
*Server Side
[xmsguan@afsdb1 ~]$ iperf -u -s
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
*Client Side
[xmsguan@afsdb3 ~]$ iperf -u -b 30M -l 1300 -i 1 -t 3 -e -c afsdb1
------------------------------------------------------------
Client connecting to *, UDP port 5001 with pid 6381
Sending 1300 byte datagrams, IPG target: 330.61 us (kalman adjust)
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
[  3] local * port 53558 connected with * port 5001
[ ID] Interval        Transfer     Bandwidth      Write/Err  PPS
[  3] 0.00-1.00 sec  3.75 MBytes  31.5 Mbits/sec  3026/0     3026 pps
[  3] 1.00-2.00 sec  3.75 MBytes  31.5 Mbits/sec  3025/0     3025 pps
[  3] 0.00-3.00 sec  11.3 MBytes  31.5 Mbits/sec  9075/0     3024 pps
[  3] Sent 9075 datagrams
[  3] Server Report:
[  3]  0.0- 3.0 sec  11.3 MBytes  31.2 Mbits/sec   0.852 ms    0/ 9075 (0%)
[  3] 0.00-3.03 sec  3735 datagrams received out-of-order
[xmsguan@afsdb3 ~]$

rxperf test
===========
24-packet window:

*Server side
[xmsguan@afsdb1 ~]$ ./rxperf server -u 33554432 -W 24
*Client side
./rxperf client -c send -b 1024000 -m 1344 -u 33554432 -W 24 -s afsdb1 -T 3
-D
SEND: threads   1, times        3, bytes        1024000:           32509
msec  [756 kbit/s]
rx stats: free packets 179, allocs 2453, alloc-failures(rcv 0/0,send 0/0,ack
0)
   greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0,
selects 0, sendSelects 0
   packets read: data 3 ack 1680 busy 0 abort 0 ackall 0 challenge 0
response 0 debug 0 params 0 unused 0 unused 0 unused 0 version 0
   other read counters: data 3, ack 1680, dup 0 spurious 0 dally 0
   packets sent: data 2364 ack 4 busy 0 abort 0 ackall 0 challenge 0
response 0 debug 0 params 0 unused 0 unused 0 unused 0 version 0
   other send counters: ack 4, data 2337 (not resends), resends 27, pushed
0, acked&ignored 4715
        (these should be small) sendFailed 0, fatalErrors 0
   Average rtt is 0.233, with 2024 samples
   Minimum rtt is 0.225, maximum is 0.332
   0 server connections, 1 client connections, 1 peer structs, 1 call
structs, 0 free call structs
Peer a0a0a07.7009.
   Rtt 1884, total sent 2364, resent 27
   Packet size 1344
[xmsguan@afsdb3 ~]$

32-packet window 
*Server side
[xmsguan@afsdb1 ~]$ ./rxperf server -u 33554432 -W 32

*Client side
[xmsguan@afsdb3 ~]$ ./rxperf client -c send -b 1024000 -m 1344 -u 33554432
-W 32 -s afsdb1 -T 3 -D
SEND: threads   1, times        3, bytes        1024000:           29755
msec  [825.9 kbit/s]
rx stats: free packets 179, allocs 2453, alloc-failures(rcv 0/0,send 0/0,ack
0)
   greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0,
selects 0, sendSelects 0
   packets read: data 3 ack 1680 busy 0 abort 0 ackall 0 challenge 0
response 0 debug 0 params 0 unused 0 unused 0 unused 0 version 0
   other read counters: data 3, ack 1680, dup 0 spurious 0 dally 0
   packets sent: data 2365 ack 4 busy 0 abort 0 ackall 0 challenge 0
response 0 debug 0 params 0 unused 0 unused 0 unused 0 version 0
   other send counters: ack 4, data 2337 (not resends), resends 28, pushed
0, acked&ignored 4955
        (these should be small) sendFailed 0, fatalErrors 0
   Average rtt is 0.234, with 2081 samples
   Minimum rtt is 0.224, maximum is 0.333
   0 server connections, 1 client connections, 1 peer structs, 1 call
structs, 0 free call structs
Peer a0a0a07.7009.
   Rtt 1840, total sent 2365, resent 28
   Packet size 1344
[xmsguan@afsdb3 ~]$

255-packet window:
*Server side
[xmsguan@afsdb1 ~]$ ./rxperf server -u 33554432 -W 255

*Client side
[xmsguan@afsdb3 ~]$ ./rxperf client -c send -b 1024000 -m 1344 -u 33554432
-W 255 -s afsdb1 -T 3 -D
SEND: threads   1, times        3, bytes        1024000:           32508
msec  [756 kbit/s]
rx stats: free packets 638, allocs 2393, alloc-failures(rcv 0/0,send 0/0,ack
0)
   greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0,
selects 0, sendSelects 0
   packets read: data 3 ack 1670 busy 0 abort 0 ackall 0 challenge 0
response 0 debug 0 params 0 unused 0 unused 0 unused 0 version 0
   other read counters: data 3, ack 1670, dup 0 spurious 0 dally 0
   packets sent: data 2404 ack 4 busy 0 abort 0 ackall 0 challenge 0
response 0 debug 0 params 0 unused 0 unused 0 unused 0 version 0
   other send counters: ack 4, data 2337 (not resends), resends 67, pushed
0, acked&ignored 3969
        (these should be small) sendFailed 0, fatalErrors 0
   Average rtt is 0.232, with 2054 samples
   Minimum rtt is 0.223, maximum is 0.336
   0 server connections, 1 client connections, 1 peer structs, 1 call
structs, 0 free call structs
Peer a0a0a07.7009.
   Rtt 1846, total sent 2404, resent 67
   Packet size 1344
[xmsguan@afsdb3 ~]$