[OpenAFS-devel] RX retransmit timeout value being overestimated? (Poor performance over WAN)

Peter Wells peter.wells@lutraconsulting.co.uk
Sun, 29 Apr 2012 14:38:47 +0100


Hi Simon,

Thanks for your advice and thoughts.  I have set up a test using virtualbox,
Ubuntu Server 12.04 LTS and openafs 1.6.1-1 to allow me to add a more modern
AFS fileserver to my cell.  

I tested three scenarios; the results below show a massive improvement:

All tests involve the transfer of the same file.  

Test                                                          Average(s)  Max(s)  S.D.
Local to remote (via SSH)                                         4.0       -      -
Local fileserver (AFS 1.4.12) to remote client (AFS 1.4.10)      12.2      16.5    2.8
Local fileserver (AFS 1.6.1-1) to remote client (AFS 1.4.10)      7.7      10.6    1.3
Local fileserver (AFS 1.6.1-1, -rxmaxmtu 1402)
    to remote client (AFS 1.4.10)                                 5.5       6.1    0.4

The MTU of 1402 was calculated to avoid packets being fragmented as they
pass through the OpenVPN tunnel, and it seems to greatly improve reliability.
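For what it's worth, the arithmetic behind an inner MTU like 1402 can be
sketched as below. The overhead figures are assumptions (real OpenVPN
per-packet overhead depends on the cipher, HMAC and framing options in use),
chosen only to illustrate why a value well under 1500 avoids fragmentation:

```python
# Rough sketch of why an inner MTU near 1402 fits inside a 1500-byte link
# when tunnelled over OpenVPN. The overhead constants are assumptions, not
# values taken from this thread or from any particular OpenVPN config.

LINK_MTU = 1500        # MTU of the underlying ADSL/Ethernet path
OUTER_IP = 20          # outer IPv4 header added by the tunnel
OUTER_UDP = 8          # outer UDP header (OpenVPN's default transport)
OPENVPN_OVERHEAD = 70  # opcode + HMAC + IV + cipher padding (assumed)

inner_mtu = LINK_MTU - OUTER_IP - OUTER_UDP - OPENVPN_OVERHEAD
print(inner_mtu)  # 1402 with these assumed figures
```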

Our VPN is running over ADSL and there have been some reliability issues
previously so I suspect that there are packets being lost reasonably
regularly.  

I think I'll be updating openafs within the next month or so to the latest
version!

Thanks again for your help.  


Kind regards,

Pete


-----Original Message-----
From: Simon Wilkinson [mailto:simonxwilkinson@gmail.com] 
Sent: 27 April 2012 22:39
To: peter.wells@lutraconsulting.co.uk
Cc: openafs-devel@openafs.org
Subject: Re: [OpenAFS-devel] RX retransmit timeout value being
overestimated? (Poor performance over WAN)


On 27 Apr 2012, at 19:11, Peter Wells wrote:

> I'm trying to work out why the fileserver keeps pausing like this. 

I'm not sure how much of OpenAFS's file transport protocol you're aware
of, so apologies if some of this retreads old ground.

OpenAFS uses a UDP based RPC mechanism called RX. RX provides a reliable
connection layer on top of UDP by implementing its own acknowledgment and
congestion control scheme. Originally this was pretty much unique, but over
the years RX has converged more and more on a TCP style mechanism for
congestion control.

Unfortunately, OpenAFS releases up until 1.6 were stuck in a netherworld
between RX's old burst-based transmission algorithm and a TCP-style
mechanism for flow control. The behaviour you are seeing is a product
of a number of unfortunate issues in the 1.4.x RX stack.
 
> The last packet before the pauses is an ACK from the client with a mixture
> of 32 +ve and -ve acknowledgements, then silence between the server and
> client for 1.2 seconds.

RX has what we term 'hard' and 'soft' ACKs. A hard ACK moves the congestion
control window forwards; a soft ACK is roughly analogous to a TCP SACK - it
implies that that packet has been received, but we have missing packets and
so we cannot move the window forwards. In the 1.4 releases, the maximum
window size is 32 packets, which is why you are stalling once 32
acknowledgments are pending.
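The stall can be sketched with a toy model (Python, not the actual RX code):
soft ACKs beyond a lost packet do not advance the window base, so with a
32-packet window the sender eventually has nothing it is permitted to send.

```python
# Toy illustration of the 32-packet window stall. A hard (cumulative) ACK
# advances the window base; soft ACKs beyond a lost packet do not, so the
# sender can have at most WINDOW packets outstanding.

WINDOW = 32  # maximum RX window size in the 1.4 releases

def in_flight(next_to_send, window_base):
    # Packets sent but not yet hard-ACKed; the base only moves on a hard ACK.
    return next_to_send - window_base

# Suppose packet 5 is lost: packets 6..36 may be soft-ACKed, but the
# window base stays at 5.
window_base = 5
next_to_send = window_base + WINDOW  # 37: the first packet we may NOT send

assert in_flight(next_to_send, window_base) == WINDOW
# The sender now stalls until packet 5 is retransmitted and hard-ACKed,
# or the retransmit timer fires.
```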

There is a bug in 1.4 which means that we don't immediately start
retransmitting when it becomes obvious that packets have been missed (TCP
will retransmit if more than 2 packets have been received subsequent to a
missing packet). So, we have to wait until the packets time out. A timeout
is a hard error, it forces the connection back into slow start (which drops
the window size), and so you'll see transmission rates slowly ramp back up
from here.
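The recovery described above can be sketched like this (illustrative Python,
not RX internals): a timeout forces the connection back into slow start, so
the window collapses and doubles back up once per round trip.

```python
# Sketch of TCP-style recovery after a retransmit timeout: the congestion
# window drops back to one packet and ramps up via slow start. The
# constants are illustrative, not taken from the actual RX implementation.

def ramp_after_timeout(max_window=32):
    cwnd = 1  # a timeout (a hard error) collapses the window to 1 packet
    history = []
    while cwnd < max_window:
        history.append(cwnd)
        cwnd = min(cwnd * 2, max_window)  # slow start: doubles each RTT
    history.append(cwnd)
    return history

print(ramp_after_timeout())  # [1, 2, 4, 8, 16, 32]
```

So even after the ~1.2s silence ends, several further round trips are spent
just growing the window back to 32 - which matches the gradual ramp-up seen
in the trace.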

> As the rate picks up, the client will NACK a data packet, and then
> subsequent ACK packets grow in length (in terms of the number of ACKs)
> until they reach 32, at which time there is another long pause.

What's interesting about this trace is how regular your stalls are. I can't
easily explain this regularity, other than that it looks like the connection
is regularly dropping particular packet types.

>    Average rtt is 0.104, with 17838 samples
>    Minimum rtt is 0.000, maximum is 2.147
>  
> That's a pretty large maximum rtt and I was wondering if this was somehow
> skewing the calculation of the retransmit timeout value, causing the
> fileserver to snooze before suddenly realising it should be retransmitting
> packets.

RTT calculation in 1.4 is very, very broken, as it feeds far too many
samples into the RTT algorithm. However, the effect here shouldn't be to
inflate the RTT number itself, just to remove the smoothing factor.
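To illustrate the point, here is the standard TCP-style smoothed-RTT
estimator (per RFC 6298; a sketch, not the 1.4 RX code). With a properly
weighted moving average, even a 2.147s outlier barely moves the estimate,
so the damage from over-sampling is to the smoothing, not to the RTT value
itself:

```python
# Standard smoothed-RTT (SRTT) moving average, as TCP uses in RFC 6298.
# Illustrative sketch only - not the OpenAFS 1.4 RX implementation.

ALPHA = 1 / 8  # standard gain for the SRTT exponential moving average

def srtt_series(samples, srtt=0.104):
    # 0.104s starting value taken from the rxdebug output in this thread.
    out = []
    for r in samples:
        srtt = (1 - ALPHA) * srtt + ALPHA * r
        out.append(srtt)
    return out

# One 2.147s spike amid typical ~0.1s samples:
series = srtt_series([0.1, 0.1, 2.147, 0.1, 0.1])
assert series[2] < 0.45  # the spike nudges SRTT; it doesn't dominate it
```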

> Any thoughts you have will be much appreciated.  The AFS versions are as
> follows in case it helps:

I would be very interested in seeing how 1.6.1 performs with this network
configuration. It is unlikely that any work is going to get done in fixing
the 1.4.x transport, but if you can reproduce these issues with 1.6, I'd
really like to look at some packet traces and work out what's going on.

Cheers,

Simon.