[OpenAFS-devel] RX retransmit timeout value being overestimated? (Poor performance over WAN)

Simon Wilkinson simonxwilkinson@gmail.com
Fri, 27 Apr 2012 22:39:22 +0100


On 27 Apr 2012, at 19:11, Peter Wells wrote:

> I'm trying to work out why the fileserver keeps pausing like this.


I'm not sure how much of OpenAFS's file transport protocol you're
aware of, so sorry if some of this retreads old ground.

OpenAFS uses a UDP-based RPC mechanism called RX. RX provides a
reliable connection layer on top of UDP by implementing its own
acknowledgment and congestion control scheme. Originally this was
pretty much unique, but over the years RX has converged more and more
on a TCP-style mechanism for congestion control.

Unfortunately, OpenAFS releases up until 1.6 were stuck in a
netherworld between RX's old burst-based transmission algorithm and a
TCP-style mechanism for flow control. The behaviour that you are
seeing is a product of a number of unfortunate issues in the 1.4.x RX
stack.
> The last packet before the pauses is an ACK from the client with a
> mixture of 32 +ve and -ve acknowledgements… then silence between the
> server and client for 1.2 seconds…

RX has what we term 'hard' and 'soft' ACKs. A hard ACK moves the
congestion control window forwards; a soft ACK is roughly analogous to
a TCP SACK - it implies that that packet has been received, but we
have missing packets and so we cannot move the window forwards. In the
1.4 releases, the maximum window size is 32 packets, which is why you
are stalling with 32 pending acknowledgments.
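
As a rough sketch of the mechanics (illustrative names and logic, not
actual OpenAFS code), the interaction between hard ACKs, soft ACKs and
the 32-packet window looks something like this:

```python
# Sketch of hard vs. soft ACK handling for a 1.4-style RX sender.
# All names are illustrative, not real OpenAFS identifiers.

MAX_WINDOW = 32  # 1.4's maximum congestion window, in packets

class SenderWindow:
    def __init__(self):
        self.first_unacked = 0   # left edge of the window
        self.soft_acked = set()  # packets soft-ACKed beyond the edge

    def hard_ack(self, seq):
        """A hard ACK acknowledges everything up to seq, so the
        left edge (and hence the window) moves forwards."""
        self.first_unacked = max(self.first_unacked, seq + 1)
        self.soft_acked = {s for s in self.soft_acked
                           if s >= self.first_unacked}

    def soft_ack(self, seq):
        """A soft ACK says 'I have this packet', but earlier packets
        are missing, so the window cannot advance."""
        if seq >= self.first_unacked:
            self.soft_acked.add(seq)

    def can_send(self, next_seq):
        # With the left edge pinned by missing packets, at most
        # MAX_WINDOW packets can be outstanding -- after 32 soft
        # ACKs the sender stalls exactly as described above.
        return next_seq < self.first_unacked + MAX_WINDOW
```

Once every packet in the window has been soft-ACKed but none hard-ACKed,
`can_send` stays false and the transfer sits idle.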

There is a bug in 1.4 which means that we don't immediately start
retransmitting when it becomes obvious that packets have been missed
(TCP will retransmit if more than 2 packets have been received
subsequent to a missing packet). So, we have to wait until the packets
time out. A timeout is a hard error: it forces the connection back
into slow start (which drops the window size), and so you'll see
transmission rates slowly ramp back up from here.
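
To see why the ramp back up is slow, here is a sketch of TCP-style
slow start and congestion avoidance after a timeout (textbook
behaviour with illustrative numbers, not RX's exact constants):

```python
# Sketch of why a retransmit timeout causes a slow ramp-up.
# Standard TCP-style slow start / congestion avoidance, with
# illustrative constants -- not RX's actual code.

MAX_WINDOW = 32

def on_timeout(cwnd):
    """A timeout is a hard error: collapse the window and restart
    slow start from the beginning."""
    ssthresh = max(cwnd // 2, 2)  # remember half the old window
    return 1, ssthresh            # cwnd drops back to one packet

def on_round_trip(cwnd, ssthresh):
    """Each fully ACKed round trip grows the window: exponentially
    while in slow start (cwnd < ssthresh), then linearly."""
    if cwnd < ssthresh:
        cwnd *= 2        # slow start: double per round trip
    else:
        cwnd += 1        # congestion avoidance: +1 per round trip
    return min(cwnd, MAX_WINDOW)

# After a timeout at cwnd == 32, count the round trips needed to
# climb back to the full window:
cwnd, ssthresh = on_timeout(32)   # cwnd = 1, ssthresh = 16
trips = 0
while cwnd < MAX_WINDOW:
    cwnd = on_round_trip(cwnd, ssthresh)
    trips += 1
# 20 round trips to recover -- over a high-latency WAN link that is
# a visible multi-second stall.
```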

> As the rate picks up, the client will NACK a data packet, and then
> subsequent ACK packets grow in length (in terms of the number of
> ACKs) until they reach 32, at which time there is another long
> pause.

What's interesting about this trace is how regular your stalls are. I
can't easily explain this regularity, other than that it looks like
the connection is regularly dropping particular packet types.

>    Average rtt is 0.104, with 17838 samples
>    Minimum rtt is 0.000, maximum is 2.147
>
> That's a pretty large maximum rtt and I was wondering if this was
> somehow skewing the calculation of the retransmit timeout value,
> somehow causing the fileserver to snooze before suddenly realising
> it should be retransmitting packets.

RTT calculation in 1.4 is very, very broken, as it feeds far too many
samples into the RTT algorithm. However, the effect here shouldn't be
to inflate the RTT number itself, just to remove the smoothing factor.
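
For reference, here is the standard Jacobson/Karels-style smoothed RTT
estimator (textbook constants, not OpenAFS's), which shows what
over-sampling does to the smoothing:

```python
# Sketch of a standard smoothed-RTT / retransmit-timeout estimator
# (Jacobson/Karels style, per RFC 6298). Constants are the textbook
# values, not OpenAFS's.

def make_estimator(alpha=0.125, beta=0.25):
    state = {"srtt": None, "rttvar": None}

    def sample(rtt):
        if state["srtt"] is None:
            # First measurement seeds the estimator.
            state["srtt"] = rtt
            state["rttvar"] = rtt / 2
        else:
            # Exponentially weighted moving averages of the RTT and
            # its variation.
            state["rttvar"] = ((1 - beta) * state["rttvar"]
                               + beta * abs(state["srtt"] - rtt))
            state["srtt"] = (1 - alpha) * state["srtt"] + alpha * rtt
        # The retransmit timeout tracks the mean plus variation.
        return state["srtt"] + 4 * state["rttvar"]

    return sample

# Feeding far too many samples per round trip doesn't inflate srtt --
# it just makes the estimator converge on the latest samples almost
# instantly, washing out the smoothing alpha and beta are meant to
# provide, which is the 1.4 failure mode described above.
```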

> Any thoughts you have will be much appreciated.  The AFS versions
> are as follows in case it helps:

I would be very interested in seeing how 1.6.1 performs with this
network configuration. It is unlikely that any work is going to get
done in fixing the 1.4.x transport, but if you can reproduce these
issues with 1.6, I'd really like to look at some packet traces and
work out what's going on.

Cheers,

Simon.