[OpenAFS-devel] Rx over TCP to solve some NAT & Firewall issues?

Nickolai Zeldovich kolya@MIT.EDU
Thu, 20 Nov 2003 15:51:22 -0500 (EST)


>                                            apparently jumbograms are
> the way they are because people wanted a form of congestion control on
> afs (controlling number of rx datagrams in a packet).

Rx already has congestion control -- quite similar to TCP Reno with SACK.
It has slow-start, AIMD and fast-recovery.  It doesn't seem to have fast
retransmit, apparently because it still assumes that packets can get
reordered.  Maybe we should fix this -- it should be quite simple, since Rx
already has a SACK-like ack packet.
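
For illustration, here is a toy Reno-style window with slow start, AIMD
and a duplicate-ack trigger for fast retransmit.  This is a sketch only,
not Rx's actual code: the class name, the threshold of 3 duplicate acks
and the initial ssthresh are all assumptions borrowed from common TCP
practice.

```python
class RenoWindow:
    """Toy congestion window, counted in packets (hypothetical, not Rx code)."""

    def __init__(self, ssthresh=16, dup_ack_threshold=3):
        self.cwnd = 1.0                       # congestion window, in packets
        self.ssthresh = float(ssthresh)       # slow-start threshold (assumed)
        self.dup_ack_threshold = dup_ack_threshold
        self.dup_acks = 0

    def on_new_ack(self):
        """An ack that advances the window arrived."""
        self.dup_acks = 0
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0                  # slow start: +1 packet per ack
        else:
            self.cwnd += 1.0 / self.cwnd      # additive increase: ~+1 per RTT

    def on_dup_ack(self):
        """A duplicate ack arrived; return True if fast retransmit should fire."""
        self.dup_acks += 1
        if self.dup_acks == self.dup_ack_threshold:
            # Multiplicative decrease, then continue without a full restart.
            self.ssthresh = max(self.cwnd / 2.0, 2.0)
            self.cwnd = self.ssthresh
            return True
        return False

    def on_timeout(self):
        """Retransmit timer expired: halve ssthresh and restart slow start."""
        self.ssthresh = max(self.cwnd / 2.0, 2.0)
        self.cwnd = 1.0
        self.dup_acks = 0
```

The duplicate-ack threshold is what tolerates mild reordering: one or two
out-of-order packets do not trigger a retransmit, only a run of them does.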

One problem is that currently the window size is limited to 32 packets,
which is 32*1444 bytes = ~46KB in flight per round trip.  That caps the
bandwidth-delay product, so I can only get ~500KB/sec throughput from east
coast to west coast.  This problem is easy to fix by bumping up the max
sender/receiver windows, but that's not the problem affecting performance
in local networks.
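
The arithmetic behind that limit can be sketched as follows.  The 32
packet window and 1444 byte payload come from the numbers above; the
90 ms coast-to-coast round-trip time is an assumed figure, and the
function names are hypothetical.

```python
import math


def window_limited_throughput(window_packets, payload_bytes, rtt_seconds):
    """Upper bound on throughput (bytes/sec) when at most one window
    of data can be in flight per round trip."""
    return window_packets * payload_bytes / rtt_seconds


def window_needed(target_bytes_per_sec, payload_bytes, rtt_seconds):
    """Smallest window (in packets) that can sustain the target rate."""
    return math.ceil(target_bytes_per_sec * rtt_seconds / payload_bytes)


if __name__ == "__main__":
    RTT = 0.090  # seconds; assumed round-trip time, not a measured value
    bound = window_limited_throughput(32, 1444, RTT)
    print(f"32-packet window: about {bound / 1000:.0f} KB/sec")
    print(f"packets needed for 10 MB/sec: {window_needed(10_000_000, 1444, RTT)}")
```

On a LAN with a sub-millisecond round trip the same 32 packet window
allows tens of MB/sec, which is why the window cap does not explain poor
local performance.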

I don't believe that Rx over TCP would have to keep more state than Rx
over UDP.  After all, Rx over UDP pretty much keeps a TCP-like connection
state structure in memory in userspace.  One could argue that TCP state is
in unpageable kernel memory, but I think if your server is paging, you've
lost already.

As for the persistence of TCP connections, one could quite easily define
them to be garbage-collectable at any time by either the server or the
client, just like Rx over UDP connections are now.  If the server thinks
it has too many connections open, it'll close idle client connections.
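
A minimal sketch of such a policy, assuming the server simply tracks when
each connection was last used and drops the least recently used ones once
it is over a limit.  The class and method names are hypothetical and do
not correspond to anything in Rx or OpenAFS.

```python
import time


class ConnectionTable:
    """Tracks last-use times so idle connections can be reclaimed."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self._last_used = {}  # connection id -> last activity timestamp

    def touch(self, conn_id, now=None):
        """Record activity on a connection (creates the entry if new)."""
        self._last_used[conn_id] = time.monotonic() if now is None else now

    def reap_idle(self):
        """Drop the least recently used connections until the table is
        back under the limit; return the ids that were dropped."""
        excess = len(self._last_used) - self.max_connections
        if excess <= 0:
            return []
        victims = sorted(self._last_used, key=self._last_used.get)[:excess]
        for conn_id in victims:
            # A real server would close the TCP socket here; the client
            # reconnects on its next call.
            del self._last_used[conn_id]
        return victims
```

Because either side may drop a connection at any time, the client has to
treat a closed socket as a cue to reconnect rather than as an error.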

Do people really think that Rx over UDP, designed 15 years ago, can be a
better reliable stream transport than the TCP in today's kernels?  What
features of Rx over UDP are so unique that they preclude the use of TCP?

-- kolya