[OpenAFS] About AFS performance over WAN

Rainer Toebbicke rtb@pclella.cern.ch
Mon, 1 Dec 2008 10:24:34 +0100


Giovanni Bracco wrote:

> I know that this is a well-known problem of the rx protocol, as shown for 
> example by Hartmut Reuter at the last European AFS Conference 2008
> (see slide 49 from 
> http://www.openafs.at/drupal/files/slides/1Day_03/AFS-OSD.pdf), 
> due to the fixed rx window size combined with network latencies on the 
> order of tens of milliseconds.
> 
> I am aware that work was in progress on a tcp version of openafs, 
> which probably could solve some of this problem, but I do not know what 
> the status of this activity is. 
> More generally, what are the plans to increase AFS performance over WAN, 
> to take advantage of the present-day availability of high-bandwidth 
> connections?
> 

What RX-over-TCP would bring you is all the improvements that went into 
TCP over the past decade of research, carried over to RX. It would also 
make things more familiar for network administrators, who deal mainly with 
TCP considerations. What it would not bring you is a bulk transfer protocol.

AFS transfers files chunk-wise; while there is read-ahead, the transfer is 
essentially still sequential. Due to the RPC nature of the protocol you will 
have a stop every 64K (or 256K, or whatever you typically set the chunk size 
to). A plain port to TCP will not change anything there. Worse, such 
start-stop behaviour in a single TCP stream could very well challenge the 
sophisticated techniques that went into window heuristics and congestion 
control.
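
To put a rough number on that stop per chunk, here is a minimal sketch in 
Python. It assumes a deliberately simple model that is not taken from RX 
itself: each chunk costs one idle round trip plus its transmission time, 
and window effects inside a chunk are ignored. The function name, the 
1 Gbit/s link and the 20 ms round-trip time are illustrative assumptions.

```python
def chunked_rpc_throughput(chunk_bytes, rtt_s, link_bytes_per_s):
    """Upper bound on throughput (bytes/s) when each chunk costs one
    round trip of idle time plus its transmission time on the link."""
    transmit_s = chunk_bytes / link_bytes_per_s
    return chunk_bytes / (rtt_s + transmit_s)


if __name__ == "__main__":
    link = 1e9 / 8          # assumed 1 Gbit/s link, in bytes per second
    rtt = 0.020             # assumed 20 ms round-trip time
    for chunk in (64 * 1024, 256 * 1024, 1024 * 1024):
        rate = chunked_rpc_throughput(chunk, rtt, link)
        print(f"{chunk // 1024:5d} KiB chunks: {rate / 1e6:7.1f} MB/s")
```

Under these assumptions a 64K chunk size caps a single sequential transfer 
at roughly 3 MB/s on a link that could carry 125 MB/s, and the cap is set 
almost entirely by the round-trip time, whatever the transport underneath.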

With unlimited development resources, AFS would deserve a protocol better 
suited than TCP. In practice, with a little more realism, my gut feeling is 
that at least some more brain power should be devoted to improving plain RX 
rather than betting on another horse. I have tried occasionally over the 
past years, with some improvements that Hartmut tested as well, but my brain 
being what it is and the matter being relatively complicated, results remain 
modest.

High latency remains a fierce enemy. Some address it through pre-fetches, 
which are a double-edged sword! For reads, if the file system had reliable 
knowledge about big files (or series of files) to be transferred in their 
entirety, AFS could relatively easily be modified to start chunk pre-fetches 
in parallel, slightly shifted in time, over standard RX, solving the 
start-stop problem. The key here is to do this only if you're sure you're 
not over-speculating and throwing away most of it soon after.
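
The parallel pre-fetch idea can be sketched roughly as follows, in Python 
rather than in cache manager code. Everything here is hypothetical: 
fetch_chunk stands in for whatever RPC fetches one chunk, and the function 
name, the parallel count and the stagger delay are made-up parameters. It 
only makes sense when the caller already knows the whole file is wanted.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def prefetch_whole_file(fetch_chunk, file_size, chunk_size=64 * 1024,
                        parallel=4, stagger_s=0.0):
    """Fetch every chunk of a file with up to `parallel` requests in
    flight, starting the first few slightly shifted in time.

    fetch_chunk(offset, length) is a hypothetical callable that returns
    the bytes of one chunk; it is not an OpenAFS API.
    """
    futures = []
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        for index, offset in enumerate(range(0, file_size, chunk_size)):
            length = min(chunk_size, file_size - offset)
            futures.append(pool.submit(fetch_chunk, offset, length))
            # Stagger only the initial requests; later ones start as
            # soon as a worker becomes free.
            if index < parallel - 1:
                time.sleep(stagger_s)
        # Results are joined in offset order, whatever order they finish in.
        return b"".join(f.result() for f in futures)
```

With several chunk requests in flight, the idle round trip of one chunk is 
overlapped by the data transfer of the others, which is the point of the 
exercise. The cost is that every one of those requests is wasted if the 
reader stops after the first chunk.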

For writes, here at CERN we already run with mods that start chunk 
transmission early, while the file is still being written to. Naively, I 
would think that this would be vastly easier to improve, given that much 
more is known!

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland
Phone: +41 22 767 8985       Fax: +41 22 767 7155