[OpenAFS] tcpoob timeline

Simon Wilkinson simonxwilkinson@gmail.com
Sat, 27 Oct 2012 16:31:59 +0100


On 26 Oct 2012, at 23:40, Jeffrey Altman wrote:
> I have significant concerns about the design of TCP OOB as it was
> described at EAKC2012.

On the contrary, I think Out of Band support for AFS is a very
interesting prospect. As I discussed with Andrew and Hartmut in
Edinburgh, I think we could use an OOB negotiation protocol to support a
lot of interesting innovations, from TCP transfers, to /vicep-access
style use of an underlying cluster filesystem, to cluster-specific data
transfer protocols.

The problem is that RX is trying to solve two distinct problems. On the
one hand it carries metadata RPCs. These are very short-lived transfers,
typically consisting of a single packet in each direction. For AFS to
appear fast to the end user (the latency problem CERN discussed in their
presentation) we need to be able to handle these single packets as
quickly as possible. On the other hand, RX also carries AFS's bulk data.
Bulk data transfers tend to be large packet flows, and any negotiation
overhead is quickly amortised if it makes the transfer faster. Being
able to use a different transport for bulk operations actually makes a
lot of sense. Other filesystems, such as Lustre, have already gone down
this path.
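
To put rough numbers on that trade-off, here is a back-of-the-envelope
sketch in Python. Every figure in it (the round-trip time, both
throughputs, and the assumption that negotiation costs two extra round
trips) is made up for illustration; none of them are measurements of RX,
TCP, or any real fileserver.

```python
def transfer_time(size_bytes, throughput_bps, setup_s=0.0):
    """Seconds to move size_bytes at throughput_bps, plus a fixed setup cost."""
    return setup_s + (size_bytes * 8) / throughput_bps


def break_even_bytes(setup_s, slow_bps, fast_bps):
    """Transfer size above which the faster transport wins despite its setup cost."""
    # Solve setup + 8*s/fast == 8*s/slow for s.
    return setup_s / (8 * (1 / slow_bps - 1 / fast_bps))


# Hypothetical figures, chosen only to show the shape of the trade-off.
RTT_S = 0.001               # assumed 1 ms round trip
INBAND_BPS = 1e9            # assumed in-band (RX) bulk throughput
OOB_BPS = 4e9               # assumed out-of-band transport throughput
NEGOTIATION_S = 2 * RTT_S   # assumed cost of negotiating the OOB channel

if __name__ == "__main__":
    # A single-packet RPC, a 1 MB file, and a 1 GB file.
    for size in (1_400, 1_000_000, 1_000_000_000):
        inband = transfer_time(size, INBAND_BPS)
        oob = transfer_time(size, OOB_BPS, NEGOTIATION_S)
        winner = "out-of-band" if oob < inband else "in-band"
        print(f"{size:>13} bytes: in-band {inband:.6f}s, OOB {oob:.6f}s -> {winner}")
    print("break-even size:", round(break_even_bytes(NEGOTIATION_S, INBAND_BPS, OOB_BPS)), "bytes")
```

Under these assumed numbers a single-packet RPC is far better off staying
in-band, while anything much beyond a few hundred kilobytes recovers the
negotiation cost. The real crossover depends entirely on measured values.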

I do believe that we can make the current RX implementation
significantly faster - and that this will aid both bulk and metadata
operations. However, it is unlikely that we can ever reach the raw
performance of TCP, especially when TCP is aided by in-kernel DMA
splitting packet decoding across multiple cores, and by specialised
firmware in the network cards themselves. Whilst there is increasing
evidence that UDP-based protocols are more efficient than TCP ones in
some network scenarios, they gain this added efficiency by using
different flow control models than TCP. As long as RX has a solely
TCP-inspired design, we're not going to surpass TCP. One really
interesting possibility of generic out of band support is that we could
support transports such as UDT, when we know that we're on a network
that can deliver that kind of performance.

So, I'm firmly in favour of working on standardising a mechanism for
negotiating out of band transfers.

I think it's also worth bearing in mind that the implementation that
Andrew outlined was a proof-of-concept version. One issue that we, as a
community, are going to have to decide is to what extent we're prepared
to keep adding new features that only work on a single operating system.
I would also really like to see some performance numbers for its use
from multiple clients on a loaded fileserver. But we can cross those
bridges as part of the development process.

Cheers,

Simon