[OpenAFS-devel] .35 sec rx delay bug?

Niklas Edmundsson Niklas.Edmundsson@hpc2n.umu.se
Fri, 10 Nov 2006 13:31:19 +0100 (MET)


On Thu, 9 Nov 2006, Ken Hornstein wrote:

>> While we're on the subject of already tuned solutions: What
>> possibilities are there to combine this with the use of sendfile()?
>> Experiences from apache and other projects show that there are very
>> noticeable effects even when doing sendfile() on small chunks compared
>> to the classic read/write-approach.
>
> Well ... I am certainly willing to investigate it.  The problems I see are
> threefold:
>
> - You really want to make sure the header and bulk data end up in one
>  TCP frame.  If you utilize sendfile(), it isn't possible to guarantee
>  that because you'll have to do two separate operations: one write() to
>  do the header data, then the sendfile() call to move the bulk data (right
>  now writev() is used so header data and bulk data get coalesced into one
>  TCP frame).  If you have a series of small TCP frames interspersed with
>  large frames, performance will go into the crapper.  The way reads are
>  done in RxTCP, it could work ... but I see from at least the Linux
>  sendfile() manpage that the reader cannot be a socket, so that takes that
>  off the table.  Apache has a much simpler problem; they're not trying
>  to have a virtualized multichannel stream protocol over TCP.

Yeah, sendfile() focuses on the sending issue.
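
For anyone following along, the writev() coalescing Ken describes can be
sketched roughly as below. This is only an illustration: struct frame_hdr
and send_frame() are made-up names, not the actual RxTCP wire format or
code, and the header fields are left in host byte order for brevity.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical frame header; NOT the real RxTCP layout. */
struct frame_hdr {
    uint32_t channel;   /* which virtual channel the payload belongs to */
    uint32_t length;    /* payload length in bytes */
};

/*
 * Hand header and payload to the kernel in a single writev() call, so
 * the stack sees them as one contiguous write instead of a tiny write()
 * followed by a large one.  Returns the byte count from writev(), or -1
 * on error.  A real sender would also loop on short writes.
 */
ssize_t send_frame(int fd, uint32_t channel, const void *data, size_t len)
{
    struct frame_hdr hdr;
    struct iovec iov[2];

    hdr.channel = channel;
    hdr.length = (uint32_t)len;

    iov[0].iov_base = &hdr;
    iov[0].iov_len = sizeof(hdr);
    iov[1].iov_base = (void *)data;   /* writev() does not modify it */
    iov[1].iov_len = len;

    return writev(fd, iov, 2);
}
```

With sendfile() the second iovec would instead be a file descriptor, which
is exactly why the header ends up in a separate call.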

>  I see that Solaris has sendfilev(), and one of the items it can take is
>  a userspace buffer, so that could address the sending issue.  But
>  it's not clear to me that the Solaris sendfilev() avoids userspace
>  copies, since it's a library function and not a system call.

AIX sendfile can also do this. In the end I guess you'll want to do 
some sort of portability layer, or let the #ifdefs eat your code.

> - If you want to do a checksum of the bulk data, you need to read the
>  bulk data into memory ... and you lose the benefit of sendfile().

Isn't the TCP checksumming enough? Anyhow, encryption would also have 
this effect.

In any case, I was just curious about whether it was possible at all.
Modern servers shouldn't have any problems delivering gige-speed without
sendfile given sane code. It will be very interesting to see what
happens when 10gige gets common, though. A wild guess is that we'll be
limited by disk speed.

/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se     |    nikke@hpc2n.umu.se
---------------------------------------------------------------------------
  "If the Apocalypse comes, beep me"- Buffy
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=