[OpenAFS-devel] .35 sec rx delay bug?

Ken Hornstein kenh@cmf.nrl.navy.mil
Thu, 09 Nov 2006 10:48:57 -0500


>When we're at the subject of already tuned solutions: What 
>possibilities are there to combine this with the use of sendfile()? 
>Experiences from apache and other projects show that there are very 
>noticeable effects even when doing sendfile() on small chunks compared 
>to the classic read/write-approach.

Well ... I am certainly willing to investigate it.  The problems I see are
threefold:

- There are a number of layers between the RxTCP code and the file I/O code.
  Figuring out the right way to break those layers down will be interesting,
  to say the least.  Sending data isn't so bad ... receiving data is more
  challenging.

- You really want to make sure the header and bulk data end up in one
  TCP frame.  If you utilize sendfile(), it isn't possible to guarantee
  that because you'll have to do two separate operations: one write() to
  do the header data, then the sendfile() call to move the bulk data (right
  now writev() is used so header data and bulk data get coalesced into one
  TCP frame).  If you have a series of small TCP frames interspersed with
  large frames, performance will go into the crapper.  The way reads are
  done in RxTCP, it could work ... but I see from at least the Linux
  sendfile() manpage that the input descriptor cannot be a socket, so that
  takes sendfile() off the table for receiving.  Apache has a much simpler
  problem; they're not trying
  to have a virtualized multichannel stream protocol over TCP.

  I see that Solaris has sendfilev(), and one of the items it can take is
  a userspace buffer, so that could address the sending issue.  But
  it's not clear to me that the Solaris sendfilev() avoids userspace
  copies, since it's a library function and not a system call.

- If you want to do a checksum of the bulk data, you need to read the
  bulk data into memory ... and you lose the benefit of sendfile().
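
To make the second and third points concrete, here is a rough sketch of
the two send patterns and the checksum problem.  It is not RxTCP code:
the function names are made up for illustration, a Unix socketpair stands
in for the TCP connection (so it shows the system-call pattern, not actual
TCP framing), it assumes Linux semantics for os.sendfile(), and CRC32 is
only a stand-in checksum.

```python
import os
import socket
import tempfile
import zlib


def send_with_writev(sock, header, payload):
    """One gather write: header and bulk data go down in a single call.

    Note: writev() may write fewer bytes than requested; a real sender
    would loop. The sketch assumes the data fits in the socket buffer.
    """
    return os.writev(sock.fileno(), [header, payload])


def send_with_sendfile(sock, header, path):
    """Two separate operations: send the header, then sendfile() the file."""
    sock.sendall(header)
    size = os.path.getsize(path)
    sent = 0
    with open(path, "rb") as f:
        while sent < size:
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:
                break
            sent += n
    return len(header) + sent


def checksum_file(path):
    """Checksumming pulls the bulk data into userspace memory anyway."""
    with open(path, "rb") as f:
        return zlib.crc32(f.read())


def recv_exact(sock, n):
    """Read exactly n bytes (or until EOF) from the socket."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            break
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def demo():
    header = b"HDR0"           # hypothetical 4-byte header
    payload = b"x" * 4096      # small enough to fit in the socket buffer
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(payload)
        tmp.flush()
        a, b = socket.socketpair()
        with a, b:
            n1 = send_with_writev(a, header, payload)
            got_writev = recv_exact(b, n1)
            n2 = send_with_sendfile(a, header, tmp.name)
            got_sendfile = recv_exact(b, n2)
        crc = checksum_file(tmp.name)
    return got_writev, got_sendfile, crc
```

Both paths deliver the same bytes to the receiver; the difference is that
the sendfile() path needs two calls on the sending side, and once you call
something like checksum_file() the data has already been copied into
userspace, so the zero-copy advantage is gone.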

--Ken