[OpenAFS-devel] .35 sec rx delay bug?
Ken Hornstein
kenh@cmf.nrl.navy.mil
Thu, 09 Nov 2006 10:48:57 -0500
>When we're at the subject of already tuned solutions: What
>possibilities are there to combine this with the use of sendfile()?
>Experiences from apache and other projects show that there are very
>noticeable effects even when doing sendfile() on small chunks compared
>to the classic read/write-approach.
Well ... I am certainly willing to investigate it. The problems I see are
threefold:
- There are a number of layers between the RxTCP code and the file I/O code.
Figuring out the right way to break those layers down will be interesting,
to say the least. Sending data isn't so bad ... receiving data is more
challenging.
- You really want to make sure the header and bulk data end up in one
TCP frame. If you utilize sendfile(), it isn't possible to guarantee
that because you'll have to do two separate operations: one write() to
do the header data, then the sendfile() call to move the bulk data (right
now writev() is used so header data and bulk data get coalesced into one
TCP frame). If you have a series of small TCP frames interspersed with
large frames, performance will go into the crapper. The way reads are
done in RxTCP, sendfile() could work on the receive side ... but I see
from at least the Linux sendfile() manpage that the input descriptor
cannot be a socket, so that takes that off the table. Apache has a much
simpler problem; they're not trying
to have a virtualized multichannel stream protocol over TCP.
I see that Solaris has sendfilev(), and one of the items it can take is
a userspace buffer, so that could address the sending issue. But
it's not clear to me that the Solaris sendfilev() avoids userspace
copies, since it's a library function and not a system call.
- If you want to do a checksum of the bulk data, you need to read the
bulk data into memory ... and you lose the benefit of sendfile().
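
To make the second point concrete, here is a minimal sketch in C of the
writev() approach, where header and bulk data are handed to the kernel in
one call. The header layout (struct frame_hdr) and the function name
send_frame() are made up for illustration; they are not the actual RxTCP
definitions, and a real sender would also have to loop on short writes.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical per-frame header (an assumption for this sketch,
 * not the real RxTCP wire format). */
struct frame_hdr {
    uint32_t channel;   /* which virtual channel the data belongs to */
    uint32_t length;    /* number of payload bytes that follow */
};

/*
 * Send the header and the payload with a single gathering write.
 * Because both pieces reach the kernel in one writev() call, they can
 * be coalesced instead of going out as a tiny header write followed
 * by a separate bulk write.
 *
 * Returns the number of bytes written, or -1 on error (as writev()).
 */
static ssize_t send_frame(int fd, uint32_t channel,
                          const void *data, uint32_t len)
{
    struct frame_hdr hdr;
    struct iovec iov[2];

    hdr.channel = htonl(channel);
    hdr.length = htonl(len);

    iov[0].iov_base = &hdr;
    iov[0].iov_len = sizeof(hdr);
    iov[1].iov_base = (void *)data;   /* writev() does not modify it */
    iov[1].iov_len = len;

    return writev(fd, iov, 2);
}
```

With sendfile() the payload never sits in a userspace buffer, so it cannot
be placed in the iovec; the header would have to go out through its own
write() first, which is the two-operation case described above.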
--Ken