[OpenAFS-devel] .35 sec rx delay bug?
Niklas Edmundsson
Niklas.Edmundsson@hpc2n.umu.se
Fri, 10 Nov 2006 13:31:19 +0100 (MET)
On Thu, 9 Nov 2006, Ken Hornstein wrote:
>> While we're on the subject of already tuned solutions: What
>> possibilities are there to combine this with the use of sendfile()?
>> Experiences from apache and other projects show that there are very
>> noticeable effects even when doing sendfile() on small chunks compared
>> to the classic read/write-approach.
>
> Well ... I am certainly willing to investigate it. The problems I see are
> threefold:
>
> - You really want to make sure the header and bulk data end up in one
> TCP frame. If you utilize sendfile(), it isn't possible to guarantee
> that because you'll have to do two separate operations: one write() to
> do the header data, then the sendfile() call to move the bulk data (right
> now writev() is used so header data and bulk data get coalesced into one
> TCP frame). If you have a series of small TCP frames interspersed with
> large frames, performance will go into the crapper. The way reads are
> done in RxTCP, it could work ... but I see from at least the Linux
> sendfile() manpage that the reader cannot be a socket, so that takes that
> off the table. Apache has a much simpler problem; they're not trying
> to have a virtualized multichannel stream protocol over TCP.
Yeah, sendfile() only addresses the sending side.
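For reference, the writev() coalescing mentioned in the quoted text could
look roughly like the sketch below. It is only an illustration, not the
actual RxTCP code: send_packet() and its parameters are made-up names,
and whether header and payload really end up in the same TCP frame is
still up to the kernel and the socket options in use.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from RxTCP): hand a packet header and
 * its payload to the kernel in a single writev() call, so the two are
 * queued together instead of as two separate writes.
 *
 * Returns the byte count reported by writev(), or -1 on error.
 * A short write is possible; handling it is left to the caller.
 */
static ssize_t send_packet(int fd, const void *hdr, size_t hdr_len,
                           const void *body, size_t body_len)
{
    struct iovec iov[2];

    assert(hdr != NULL && body != NULL);

    /* struct iovec is not const-correct, hence the casts. */
    iov[0].iov_base = (void *)hdr;
    iov[0].iov_len  = hdr_len;
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = body_len;

    return writev(fd, iov, 2);
}
```

With write() followed by sendfile() the header and the payload are handed
over in two separate calls, which is exactly the split the single
writev() call avoids.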
> I see that Solaris has sendfilev(), and one of the items it can take is
> a userspace buffer, so that could address the sending issue. But
> it's not clear to me that the Solaris sendfilev() avoids userspace
> copies, since it's a library function and not a system call.
AIX sendfile can also do this. In the end I guess you'll want to do
some sort of portability layer, or let the #ifdefs eat your code.
> - If you want to do a checksum of the bulk data, you need to read the
> bulk data into memory ... and you lose the benefit of sendfile().
Isn't the TCP checksumming enough? Anyhow, encryption would also have
this effect.
In any case, I was just curious about it being possible at all. Modern
servers shouldn't have any problems delivering gige speed without
sendfile, given sane code. It will be very interesting to see what
happens when 10gige gets common, though. A wild guess is that we'll be
limited by disk speed.
/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | nikke@hpc2n.umu.se
---------------------------------------------------------------------------
"If the Apocalypse comes, beep me"- Buffy
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=