[OpenAFS-devel] read performance 1.5.74 Linux

Jeffrey Hutzelman jhutz@cmu.edu
Tue, 18 May 2010 12:25:25 -0400


--On Tuesday, May 18, 2010 12:03:46 PM -0400 Jeffrey Altman 
<jaltman@secure-endpoints.com> wrote:

> On 5/18/2010 11:16 AM, Jeffrey Hutzelman wrote:
>> I'm concerned here that this might mean you are lying about window sizes.
>
> I (personally) am not lying about anything.  I really wish that *you*
> could make a distinction between *me* and the open source code when
> making comments.

Apologies.  Of course I meant that the rx implementation was lying.
And apparently I also misinterpreted what you wrote, because it sounded to 
me like you were describing changes you and Derrick made which resulted in 
dropping received in-window data.





>> The reason some data buffers are allocated in advance is because you
>> must be prepared to receive any data that can be in flight according to
>> the advertised window size _without blocking_ or at least without
>> blocking in a way that prevents traffic in another stream from being
>> received and processed.
>
> A packet is made up of multiple data buffers which are themselves
> packets.  The window size in Rx is not measured in bytes.  It is
> measured in packets and we have no idea how large the incoming packet
> might be.  It can be as large as RX_MAX_PACKET_SIZE.  As such, before
> any receive operation is performed the library must ensure that the full
> number of data buffers has been attached to the packet.

Well, it has to have someplace to put the received UDP datagram.  That 
doesn't necessarily mean "attached to the packet", which I gather is the 
part that created the concurrency problem you saw in July 2009.  But doing 
it some other way isn't necessarily easier, since you still need to account 
for the total number of buffers that must be available to meet window 
commitments.
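
The accounting I have in mind is roughly the following.  This is only a
simplified sketch in C, not anything from the rx sources; the names
(bufs_needed_for_window, BUFS_PER_MAX_PACKET, ncalls) are invented for
illustration:

#include <stddef.h>

/* Assumption: number of data buffers needed to hold one datagram of
 * RX_MAX_PACKET_SIZE; the real value depends on the buffer size. */
#define BUFS_PER_MAX_PACKET 4

static size_t
bufs_needed_for_window(size_t ncalls, size_t window_in_packets)
{
    /* Every packet that can legally be in flight on every active call
     * needs somewhere to land, so the pool is sized for the worst case:
     * each in-window packet could need the maximum number of buffers. */
    return ncalls * window_in_packets * BUFS_PER_MAX_PACKET;
}

Whether the buffers live attached to packets or in a separate pool, that
product is the commitment the advertised window creates.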


>> Tearing apart other packets to reclaim buffers
>> is acceptable, but not if it means you need to wait for a buffer to
>> drain before you can receive more packets.
>
> It is acceptable but it is also extremely inefficient because most
> packets are not jumbo packets and as such only require a single data
> buffer beyond the buffer used for the header.

Yes.
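
To make the inefficiency concrete, the reclaim step amounts to something
like this (a made-up sketch, not the real rx internals; the structure and
names are invented):

/* Walk idle packets and strip any continuation buffers beyond the one
 * that every packet keeps, until we have harvested 'wanted' buffers. */
struct ex_packet {
    struct ex_packet *next;
    int ndatabufs;              /* data buffers beyond the header buffer */
};

static int
reclaim_extra_bufs(struct ex_packet *idle, int wanted)
{
    int reclaimed = 0;
    struct ex_packet *p;

    for (p = idle; p != NULL && reclaimed < wanted; p = p->next) {
        /* Most packets are not jumbo, so they hold only one data buffer
         * beyond the header and there is nothing to harvest -- which is
         * exactly the inefficiency described above. */
        while (p->ndatabufs > 1 && reclaimed < wanted) {
            p->ndatabufs--;
            reclaimed++;
        }
    }
    return reclaimed;
}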



>>  Dropping received data on
>> the floor when it was received within the advertised window is _not_
>> acceptable; that breaks flow control and exacerbates congestion.
>
> Of course it is, but I believe that the early developers made a wise
> choice between causing a kernel panic and being inefficient on the wire.
> If you have to choose one, drop the data on the floor and let it be
> retransmitted.

Well, sure.  But the "right" way to be inefficient here is to advertise a 
smaller window, so that you don't get data that has to be retransmitted. 
Nonetheless, this is a decision that was made ages ago, and now that I 
realize that, there's not much benefit in debating its merits.
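
What I mean by advertising a smaller window is something along these
lines (again just an illustrative sketch, not rx code):

/* Advertise only as large a window as the free buffer pool can cover,
 * instead of accepting the full window and dropping in-window data.
 * bufs_per_packet is the worst-case data buffers per packet (>= 1). */
static int
advertised_window(int configured_window, int free_data_bufs,
                  int bufs_per_packet)
{
    int coverable = free_data_bufs / bufs_per_packet;
    return (coverable < configured_window) ? coverable : configured_window;
}

The sender then never puts more in flight than we can actually buffer, so
nothing has to be dropped and retransmitted; the cost is a smaller window
rather than wasted wire traffic.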




> The goal is to ensure that we never get into this case, which is why, if
> the rxi_NeedMorePackets global variable is TRUE, we must actually go and
> allocate more packets the next time it is safe to do so.
>
> The patch that was committed today does that for the first time in the
> history of Rx.

Now I'm really going to have to go back and reread things.  I examined 
this fairly closely a couple of months ago while working out fileserver 
tuning, and one of the conclusions I came to at the time was that, at 
least in user mode, Rx would always allocate more packets when needed, so 
setting the fileserver's -rxpck parameter should never be necessary.
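
For reference, the pattern you're describing reads to me as a
flag-and-grow scheme along these lines (my own sketch, not the actual rx
code; everything except the mention of rxi_NeedMorePackets in a comment
is invented):

static int need_more_packets = 0;  /* stands in for rxi_NeedMorePackets */

/* Placeholder for whatever routine actually grows the packet pool. */
static void
allocate_more_packets(void)
{
}

static void
on_pool_exhausted(void)
{
    /* Called where allocation is not safe (holding locks, or at
     * interrupt level in the kernel): just record the shortage. */
    need_more_packets = 1;
}

static void
maybe_grow_pool(void)
{
    /* Called later, from a context where allocation is safe. */
    if (need_more_packets) {
        allocate_more_packets();
        need_more_packets = 0;
    }
}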

-- Jeff