OpenAFS Master Repository branch, master, updated. openafs-devel-1_5_76-4549-gd03e3c3
Gerrit Code Review
Wed, 14 Jan 2015 10:29:29 -0500
The following commit has been merged in the master branch:
Author: Andrew Deason <email@example.com>
Date: Thu Sep 13 17:58:50 2012 -0500
rx: Normalize use of some MTU-discovery fields
When we store MTUs (peer->ifMTU, peer->natMTU, etc.), we store the
maximum transport unit usable by RX, i.e., excluding the IP and UDP
headers, but including the RX header. Contrariwise, when we track the
size of packets we've sent (conn->lastPacketSize, peer->maxPacketSize),
we track logical packet lengths which exclude the RX header (and the IP
and UDP headers). However, the consumers of lastPacketSize and
maxPacketSize were not always interpreting their values correctly as
excluding the RX (and other) headers.
Add comments to these fields in their respective structure definitions
to help make clear what they contain (and the difference between them).
Correct several checks which were using the wrong constant for
correcting between these two worldviews (and the wrong sign). Modernize
the style of lines that are touched.
The lastPacketSize and maxPacketSize variables are only accessed from
five places: while sending packets, while processing acks, while sending
acks, while handling growMTU events, and in rxi_CheckCall(). They are
used to track the size of packets that have been successfully sent (and
thus, indirectly, to control the size of packets we send). The
maxPacketSize is only set once we have received an ack from the peer for
a packet of that size, and lastPacketSize tracks the size of a
speculative packet larger than that size, until it is acked.
When sending packets, we check if the size of the packet we are sending
exceeds the recorded maxPacketSize, and if so, record that speculative
size in the lastPacketSize variable, along with the sequence number of
that packet in lastPacketSizeSeq.
Correspondingly, when processing acks, if the packet tracked in
lastPacketSizeSeq is being acked, we know that our speculative large
packet was successfully received, and can attempt to update the recorded
maxPacketSize for the peer. This is done through an intermediate
variable, 'pktsize', which holds either the value of lastPacketSize or
lastPingSize, without adjustment for the size of any headers.
The ack processing has a bit of code to handle the case where
maxPacketSize has been reset to zero, initializing it to a small value
which should be expected to always work. The intention seems to have
been to initialize maxPacketSize to the smallest permitted value (since
RX_MIN_PACKET_SIZE is amount of data available to RX in the smallest
permitted IP packet), but the old code was actually initializing
maxPacketSize from zero to something a bit larger than the minimum, by
RX_IPUDP_SIZE + RX_HEADER_SIZE. This over-large initialization was
mostly harmless, see below. After this potential initialization,
'pktsize' was incorrectly used to set a new ifMTU and natMTU for the
peer. It correctly determined that a packet larger than the previous
maxPacketSize had been acked, but then set the peer's ifMTU and natMTU
to smaller values than the acked packet actually indicates. (It is
careful to only increase the ifMTU, though.) The actual peer->MTU is
*not* updated here, but will presumably trickle through eventually via
rxi_Resend() or similar. It is possible that this code should be using
rxi_SetPeerMtu() or similar logic, but that might involve locking issues
which are outside the scope of this analysis.
The over-large initialization of maxPacketSize (from zero) was
fortuitously mostly harmless on systems using minimum-sized IP packets,
since a correspondingly wrong check was used to detect if a new MTU
invalidates this maxPacketSize, with the constants offsetting.
Likewise, the checks in rxi_SendAck() had the wrong constants, but they
offset to produce the correct boundary between small and large packets
while trying to grow the MTU. Unfortunately, the old behavior in the
"small" case is not correct, and the grow MTU event would try to send a
packet with more padding than was intended. In networks allowing
packets slightly larger than the minimum (but not much larger than the
minimum), the old code may have been unable to discover the true MTU.
In the main (MTU-related) logic of rxi_SendAck, a variable 'padbytes' is
set to a small increment of maxPacketSize in the "small" case, and a
larger increment of maxMTU in the "large" case. There is a floor for
padbytes based on RX_MIN_PACKET_SIZE, which ended up being larger than
intended in the old code by approximately the size of the rx header.
(Some of the adjustments performed are rather opaque, so the motivations
The more interesting places where accesses to lastPacketSize and
maxPacketSize happen are during the MTU grow events and in
In rxi_CheckCall(), the packet size variables are only accessed if
the connection has the msgsizeRetryErr flag set, the call is not timed
out (whether for idleness or during active waiting), and the call has
actually received data. In this case, we conclude that sending packets
failed and decrease the MTU. The old code was quite broken in this
regard, with a reversed sense of conditional for whether a speculative
large packet was outstanding, and a rather large decrease in MTU size
of effectively 128 + RX_HEADER_SIZE + RX_IPUDP_SIZE = 212, when only
a decrease of 128 was intended. The new code corrects the sense of
the conditional and sets the decrease in MTU to the intended value of 128.
With respect to MTU grow events, this change only touches
rxi_SetPeerMtu(), to correct the conditional on whether the MTU update
invalidates/overrides the cached maxPacketSize. There is a window of
values which could cause the old code to incorrectly fail to invalidate
the cached maxPacketSize. Values in this window could result in the old
code being stuck on a bad MTU guess, but should not cause an actual
failure to communicate. That conditional zeroing of maxPacketSize is
the only access to the PacketSize variables in rxi_SetPeerMtu().
maxPacketSize is also checked in rxi_GrowMTUEvent(), but only against
zero to avoid sending RX_ACK_MTU packets needlessly, so it is unaffected
by the issue at hand.
In summary, in certain network conditions, the old code could fail
to find an optimum MTU size, but would always continue to operate.
The new code is more likely to find a better MTU size, and the old
and the new code should interoperate fine with both each other and
[firstname.lastname@example.org add a few missed cases; expound on analysis in commit message]
Reviewed-by: Jeffrey Altman <email@example.com>
Tested-by: BuildBot <firstname.lastname@example.org>
Reviewed-by: Daria Brashear <email@example.com>
src/rx/rx.c | 23 ++++++++++++-----------
src/rx/rx_conn.h | 4 ++--
src/rx/rx_peer.h | 6 +++---
3 files changed, 17 insertions(+), 16 deletions(-)
OpenAFS Master Repository