[OpenAFS-devel] Vos dump "performance"

Simon Wilkinson simonxwilkinson@gmail.com
Tue, 19 May 2015 17:53:27 +0300


>> On 19 May 2015, at 16:27, Jeffrey Altman <jaltman@your-file-system.com> wrote:
>>
>>> On 5/19/2015 8:54 AM, Harald Barth wrote:
>>>
>>> 1444 is the number of octets after subtracting the ip/ip6 and udp/udp6
>>> headers for a network with MTU of 1500.
>>
>> Yes but here I was on localhost and that has a loopback does have an
>> MTU of 16436. If the MTU detection code does the right thing(TM).
>
> Unlike IPv6 when using IPv4 there is no reliable path mtu detection for
> UDP.

Things are a little bit more complex than this. At the protocol level,
UDP can use the same path MTU discovery mechanisms as TCP does. It's
just that most operating systems make it very hard to do so. Linux is
the notable exception here - you can query the path MTU for a given
endpoint, and be notified if and when that MTU changes. Probing is left
up to the application.
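
As a rough illustration of the Linux query, here is a minimal sketch. It
assumes Linux; the numeric fallbacks are the values from <linux/in.h>,
used because Python's socket module does not export these names on
every build. The function name query_path_mtu is made up for this
example.

```python
import socket
import sys

# Linux values from <linux/in.h>, used as fallbacks when the socket
# module does not export the names (assumption: Linux only).
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)
IP_MTU = getattr(socket, "IP_MTU", 14)


def query_path_mtu(host, port):
    """Return the kernel's current path MTU estimate, or None if unavailable."""
    if not sys.platform.startswith("linux"):
        return None  # the option numbers above are Linux-specific
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Ask the kernel to set DF and track the path MTU for this socket.
        s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
        # IP_MTU can only be read from a connected socket.
        s.connect((host, port))
        return s.getsockopt(socket.IPPROTO_IP, IP_MTU)
    except OSError:
        return None  # e.g. no route to the endpoint
    finally:
        s.close()


if __name__ == "__main__":
    print(query_path_mtu("127.0.0.1", 7001))
```

The value returned is only the kernel's current estimate for that
destination. With IP_PMTUDISC_DO set, a send larger than the estimate
fails with EMSGSIZE, which is one way an application gets to hear that
the path MTU has dropped.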

>  Nor is there a reliable method of delivering fragmented packets.

In theory, fragmented packets are delivered. In practice, they often
aren't. However, the fundamental difference between v4 and v6 is that
with v4 any router may fragment a packet. With v6, only the sender may
perform fragmentation. With both v4 and v6 a router may end up dropping
fragments, sadly.
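
To put a number on why dropped fragments hurt, here is a small
arithmetic sketch for IPv4. It assumes a 20-byte IPv4 header with no
options, and uses the rule that every fragment except the last carries
a multiple of 8 bytes. The function names are made up for this example.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options


def fragment_count(ip_payload_len, mtu):
    """Number of IPv4 fragments needed to carry ip_payload_len bytes."""
    # Non-final fragments must carry a multiple of 8 payload bytes.
    per_fragment = ((mtu - IPV4_HEADER) // 8) * 8
    return max(1, math.ceil(ip_payload_len / per_fragment))


def delivery_probability(fragments, loss_rate):
    """Chance the whole datagram arrives if fragments are lost independently."""
    return (1.0 - loss_rate) ** fragments


if __name__ == "__main__":
    # A 16436-byte datagram (16416 bytes after the IP header) crossing a
    # 1500-byte link needs 12 fragments.
    n = fragment_count(16436 - IPV4_HEADER, 1500)
    print(n, delivery_probability(n, 0.01))
```

Losing any one fragment loses the whole datagram, so the loss rate seen
by the application grows with the fragment count.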

> Rx doesn't even know which interface the packet is going to be sent
> over.

This isn't universally true. Linux provides a mechanism for determining
this, as do most BSD-derived stacks - although it often involves
querying the machine's routing tables, and so is decidedly non-portable.
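
One widely used trick that gets part of the way there, and is not
necessarily the mechanism I have in mind above, is to connect() a UDP
socket and ask the kernel which local address it picked. A minimal
sketch, with the function name local_address_for made up for this
example:

```python
import socket


def local_address_for(dest_host, dest_port=7000):
    """Return the local IPv4 address the kernel would use to reach dest_host."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends nothing; it only makes the kernel
        # do a route lookup and bind a source address.
        s.connect((dest_host, dest_port))
        return s.getsockname()[0]
    except OSError:
        return None  # no route to the destination
    finally:
        s.close()


if __name__ == "__main__":
    print(local_address_for("127.0.0.1"))
```

This yields a source address rather than an interface name or MTU.
Mapping the address back to an interface is where the platform-specific
routing table queries come in.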

> An Rx packet is never larger than 1500 - Header Sizes.

There is support in the code for packets larger than 1500 bytes. They're
assembled by joining multiple buffers, but are not jumbograms. The issue
with these is that RX can only fragment its input stream once - after a
packet has been sent with a given amount of data, there is no
opportunity to split it if it is too big for the path. So, if your MTU
drops, your only choice is to abort the call and begin again.
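
A toy model of the problem, not RX's actual code: the stream is cut
into numbered packets once, and a packet that later turns out to be too
big cannot be cut again because there are no sequence numbers left
between it and its neighbours. All names here are made up for this
example.

```python
def packetize(stream, payload_size, first_seq=1):
    """Split a byte stream into (seq, payload) packets of at most payload_size."""
    return [
        (first_seq + i, stream[off:off + payload_size])
        for i, off in enumerate(range(0, len(stream), payload_size))
    ]


def oversized(packets, new_payload_size):
    """Sequence numbers of already-built packets that no longer fit the path."""
    return [seq for seq, payload in packets if len(payload) > new_payload_size]


if __name__ == "__main__":
    packets = packetize(b"x" * 10000, 4000)  # seq 1, 2, 3
    # After the path shrinks to 1444 payload bytes, none of them fit, and
    # seq 1 cannot become several packets because seq 2 is already taken.
    print(oversized(packets, 1444))
```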

We don't usually see packets of this size because RX has a few places in
which the MTU is clamped to 1500, and to the lowest MTU listed for the
machine's interfaces.
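
The effect of the clamp can be sketched as simple arithmetic. This
assumes IPv4 without options (20 bytes), an 8-byte UDP header, and a
28-byte Rx wire header; the Rx header size is my assumption here, and
the function names are made up for this example.

```python
IPV4_HEADER = 20   # bytes, assuming no IP options
UDP_HEADER = 8
RX_HEADER = 28     # assumed size of the Rx wire header
MTU_CLAMP = 1500


def effective_mtu(interface_mtus, clamp=MTU_CLAMP):
    """MTU after clamping to 1500 and to the lowest interface MTU."""
    return min(clamp, min(interface_mtus))


def rx_payload(mtu):
    """Data bytes left in one packet after the IP, UDP and Rx headers."""
    return mtu - IPV4_HEADER - UDP_HEADER - RX_HEADER


if __name__ == "__main__":
    # Loopback at 16436 plus an Ethernet interface at 1500.
    print(rx_payload(effective_mtu([16436, 1500])))
```

Under those assumptions a host with loopback at 16436 and Ethernet at
1500 ends up with 1444 data bytes per packet, which matches the figure
quoted at the top of the thread, and loopback alone would still be
clamped to 1500.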

S.
