[OpenAFS-devel] Andrew Deason's OpenAFS RX performance patches

John P Janosik jpjanosi@us.ibm.com
Sun, 9 May 2021 19:46:06 -0500



Jeffrey E Altman <jaltman@auristor.com> wrote on 05/07/2021 04:44:24 PM:
> John,
>
> What are your observations of how this code behaves on congested links.
> I expect that the sorting of received packets distorts the ACK clock
> and packet skew measurements reducing the ability to accurately measure
> the congestion window.  Processing ACK packets in bulk is likely to
> produce a bursty transmission pattern which can result in overflowing
> the link capacity.  As a result, fairness is reduced and packet loss
> might be increased.
>
> Jeffrey Altman
> AuriStor, Inc.
>

With our server hardware and the grid environment used in testing, I was
never able to get more than about 7 Gb/s out of the 10 Gb/s connection to
the servers, and I wasn't seeing any packet loss or rx retransmits.  I know
Andrew reported that more than that was possible in the presentation on
these patches, but I didn't have time to debug why our setup wasn't
matching those results.  My impression was that some other sites might be
running these patches in production.  Can anyone comment on whether that is
the case, and whether they are able to saturate links and see the problem
described?


> On 5/6/2021 10:22 PM, John P Janosik (jpjanosi@us.ibm.com) wrote:
> > Hi Ben,
> >
> > We have been importing these patches into our IBM internal OpenAFS 1.8.X
> > builds for over a year and have had our busiest cells running these
> > versions since fall last year.  We hit some deadlock issue early on but
> > that was fixed and I believe those patches made it to gerrit as well.
> >
> > I did the work to get the patches to apply to the versions of OpenAFS we
> > are running, but I don't feel confident calling it a review.  I missed
> > the deadlock issue until we actually put it into production :).
> >
> > John Janosik
> > jpjanosi@us.ibm.com


