[OpenAFS] Heavy performance loss on gigabit ethernet
Systems Administration
sysadmin@contrailservices.com
Thu, 12 Aug 2004 17:08:32 -0600
> Now, the test between 1000Mb/s and 100Mb/s machines:
> | [ensc@A]$ iperf -c C -b 1000M -ud
> | [ ID] Interval Transfer Bandwidth
> | [ 4] 0.0-10.0 sec 622 MBytes 522 Mbits/sec
> | [ 3] 0.0-10.0 sec 114 MBytes 95.6 Mbits/sec 0.297 ms 0/81287 (0%)
> | [ 4] Server Report:
> | [ 4] 0.0-10.2 sec 114 MBytes 93.5 Mbits/sec 15.142 ms 362232/443628 (82%)
> | [ 4] Sent 443628 datagrams
>
> This is expected: server A sends at full gigabit speed, and lots of
> UDP packets will be dropped because client C is only 100Mb/s. Therefore,
> the network itself seems to be ok.
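A quick sanity check of the "this is expected" reasoning, using only the numbers in the iperf output above. It assumes iperf's default 1470-byte UDP datagram (not shown in the output, though 443628 x 1470 bytes does work out to the reported 622 MBytes) and ignores header overhead, so treat it as a rough sketch rather than a precise model:

```python
# Rough check: when a sender offers more than a bottleneck link can carry,
# the excess is dropped, so loss ~= 1 - bottleneck_rate / offered_rate.
# Assumption: iperf's default 1470-byte UDP payload; header overhead ignored.

DATAGRAM_BYTES = 1470  # assumed iperf UDP payload size


def offered_mbps(datagrams_sent: int, seconds: float,
                 datagram_bytes: int = DATAGRAM_BYTES) -> float:
    """Sending rate implied by the datagram count, in Mbit/s."""
    return datagrams_sent * datagram_bytes * 8 / seconds / 1e6


def expected_overflow_loss(offered: float, bottleneck: float) -> float:
    """Fraction a bottleneck must drop if nothing slows the sender down."""
    if offered <= bottleneck:
        return 0.0
    return 1.0 - bottleneck / offered


def observed_loss(lost: int, total: int) -> float:
    """Lost fraction taken from the iperf server report."""
    return lost / total


if __name__ == "__main__":
    sent = offered_mbps(443628, 10.0)                # server A -> client C
    predicted = expected_overflow_loss(sent, 100.0)  # 100Mb/s client link
    measured = observed_loss(362232, 443628)         # from the server report
    print(f"offered {sent:.1f} Mbit/s, predicted loss {predicted:.0%}, "
          f"measured loss {measured:.0%}")
```

This prints `offered 521.7 Mbit/s, predicted loss 81%, measured loss 82%`. The two figures agree closely, which is consistent with plain overflow at the 100Mb/s hop.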
How much do you lose when you test at the client's 100Mb/s speed? If you
can't get close to 100% delivery at the client's maximum speed, then there
might be an issue there.
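One way to run that test is to step the UDP offered rate up toward the client's 100Mb/s and read the lost/total field of each server report. The sketch below only builds the command lines (iperf 2 style `-c`, `-u` and `-b` options, with `C` as the client host from the test above) and parses a report line. The particular rates and the helper names are illustrative assumptions, and actually running iperf is left out:

```python
import re
from typing import List, Optional, Tuple

# Matches the "lost/total (pct%)" field of an iperf UDP server report,
# e.g. "0/81287 (0%)" or "362232/443628 (82%)".
LOSS_FIELD = re.compile(r"(\d+)/(\d+)\s+\((\d+(?:\.\d+)?)%\)")


def sweep_commands(host: str, rates_mbps: List[int]) -> List[List[str]]:
    """Build one UDP iperf client command per offered rate (not executed)."""
    return [["iperf", "-c", host, "-u", "-b", f"{rate}M"] for rate in rates_mbps]


def parse_loss(report_line: str) -> Optional[Tuple[int, int]]:
    """Return (lost, total) datagrams from a server-report line, or None."""
    match = LOSS_FIELD.search(report_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    for cmd in sweep_commands("C", [80, 90, 95, 100]):
        print(" ".join(cmd))
    print(parse_loss("[ 4] 0.0-10.2 sec 114 MBytes 93.5 Mbits/sec "
                     "15.142 ms 362232/443628 (82%)"))
```

Offering exactly 100M of UDP payload slightly exceeds what a 100Mb/s Ethernet link can carry once headers are counted (the reverse direction above topped out at 95.6 Mbits/sec), so a few percent loss at 100M alone is not alarming. Loss at the lower rates would be the more telling result.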
> As this test corresponds to the slow AFS performance (fileserver A
> sends a large file to client C), something must be wrong with AFS.
This may be related to my problem with clients hanging. I had considered
this before but dismissed it, since the AFS protocol should recover from a
bad or missing UDP packet. Enrico's test raises the question, though: how
does the AFS protocol recover when the pipe from server to client is lossy?
Is the client responsible for recovering, and could a misconfigured
network segment that drops a high percentage of packets be responsible?
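To see why a lossy segment hurts even a protocol that does recover, consider a generic sender that retransmits each lost packet until a copy gets through. OpenAFS's Rx layer does its own acknowledgement and retransmission on top of UDP, but the model below is a simplification and not Rx's actual algorithm: it assumes independent losses, never loses acknowledgements, and ignores retransmission timeouts. Under those assumptions each packet needs on average 1/(1 - p) transmissions at loss rate p:

```python
import random


def transmissions_until_delivered(loss_rate: float, rng: random.Random) -> int:
    """Send one packet repeatedly until a copy survives; return attempts used."""
    attempts = 1
    while rng.random() < loss_rate:  # this copy was dropped, so resend it
        attempts += 1
    return attempts


def mean_transmissions(loss_rate: float, packets: int, seed: int = 1) -> float:
    """Monte Carlo estimate of attempts per delivered packet."""
    rng = random.Random(seed)
    total = sum(transmissions_until_delivered(loss_rate, rng)
                for _ in range(packets))
    return total / packets


def expected_transmissions(loss_rate: float) -> float:
    """Closed form for independent losses (mean of a geometric distribution)."""
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    return 1.0 / (1.0 - loss_rate)


if __name__ == "__main__":
    for p in (0.01, 0.10, 0.82):
        print(f"loss {p:.0%}: simulated {mean_transmissions(p, 20000):.2f}, "
              f"expected {expected_transmissions(p):.2f}")
```

At the 82% loss seen in the iperf test, that is roughly 5.6 sends per packet before counting any time spent waiting on timeouts. In a generic ack-based scheme like this one, the sending side (the fileserver, for a file fetch) does the resending and the receiver's job is to acknowledge what arrived.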
I have been trying to figure out how to enable 802.3 flow control on the
segment between my gigabit backbone and the clients that are experiencing
hangs. I believe one of my switches does not support back-pressure, so the
server seems to flood the available bandwidth, causing a critical loss of
synchronization between the endpoints. Similar fubars are occurring with
other UDP protocols, which suggests a common cause.
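Some back-of-the-envelope arithmetic for the speed-mismatch scenario: a switch forwarding a gigabit burst onto a 100Mb/s port accumulates the difference in its buffer, and without back-pressure it starts dropping once that buffer is full. The 1 MB buffer below is a made-up figure (real per-port buffering varies widely between switches). The PAUSE conversion uses the 802.3x definition of a pause quantum as 512 bit times:

```python
def seconds_until_overflow(buffer_bytes: int, ingress_mbps: float,
                           egress_mbps: float) -> float:
    """Time for a switch buffer to fill when ingress outpaces egress."""
    surplus_bps = (ingress_mbps - egress_mbps) * 1e6
    if surplus_bps <= 0:
        return float("inf")  # egress keeps up, so the buffer never fills
    return buffer_bytes * 8 / surplus_bps


def pause_seconds(pause_quanta: int, link_bps: float) -> float:
    """Duration requested by an 802.3x PAUSE frame (1 quantum = 512 bit times)."""
    return pause_quanta * 512 / link_bps


if __name__ == "__main__":
    HYPOTHETICAL_BUFFER = 1_000_000  # bytes; an assumption, not a real switch spec
    t = seconds_until_overflow(HYPOTHETICAL_BUFFER, 1000, 100)
    print(f"buffer full after {t * 1e3:.1f} ms of line-rate gigabit traffic")
    print(f"max PAUSE at 1 Gb/s: {pause_seconds(65535, 1e9) * 1e3:.2f} ms")
```

With these assumptions the buffer fills in about 8.9 ms, while a single maximal PAUSE frame at gigabit speed holds the sender off for about 33.55 ms. That gap is why either working flow control or a sender that paces itself matters when a gigabit server feeds 100Mb/s clients.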
I'll experiment with forcing the network to a uniform 100Mb/s and report
back. In the meantime, can any of the wizards here comment on whether this
is worth investigating? Suggestions on where to look in the debug logs and
code would be helpful. This thread could also be moved over to the -devel
list if you think it appropriate and not a waste of time.
Ted