[OpenAFS] Heavy performance loss on gigabit ethernet
Enrico Scholz
enrico.scholz@informatik.tu-chemnitz.de
Wed, 11 Aug 2004 17:42:09 +0200
Hello,
we are using OpenAFS 1.2.11 in an environment where the fileserver has a
1000 Mb/s ethernet interface and the clients have 100 Mb/s ones. With
this setup we get very poor client performance on large files; e.g. a
40 MB file takes nearly 4 minutes to transfer.
| $ time cat kernel-source-2.4.22-1.2197.nptl.i386.rpm >/dev/null
| real 3m53.889s
On another 1000 Mb/s machine, or on the fileserver itself, I get full
speed:
| $ time cat kernel-source-2.4.22-1.2197.nptl.i386.rpm >/dev/null
| real 0m1.226s
When forcing the fileserver down to 100 Mb/s (ethtool -s eth0 speed 100
autoneg off), the speed on the clients is fine:
| $ time cat kernel-source-2.4.22-1.2197.nptl.i386.rpm >/dev/null
| real 0m12.928s
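For comparison, the three timings above work out to roughly these effective rates (taking the file to be exactly 40 MB, as stated above):

```python
# Effective throughput for the three measurements, assuming a 40 MB file.
FILE_MB = 40

cases = [
    ("gigabit client or fileserver itself", 1.226),
    ("server forced to 100 Mb/s", 12.928),
    ("mismatched 1000 -> 100 Mb/s", 3 * 60 + 53.889),
]

for label, secs in cases:
    print(f"{label}: {FILE_MB / secs:.2f} MB/s")
```

So the mismatched path delivers well under a fifth of what even a plain 100 Mb/s link manages.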
My explanation is that AFS (UDP) packets are being dropped by the
intervening network components once a transfer exceeds the buffer
capacity of the switches. Since these packets never reach the client,
the server has to resend them after a timeout.
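A rough back-of-the-envelope calculation supports this. The 256 KB per-port buffer below is a hypothetical figure for illustration, not a 3com 4900 spec:

```python
# Sketch: how fast a switch port buffer overflows on a 1000 -> 100 Mb/s
# speed mismatch, if the sender does not pace itself.
IN_RATE = 1000e6 / 8    # bytes/s arriving from the gigabit fileserver
OUT_RATE = 100e6 / 8    # bytes/s draining toward the 100 Mb/s client
BUFFER = 256 * 1024     # ASSUMED per-port buffer in bytes (hypothetical)

fill_rate = IN_RATE - OUT_RATE        # net accumulation, bytes/s
overflow_after = BUFFER / fill_rate   # seconds until drops begin
print(f"buffer full after {overflow_after * 1000:.1f} ms")
```

With those numbers the buffer fills in a few milliseconds, i.e. long before a 40 MB transfer completes, so sustained drops are expected unless the sender backs off.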
Is this a general AFS problem, and should I force 100 Mb/s by default?
Are there hidden options in OpenAFS that enable something like TCP's
sliding-window flow control? Or does this work fine in other
environments, meaning something is wrong with my network?
Some more details about the infrastructure: the fileserver uses an e1000
NIC connected to a 3com 4900 gigabit switch (L3 enabled). Attached to
this switch are further 3com 43xx 100 Mb/s switches (whose uplink ports
are also gigabit), to which the 100 Mb/s clients are connected. Clients
and server run a Fedora Core 1 kernel and OpenAFS 1.2.11.
Enrico