[OpenAFS] Heavy performance loss on gigabit ethernet

Enrico Scholz enrico.scholz@informatik.tu-chemnitz.de
Wed, 11 Aug 2004 17:42:09 +0200


Hello,

we are using OpenAFS 1.2.11 in an environment where the fileserver has a
1000Mb/s ethernet interface and the clients have 100Mb/s ones. With this
setup we get really poor client performance on large files; e.g. a 40 MB
file needs nearly 4 minutes to transfer.

| $ time cat kernel-source-2.4.22-1.2197.nptl.i386.rpm >/dev/null 
| real    3m53.889s


On another 1000Mb/s-enabled machine, or on the fileserver itself, I get
full speed:

| $ time cat kernel-source-2.4.22-1.2197.nptl.i386.rpm >/dev/null 
| real    0m1.226s


When forcing the fileserver to 100Mb/s (ethtool -s eth0 speed 100
autoneg off), the speed on the clients is fine:

| $ time cat kernel-source-2.4.22-1.2197.nptl.i386.rpm >/dev/null 
| real    0m12.928s


My explanation is that AFS (UDP) packets are dropped by the intervening
network components once a burst of data exceeds the buffers in the
switches. As these packets never reach the client, the server has to
resend them after a timeout.
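To see how a per-loss timeout can turn seconds into minutes, here is a
back-of-envelope model (not OpenAFS/Rx internals; packet size, window
size, and timeout below are illustrative assumptions, not measured
values):

```python
# Toy model of a sender that recovers dropped packets only via a fixed
# retransmission timeout, as hypothesized above for the gigabit->100Mb
# speed mismatch. Every number here is an assumption for illustration.

FILE_BYTES = 40 * 1024 * 1024       # the ~40 MB RPM from the transcript
PACKET_BYTES = 1400                 # assumed UDP payload per packet
WINDOW_PKTS = 32                    # assumed packets sent back-to-back
RTO_S = 0.25                        # assumed retransmission timeout
LINK_BPS = 100 * 1_000_000 / 8      # 100Mb/s client link, in bytes/s

def transfer_time(drops_per_window: int) -> float:
    """Seconds to move the file if `drops_per_window` packets of every
    burst overflow the switch buffer and each loss costs one RTO stall."""
    n_windows = FILE_BYTES / (PACKET_BYTES * WINDOW_PKTS)
    wire_time = FILE_BYTES / LINK_BPS            # time to serialize the data
    stall_time = n_windows * drops_per_window * RTO_S
    return wire_time + stall_time

print(f"no loss:          {transfer_time(0):6.1f} s")  # seconds, like 100Mb/s case
print(f"1 drop per burst: {transfer_time(1):6.1f} s")  # minutes, like gigabit case
```

With zero loss the model gives a few seconds; with just one dropped
packet per burst the timeout stalls dominate and the total lands in the
minutes range, the same order of magnitude as the 3m53s observed above.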

Is this a general AFS problem, so that I should force 100Mb/s by default?
Or are there hidden knobs in OpenAFS that enable something like TCP's
sliding-window algorithm? Or does this work fine in other environments,
meaning something is wrong with my network?
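One way to narrow this down is to check whether the hosts themselves are
dropping UDP datagrams, or whether the loss happens out in the switch
fabric (where host counters stay flat). A small sketch that parses the
Linux UDP counters from /proc/net/snmp (the exact columns vary by kernel
version, so the parser pairs header and value rows rather than hardcoding
positions):

```python
# Hedged diagnostic sketch: read the kernel's UDP statistics from
# /proc/net/snmp. If counters like InErrors stay flat on client and
# server while the transfer crawls, the packet loss is likely inside
# the switches, consistent with the buffer-overflow theory.

def udp_counters(path: str = "/proc/net/snmp") -> dict:
    """Return the Udp counter row of /proc/net/snmp as a name->int dict."""
    with open(path) as f:
        udp_lines = [line.split() for line in f if line.startswith("Udp:")]
    header, values = udp_lines[0][1:], udp_lines[1][1:]
    return dict(zip(header, (int(v) for v in values)))

# Usage: snapshot before and after the slow `cat`, then diff, e.g.
#   before = udp_counters()
#   ... run the transfer ...
#   after = udp_counters()
#   print({k: after[k] - before[k] for k in after})
```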



Some more details about the infrastructure: the fileserver uses an e1000
NIC connected to a 3Com 4900 gigabit switch (L3 enabled). Attached to
this switch are further 3Com 43xx 100Mb/s switches (whose uplink ports
are also gigabit), to which the 100Mb/s clients are connected. Clients
and server both run a Fedora Core 1 kernel and OpenAFS 1.2.11.




Enrico