[OpenAFS-devel] .35 sec rx delay bug?

Jim Rees rees@umich.edu
Thu, 02 Nov 2006 13:48:29 -0500


I've got a set of file servers here that are giving extremely poor
performance on fetchdata, less than 1% of what they should deliver.  Tcpdump
traces show frequent delays of 350 msec.  It looks like an rx bug.
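
For anyone who wants to look for the same stalls in a trace of their own,
here is a minimal sketch in Python.  It assumes the trace has been dumped as
text with epoch timestamps (for example with tcpdump's -tt option), so that
each packet line starts with a floating-point time in seconds.  The function
name find_gaps and the 0.3 second threshold are illustrative choices, not
anything taken from the servers described here, and the format of the trace
file mentioned in this post is not known.

```python
def find_gaps(lines, threshold=0.3):
    """Return (prev_ts, ts, gap) for each inter-packet gap >= threshold seconds.

    Assumes each packet line begins with an epoch timestamp in seconds,
    as printed by `tcpdump -tt`. Lines that do not start with a number
    (hex dumps, continuation lines, blank lines) are skipped.
    """
    gaps = []
    prev = None
    for line in lines:
        fields = line.split(None, 1)
        if not fields:
            continue  # blank line
        try:
            ts = float(fields[0])
        except ValueError:
            continue  # not a packet header line
        if prev is not None and ts - prev >= threshold:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps
```

To use it, feed it the text output of something like
"tcpdump -tt -n -r capture.pcap" (capture.pcap is a hypothetical file name)
and print each returned gap.  Delays near 0.35 seconds that recur through a
single fetchdata transfer would match the behavior described above.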

Rainer Toebbicke mentioned something like this a while back, but I can't find
any further references to it.

Some things we have tried, with no change in behavior:

1.4.1 built from source
1.4.2 built from source
1.4.2 from binary rpm
lwp vs pthread server
32 vs 64 bit file server
many different clients

I would suspect hardware, but we get this on at least three different
servers.

The server runs Scientific Linux 4.4, which is basically a recompiled RHEL4
Update 4, with kernel 2.6.9-42.0.3.EL.cernsmp on some kind of i386 platform.
There is a trace at /afs/umich.edu/user/r/e/rees/pub/x.trc .  Any help would
be appreciated.  I can supply the name of a test file on request.