[OpenAFS-devel] 50 second fetch-data?

Jim Rees rees@umich.edu
Thu, 06 Oct 2005 12:16:46 -0400


Your delayed acks are not the problem here.  Notice that your call 49
completed in 9 msec.

The real problem shows up at least twice.  At the very beginning of your
trace you can see the ack-trading behavior that is characteristic of this
problem.  The trace is missing the call that started the delay; I'm
guessing you didn't start running tcpdump until after you noticed the delay.
The reply shows up twice, in frames 7 and 9.

You can see the problem more clearly the next time it happens, starting in
frame 196.  Notice the client and server trading acks about every ten
seconds, at t=3805, 3813, 3820, 3830, 3840, 3850.  Then in frame 210 at
t=3856 the file server sends the reply.

The rx behavior seems proper.  The server is stalling the client until it
can formulate a reply.  The question is, what is the file server doing
between the request at t=3805 and the reply at t=3856?

Having looked at several of these traces, I notice the following common
attributes:

- Happens the first time the client tries to talk to a server after a period
of inactivity.

- Usually happens on the same server each time, or at least the same server
several times in a row.

- Delay time in seconds seems to be fixed for a particular server, but
varies from server to server, anywhere from 14 to 90 seconds.

To sum up, here are the symptoms:

- client sends a request to a file server it hasn't talked to in a while
- client and server trade acks every ten seconds
- server replies after a delay of 14-90 seconds
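
If you want to check a trace against these symptoms before sending it, here
is a rough Python sketch.  It is not part of OpenAFS or tcpdump; the function
names, the 3-second tolerance, and the idea of feeding it hand-copied
timestamps are all my own assumptions.  The sample numbers are the ones from
the second occurrence above, treating t=3805 as the request and the later
times as acks.

```python
from statistics import median


def summarize_stall(request_t, ack_times, reply_t):
    """Return (delay, typical_ack_gap) in seconds for one call.

    All arguments are timestamps in seconds copied by hand from a trace.
    typical_ack_gap is None when no acks were seen before the reply.
    """
    delay = reply_t - request_t
    # Measure spacing from the request through each successive ack.
    points = [request_t] + sorted(ack_times)
    gaps = [later - earlier for earlier, later in zip(points, points[1:])]
    return delay, (median(gaps) if gaps else None)


def matches_symptoms(request_t, ack_times, reply_t,
                     min_delay=14, ack_gap=10, tolerance=3):
    """Heuristic check (hypothetical thresholds): a long delay before the
    reply, with acks traded roughly every ack_gap seconds in between."""
    delay, gap = summarize_stall(request_t, ack_times, reply_t)
    return (delay >= min_delay
            and gap is not None
            and abs(gap - ack_gap) <= tolerance)


# Timestamps from the occurrence starting in frame 196.
delay, gap = summarize_stall(3805, [3813, 3820, 3830, 3840, 3850], 3856)
print(delay, gap)
print(matches_symptoms(3805, [3813, 3820, 3830, 3840, 3850], 3856))
```

A call that completes in a few milliseconds with no intervening acks, like
call 49, does not match.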

If you are seeing these symptoms, please send me tcpdump traces.