[OpenAFS-devel] The "50 second fetch-data"-bug?

Niklas Edmundsson Niklas.Edmundsson@hpc2n.umu.se
Tue, 11 Oct 2005 12:44:34 +0200 (MEST)


On Mon, 10 Oct 2005, Jim Rees wrote:

>  Does this seem like the same bug as the thread "50 second fetch-data"
>  a few days ago?
>
> I don't think so.  The 100% cpu usage on the client indicates something
> else, maybe an rx bug.  A tcpdump around the time of your stall might be
> useful.

In /afs/hpc2n.umu.se/home/n/nikke/Public/tmp/afs-stall:
afsprob.cap4 : Capture written by tcpdump -s 1500
afsprob.cap4.txt : Start/end-timestamps of stall and other misc info.

An interesting observation is that the chunksize does indeed matter: I get
identical behaviour with the CVS version if I use the same chunksize
(8k) as 1.4.0RC uses by default. With the new default (64k for a 128MB
memcache) the stalls are less frequent and shorter, but they
still occur.
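
For anyone wanting to reproduce with the two chunk sizes mentioned above, here is a minimal sketch. It assumes the cache manager is started with afsd and that its -chunksize option takes the base-2 logarithm of the chunk size in bytes, as the afsd man page describes. The helper name chunksize_exponent is hypothetical, and the printed command lines are only illustrations, not the exact invocation used on these machines.

```python
def chunksize_exponent(size_bytes: int) -> int:
    """Return n such that 2**n == size_bytes (hypothetical helper)."""
    # afsd -chunksize expects an exponent, so only powers of two are valid.
    if size_bytes <= 0 or size_bytes & (size_bytes - 1):
        raise ValueError("chunk size must be a positive power of two")
    return size_bytes.bit_length() - 1


# The two chunk sizes compared in this message: 8k and 64k.
for label, size in (("8k", 8 * 1024), ("64k", 64 * 1024)):
    print(f"{label}: afsd -memcache -chunksize {chunksize_exponent(size)}")
```

Under these assumptions, 8k corresponds to -chunksize 13 and 64k to -chunksize 16.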

This capture is from my AIX SMP machine. The Linux UP machine freezes
up completely during the stalls, so a capture from it is no good.

If information is missing or doesn't make sense, just poke at me and 
I'll see what I can do :).

/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se     |    nikke@hpc2n.umu.se
---------------------------------------------------------------------------
  I didn't do it nobody saw me you can't prove anything
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=