[OpenAFS-devel] delays and lost contact with fileserver with 1.3.84 and higher

Sat, 29 Oct 2005 17:43:51 +0200 (MEST)

> I've seen those hangs with 1.3.84, 1.3.85, 1.4.0rc1 and rc5 clients on
> Linux (Kernel 2.6). 1.3.80 and 1.3.82 work fine, so I expect that some
> change between 1.3.82 and 1.3.84 causes the problems.

I have looked at the diff between 82 and 84, and there are major
changes in rx which are a bit too big for me to get a handle on (lots of
queues here and there). I have not found a way to get a grip on all
the queues and connection flags that are used in rx.

> The fileserver is
> from transarc:
> # rxdebug c-hoernchen -version
> Trying 137.208.3.48 (port 7000):
> AFS version: Base configuration afs3.4 5.77

That is not very - uhm - recent.

(c-hoernchen: I was not aware that there were other related chipmunks
beyond Chip and Dale [A-Hörnchen und B-Hörnchen] [piff och puff] :-)

> To track down the problem, I've captured the network traffic between
> client and server while creating 10 files with 100k each.

Was the capture done on the client or the server?

I've looked and looked and now my eyes are crossed. I have found some
things:

1.

openafs-1.3.84-slow.pcap frame 4 has a fetch data with Length 999999999.

2.

openafs-1.3.82-fast frame 75-78 is a store data for file f_5. This
seems to be part of call 5474 spanning 2 IP packets with 4 fragments
each. This shows how it should look.

openafs-1.3.84-slow call 256 frame 84-85 is the corresponding one. But
where are fragments 3 and 4? They should be in the following frames
within milliseconds. Then call 256 stalls completely for a long time
until it is finished in frame 96. I suspect major fishiness in the
code that assembles and resends rx packets.
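
A check like the one above can also be done mechanically instead of by
eye. This is only a sketch: it assumes the IP ID, fragment offset (in
bytes), payload length and more-fragments flag of every captured frame
have already been exported from the pcap file by some other tool. The
tuple layout and the sample values are made up for illustration and
are not taken from the actual captures.

```python
from collections import defaultdict

def incomplete_datagrams(fragments):
    """Return the IP IDs whose fragments do not add up to a whole datagram.

    fragments: iterable of (ip_id, offset_bytes, payload_len, more_fragments)
    tuples (a hypothetical export format, one tuple per captured frame).
    """
    by_id = defaultdict(list)
    for ip_id, offset, length, more in fragments:
        by_id[ip_id].append((offset, length, more))

    incomplete = []
    for ip_id, frags in by_id.items():
        frags.sort()
        expected = 0        # next byte offset that must be covered
        saw_last = False    # True once a fragment without MF was seen
        has_hole = False
        for offset, length, more in frags:
            if offset > expected:
                has_hole = True     # a fragment in the middle is missing
                break
            expected = max(expected, offset + length)
            if not more:
                saw_last = True
        if has_hole or not saw_last:
            incomplete.append(ip_id)
    return sorted(incomplete)

# Made-up sample: datagram 100 is complete, datagram 101 lacks its tail.
capture = [
    (100, 0, 1480, True), (100, 1480, 1480, True),
    (100, 2960, 1480, True), (100, 4440, 1268, False),
    (101, 0, 1480, True), (101, 1480, 1480, True),
]
print(incomplete_datagrams(capture))
```

Note that this only looks at IP fragmentation. It says nothing about
which rx call or rx sequence number the data belongs to.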

I'd like to hear more about the changes to rx that were made between
82 and 84, what was the intended outcome?

> I've also noticed that in versions 1.3.80 and 1.3.82 (those that do not
> show the delays) each store-data UDP-packet is 5700 bytes and is
> split into four UDP fragments. However, this is also true for 1.3.84,
> which already shows the problems. In 1.4.0rcX, the store-data packets
> seem to be smaller, the UDP packet is only 2896 bytes and comes in two
> fragments. Is there any specific reason why all those packets are larger
> than the MTU?

I don't know anything about the change to 1.4.0rcX, but the 4 fragments
are a "feature" of rx. Has something changed how rx fragments are
handled in 1.4.0?
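
The fragment counts quoted above are what plain IPv4 fragmentation
would give. A small sketch of the arithmetic, assuming an Ethernet MTU
of 1500, a 20-byte IPv4 header without options, and that the quoted
sizes are UDP payload (none of this is stated in the original report):

```python
import math

UDP_HEADER_LEN = 8  # bytes

def ipv4_fragment_count(ip_payload_len, mtu=1500, ip_header_len=20):
    """Number of IPv4 fragments needed for ip_payload_len bytes."""
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - ip_header_len) // 8 * 8
    return math.ceil(ip_payload_len / per_fragment)

for udp_payload in (5700, 2896):
    count = ipv4_fragment_count(udp_payload + UDP_HEADER_LEN)
    print(udp_payload, "bytes ->", count, "fragments")
```

With these assumptions 5700 bytes comes out as four fragments and 2896
bytes as two, matching the observation. The result is the same if the
quoted sizes already include the UDP header.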

There are 2 ways in which rx tries to reduce overhead. They may or may
not be effective.

1. It puts more than one rx packet into an IP packet. I think that's
called a jumbogram. I think that feature is negotiated between client
and server, and as all my servers run with -nojumbo I don't get such
packets.

2. It generates IP packets up to 4 times the MTU, according to
RX_MAX_FRAG in src/rx/rx_globals.h. I usually (when I don't forget it)
patch that to 1. I think this comes from the times of the Sun SS10 or
earlier when it was faster to send ONE IP packet with FOUR fragments
instead of FOUR unfragmented IP packets. IMHO (*) this is bull today
as your throughput is devastated if you combine this scheme with packet
loss. A packet loss of say 10% is multiplied to at least 40% because
of all resends and resends of resends. Today's computers are way faster
at making IP packets than an SS10.
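
The amplification can be estimated with a very simple model. This is a
sketch under two assumptions that are not from the captures: each
fragment is lost independently with the same probability, and a
datagram that loses any fragment is resent whole until it gets through
(IP cannot reassemble a datagram with a missing fragment).

```python
def datagram_loss(fragment_loss, fragments):
    """Probability that at least one of the fragments is lost."""
    return 1.0 - (1.0 - fragment_loss) ** fragments

def expected_fragments_sent(fragment_loss, fragments):
    """Expected number of fragments on the wire until one datagram
    of `fragments` fragments arrives complete (geometric retries)."""
    success = (1.0 - fragment_loss) ** fragments
    return fragments / success

loss = 0.10
print("4 fragments per datagram:",
      round(datagram_loss(loss, 4), 4),
      round(expected_fragments_sent(loss, 4), 2))
print("1 fragment per datagram, 4 datagrams:",
      round(datagram_loss(loss, 1), 4),
      round(4 * expected_fragments_sent(loss, 1), 2))
```

In this model a 10% per-fragment loss makes about 34% of the
four-fragment datagrams fail on the first try, somewhat below the
rough 4 x 10% figure, and it takes about 6.1 fragments on the wire to
deliver four fragments' worth of data instead of about 4.4 when every
packet fits in the MTU. Real losses tend to come in bursts, so treat
the numbers as an illustration only.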

> Any help would be greatly appreciated!

Sorry I can't help more.

Harald.

(*) Not necessarily so humble at all times ;-)