[OpenAFS-devel] OpenBSD client bug fix

Nickolai Zeldovich kolya@MIT.EDU
Wed, 22 Jan 2003 21:21:45 -0500


> By the way, if any of you rx experts want to look at a tcpdump and tell me
> what's wrong, it would be a big help.  The client gets stuck in a fetchdata
> at about seq no 145.  The server retransmits, but then never resumes sending
> data where it left off, at seq 173.  It pings the client a couple times then
> gives up.  I don't think the server is at fault.

I looked at the tcpdump data, and came to basically the same conclusion.
The 145-148 jumbogram is lost, the client then receives 149-172 (but again
loses the last jumbogram, 173-176), sends a nack for 145-148, and the server
retransmits it in pieces.  Then the client feeds all the packets up to the
application, acks everything up to and including 172, but the server doesn't
resume.
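
To make that sequence concrete, here is a rough Python sketch of the
client-side bookkeeping as I read it from the trace.  This is not rx code:
the function name receiver_state and the set-based model are made up for
illustration, and only the sequence numbers come from the tcpdump.

```python
def receiver_state(first_seq, received):
    """Return (highest in-order seq, seqs missing below the highest seq seen).

    Hypothetical model of a receiver, not the rx implementation.
    """
    have = set(received)
    # Advance the cumulative ack point while the next packet is present.
    cum = first_seq - 1
    while cum + 1 in have:
        cum += 1
    top = max(have) if have else cum
    missing = [s for s in range(first_seq, top + 1) if s not in have]
    return cum, missing


# First pass: 145-148 lost, 149-172 received, 173-176 lost.
first_pass = list(range(149, 173))
cum, missing = receiver_state(145, first_pass)
# The gap is exactly 145-148, which is what the client nacks.

# Second pass: the server has retransmitted 145-148 as individual packets.
second_pass = first_pass + [145, 146, 147, 148]
cum2, missing2 = receiver_state(145, second_pass)
# Everything through 172 is now in order and nothing looks missing.
```

In this model the client ends up with no gap to complain about, so it has
no way to know that 173-176 was ever sent; getting those packets out again
is entirely up to the server.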

I don't see anything particularly wrong with the client side of it, though.
The server should have retransmitted the 173-176 packets (split up into
individual packets, since jumbograms are never retransmitted) after not
hearing an ack for them from the client.  It doesn't look like 173-176 was
ever outside the transmit or receive windows.  It would be interesting to
see rxdebug output for the server, but even more so a core dump of the
fileserver to see what's in the output queue.  Is this reproducible?
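
For reference, this is roughly what I would expect the server to have
pending once the ack for 172 arrives.  Again a sketch only, under my own
assumptions: packets_to_retransmit is a hypothetical helper, the sent
ranges are simplified (149-172 really went out as several jumbograms), and
real rx drives this from per-packet state and timers, which are ignored
here.

```python
def packets_to_retransmit(sent_ranges, acked_through):
    """List individual seq numbers that were sent but not yet acked.

    sent_ranges: inclusive (first, last) ranges as originally sent.
    Hypothetical helper, not part of rx.
    """
    pending = []
    for first, last in sent_ranges:
        for seq in range(first, last + 1):
            if seq > acked_through:
                # One entry per sequence number: retransmissions go out
                # as individual packets, never as a jumbogram.
                pending.append(seq)
    return pending


sent = [(145, 148), (149, 172), (173, 176)]
resend = packets_to_retransmit(sent, acked_through=172)
```

If the fileserver's output queue does not hold something equivalent to
173 through 176 at that point, that would be the place to start looking.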

-- kolya