[OpenAFS-devel] OpenBSD client bug fix
Nickolai Zeldovich
kolya@MIT.EDU
Wed, 22 Jan 2003 21:21:45 -0500
> By the way, if any of you rx experts want to look at a tcpdump and tell me
> what's wrong, it would be a big help. The client gets stuck in a fetchdata
> at about seq no 145. The server retransmits, but then never resumes sending
> data where it left off, at seq 173. It pings the client a couple times then
> gives up. I don't think the server is at fault.
I looked at the tcpdump data, and came to basically the same conclusion.
The 145-148 jumbogram is lost; the client then receives 149-172 (but again
loses the last jumbogram, 173-176), sends a nack for 145-148, and the server
retransmits it in pieces. Then the client feeds all the packets up to the
application, acks everything up to and including 172, but the server doesn't
resume.
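
To make the sequence of events concrete, here is a rough Python replay of the receive side. This is only a sketch, not rx code: the function name replay is made up, and it assumes sequence numbers start at 1 and that 1-144 arrived without loss, which the trace summary above does not state explicitly.

```python
# Hypothetical replay of the arrivals described above; this is not rx code.

def replay(events):
    """Return (highest in-order seq delivered, seqs buffered beyond it)."""
    received = set()
    delivered = 0  # highest seq handed to the application, in order
    for first, last in events:
        received.update(range(first, last + 1))
        # Deliver everything that is now contiguous.
        while delivered + 1 in received:
            delivered += 1
    return delivered, {s for s in received if s > delivered}

# Assumed: 1-144 arrive. From the trace: 145-148 is lost, 149-172 arrives,
# and 173-176 is lost.
events = [(1, 144), (149, 172)]
delivered_before, buffered_before = replay(events)

# The server then retransmits 145-148 as individual packets.
events += [(seq, seq) for seq in range(145, 149)]
delivered_after, buffered_after = replay(events)

print(delivered_before, len(buffered_before))  # 144 24
print(delivered_after, len(buffered_after))    # 172 0
```

Once the retransmitted pieces arrive, everything through 172 is contiguous, which matches the client acking up to and including 172 and then waiting for 173.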
I don't see anything particularly wrong with the client side of it, though.
The server should have retransmitted the 173-176 packets (split up into
individual packets, since jumbograms are never retransmitted) after not
hearing back from the client about them. It doesn't look like 173-176 was
ever outside the transmit or receive windows. It would be interesting to
see rxdebug output for the server, but even more so a core dump of the
fileserver to see what's in the output queue. Is this reproducible?
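
For reference, the behavior I would expect from the sender can be sketched as below. Again this is not the rx implementation: packets_to_retransmit, the timestamps, and the one-second timeout are all invented for illustration.

```python
# Hypothetical sender-side check; names and timings are illustrative only.

def packets_to_retransmit(sent_at, acked_through, now, rto):
    """Seqs that were sent, are not yet acked, and whose timer has expired."""
    return sorted(
        seq for seq, sent_time in sent_at.items()
        if seq > acked_through and now - sent_time >= rto
    )

# Assume 145-176 were all sent at time 0.0 and the client has acked through 172.
sent_at = {seq: 0.0 for seq in range(145, 177)}
due = packets_to_retransmit(sent_at, acked_through=172, now=2.0, rto=1.0)

# Each of these would go out as an individual packet, not as a jumbogram.
print(due)  # [173, 174, 175, 176]
```

If the output queue in a core dump still holds 173-176 unacked, the question becomes why nothing like this check ever fires for them.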
-- kolya