[OpenAFS-devel] "Lost contact with file server" problems

Derrick J Brashear shadow@dementia.org
Sat, 20 Aug 2005 07:26:59 -0400 (EDT)


On Fri, 19 Aug 2005, Lyle wrote:

> Yes, that is an old bug that used to only happen very rarely so it's
> interesting that it's happening more frequently now.  The connection gets
> put in a permanent error state so that every packet that comes in generates
> an abort, and I think the CM should destroy the connection but doesn't.
> I'm just thinking that since it's multihomed, one of the other interfaces
> should be satisfying the request.

Well, one case which has happened recently appears to be:
1) fileserver sends an rx data packet with call number 0 to the client
2) client marks an rx protocol error and sends the error to the server
3) but the client keeps sending traffic, which gets an abort.

But I'm still waiting to get a raw tcpdump from someone to see what
actually happens.
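
To make the suspected sequence above concrete, here is a toy model of a
connection whose error state is sticky. It is only a sketch under
assumptions: ToyConnection, handle, replay and PROTOCOL_ERROR are
hypothetical names, not the real rx code or its error values, and the
claim that the server answers later traffic with aborts is the guess from
this thread, not something confirmed by a packet trace.

```python
from typing import List, Optional

PROTOCOL_ERROR = "protocol-error"  # placeholder, not the real rx error code


class ToyConnection:
    """One end of a connection with a sticky error state (hypothetical)."""

    def __init__(self) -> None:
        self.error: Optional[str] = None

    def handle(self, kind: str, call_number: int = 0) -> str:
        """Return the reply this end would send for one incoming packet."""
        if kind == "abort":
            # The peer aborted the connection: record it, send nothing back.
            self.error = self.error or PROTOCOL_ERROR
            return "none"
        if self.error is not None:
            # Sticky error: every later packet is answered with an abort.
            return "abort"
        if call_number == 0:
            # Step 2: a data packet with call number 0 is a protocol error.
            self.error = PROTOCOL_ERROR
            return "abort"
        return "ack"


def replay() -> List[str]:
    """Replay steps 1-3 and collect the replies seen along the way."""
    client, server = ToyConnection(), ToyConnection()
    replies = []
    # 1) server sends data with call number 0; 2) client errors back.
    reply = client.handle("data", call_number=0)
    replies.append(reply)
    server.handle(reply)  # the server records the abort it received
    # 3) the client keeps using the same connection; each packet is aborted.
    for call in (1, 2, 3):
        replies.append(server.handle("data", call_number=call))
    return replies
```

In this model nothing ever clears the error, so the only way out is to
drop the connection and build a new one, which is what the quoted message
suggests the CM should be doing.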