[OpenAFS-devel] Got one of those interesting "bunch of servers won't talk to a client" situations right at the moment...

Derrick J Brashear shadow@dementia.org
Thu, 17 Mar 2005 21:52:00 -0500 (EST)


On Thu, 17 Mar 2005, Neulinger, Nathan wrote:

> Client and server are both running builds from within the past few
> months.
>
>
> -bash-2.05b# /usr/afsws/bin/fs checks
> These servers unavailable due to network or server problems:
> afs-fs1.cc.umr.edu afs-fs17.cc.umr.edu afs-fs7.cc.umr.edu

I would have suggested it was the bug Tom Keiser sent us a patch for in 
1.3.79 but...

> In a network trace, a few of the servers are sending back rx abort
> packets.

This suggests otherwise. Looking at your tcpdump output, I think you might
have more than one problem, possibly including this one:
http://www.openafs.org/cgi-bin/wdelta/STABLE14-fix-multirx-checkservers-20050216
(explaining why valid replies are seemingly ignored)

> 14:42:01.771456 afs-fs1.cc.umr.edu.afs3-fileserver >
> sysinst.cc.umr.edu.afs3-callback:  rx abort (32)
> 14:42:01.774793 afs-fs7.cc.umr.edu.afs3-fileserver >
> sysinst.cc.umr.edu.afs3-callback:  rx abort (32)

> afs-fs7.cc.umr.edu.afs3-fileserver:  rx data fs call get-time (32)
> 14:42:04.858423 sysinst.cc.umr.edu.afs3-callback >
> afs-fs1.cc.umr.edu.afs3-fileserver:  rx data fs call get-time (32)
> 14:42:05.321829 afs-fs1.cc.umr.edu.afs3-fileserver >
> sysinst.cc.umr.edu.afs3-callback:  rx abort (32)
> 14:42:05.835009 afs-fs7.cc.umr.edu.afs3-fileserver >
> sysinst.cc.umr.edu.afs3-callback:  rx abort (32)

And this is something else. Can I see raw tcpdump output? I want to look
more closely at the aborts.
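The decoded lines above show only the packet type and a trailing "(32)", which appears to be the payload length. That would fit a 28-byte Rx header plus a 4-byte abort code, but the code itself is not printed, which is why the raw bytes are needed. The sketch below pulls the abort code out of a captured UDP payload. It assumes the standard Rx wire header layout (epoch, cid, callNumber, seq, serial, then type, flags, userStatus, securityIndex, then two 16-bit fields, all in network byte order) with packet type 4 meaning abort. The helper name `rx_abort_code` is hypothetical; check the layout against `rx/rx_packet.h` before relying on it.

```python
import struct

# Assumed Rx wire header: five 32-bit words, four 8-bit fields and two
# 16-bit fields, big-endian with no padding, 28 bytes in total.
RX_HEADER = struct.Struct("!5I4B2H")
RX_PACKET_TYPE_ABORT = 4  # assumed value of the abort packet type


def rx_abort_code(payload: bytes):
    """Return the abort code carried by an Rx abort packet.

    Returns None if the payload is too short or is not an abort packet.
    """
    if len(payload) < RX_HEADER.size + 4:
        return None
    (epoch, cid, call_number, seq, serial,
     ptype, flags, user_status, security_index,
     spare, service_id) = RX_HEADER.unpack_from(payload, 0)
    if ptype != RX_PACKET_TYPE_ABORT:
        return None
    # The abort body is one signed 32-bit error code; negative values
    # are Rx-level errors (for example RX_CALL_DEAD is -1).
    (code,) = struct.unpack_from("!i", payload, RX_HEADER.size)
    return code


if __name__ == "__main__":
    # Synthetic 32-byte abort packet carrying code -1, for illustration only.
    pkt = RX_HEADER.pack(1, 2, 3, 1, 5, RX_PACKET_TYPE_ABORT,
                         0, 0, 0, 0, 1) + struct.pack("!i", -1)
    print(len(pkt), rx_abort_code(pkt))
```

Fed the UDP payload bytes from a raw capture of one of the aborts from afs-fs1 or afs-fs7, this would show which error the fileserver is actually returning to the get-time calls.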