[OpenAFS] Re: RPC service unavailable, windows client, udebug works

Christian chanlists@googlemail.com
Wed, 05 Nov 2014 22:10:59 +0100


On Tue, 04 Nov 2014 16:05:24 +0100 Christian <chanlists@googlemail.com>
wrote:
>> on some of our windows clients (win7 enterprise x64, openafs 1.7.31), we
>> are seeing issues where if I try to access a volume on a given server,
>> it gives me "RPC service unavailable". This only happens for one of our
>> two file and db servers, which are both almost identical (the first one
>> has in fact been cloned from the second one). Servers run openafs
>> 1.6.9-1~bpo7 from wheezy-backports on debian wheezy. While that is
>> happening, "fs checkservers" reports that particular server as being
>> down.
> Does syslog report the server coming back up later, if you don't try to
> access anything?
Sometimes. I can sometimes also "fix" it by completely uninstalling the
AFS client and reinstalling it.
>> udebug <server> 7003 works, though, and I can ping that server or
>> ssh to it just fine. Should I post trace logs and udebug output for
>> people to look at, or what is the appropriate way to debug this? Thanks
>> a lot,
> It's much more likely that you're failing to contact the fileserver
> (port 7000), not the vlserver (port 7003). You can check basic
> connectivity for that with 'rxdebug <server> 7000 -version'.
>
> But that will probably just succeed and won't tell you anything. What
> would really tell you what's happening is if you could capture AFS
> traffic (udp port 7000) close to the client, and close to the server (at
> least, 'before' and 'after' the openvpn link). If Jeff's suggestion is
> what is happening, you'll see packets that appear to be sent on the
> server side, but will not appear on the client side. Specifically, you'd
> see packets over a certain size not appear on the client side.
>
> You can either look at the dump yourself in wireshark or something, or
> provide it for one of us to look at. But you don't really need to know
> anything about AFS to do the above analysis; just see if larger packets
> appear in one dump but not the other.
>
> If you determine that what Jeff mentioned is what's happening, and you
> can't fix or alter the thing that's dropping packets, you might be able
> to change a setting in the Windows client to reduce the max size of
> packets that we use (RxMaxMTU). Or change the MTU on the local
> interface; I don't recall what the specifics are of changing this on
> Windows.
OK, so udebug 130.75.103.223 7000 fails on that machine. But it also
fails against our other server, which I can access via the AFS client
just fine. So I captured the traffic on both ends:

(on the file server, 130.75.103.223)
tcpdump -n host 130.75.103.223 and host 130.75.102.221 and udp
22:00:42.166283 IP 130.75.102.221.55607 > 130.75.103.223.7000:  rx data fs call op#10006 (32)
22:00:42.166401 IP 130.75.103.223.7000 > 130.75.102.221.55607:  rx abort (32)
22:00:42.169060 IP 130.75.102.221.55607 > 130.75.103.223.7000:  rx data fs call op#10004 (32)
22:00:42.169157 IP 130.75.103.223.7000 > 130.75.102.221.55607:  rx abort (32)

(on the client, 130.75.102.221)
windump.exe -n -i blah host 130.75.103.223 and host 130.75.102.221 and udp
22:00:42.166283 IP 130.75.102.221.55607 > 130.75.103.223.7000:  rx data fs call op#10006 (32)
22:00:42.166401 IP 130.75.103.223.7000 > 130.75.102.221.55607:  rx abort (32)
22:00:42.169060 IP 130.75.102.221.55607 > 130.75.103.223.7000:  rx data fs call op#10004 (32)
22:00:42.169157 IP 130.75.103.223.7000 > 130.75.102.221.55607:  rx abort (32)
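
For longer captures, the "do larger packets show up on one side but not
the other" comparison suggested in the quoted text could be done with a
small script instead of by eye. What follows is only a rough sketch: it
assumes the tcpdump/windump text output has been saved to files, and it
assumes the number in parentheses at the end of each rx record is the
packet length. The function names are made up for illustration.

```python
import re
from collections import defaultdict

# Matches one rx record in tcpdump text output, for example:
#   22:00:42.166283 IP 1.2.3.4.55607 > 5.6.7.8.7000:  rx data fs call op#10006 (32)
# re.S lets ".*?" cross a line break, so a record that a mail client
# wrapped onto two lines is still picked up.
RX_RECORD = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.\d+ > (\d+\.\d+\.\d+\.\d+)\.\d+:"
    r"\s+rx (\w+).*?\((\d+)\)",
    re.S,
)


def max_size_per_direction(dump_text):
    """Return {(src_ip, dst_ip): largest length seen in that direction}."""
    largest = defaultdict(int)
    for src, dst, _ptype, size in RX_RECORD.findall(dump_text):
        largest[(src, dst)] = max(largest[(src, dst)], int(size))
    return dict(largest)


def missing_large_packets(near_sender, near_receiver):
    """Directions where the capture near the sender contains a larger
    packet than anything seen in the capture near the receiver.

    Returns {(src_ip, dst_ip): (max_near_sender, max_near_receiver)}.
    """
    sent = max_size_per_direction(near_sender)
    seen = max_size_per_direction(near_receiver)
    return {
        flow: (size, seen.get(flow, 0))
        for flow, size in sent.items()
        if size > seen.get(flow, 0)
    }
```

Feeding it the contents of the server-side and client-side dumps, an
empty result from missing_large_packets() would mean no direction lost
its largest packets between the two capture points. It only compares
maximum sizes per direction, so it says nothing about why a call was
aborted.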

Both captures show the traffic generated by running
udebug 130.75.103.223 7000
on the client, which fails with
"return code -2 from VOTE_debug"

Bizarre. I cannot see much of a difference... Thanks for looking into this,

Christian