[OpenAFS] AFS hangs, possible nat issues?

Mark Huijgen mark@nl.simpc.com
Thu, 20 May 2010 15:33:25 +0200


This is a multi-part message in MIME format.
--------------010701040103050906040709
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

On 05/19/2010 06:26 PM, Derrick Brashear wrote:
>> I have patched my 1.4.12 linux client with these 2 patches I found in git:
>>
>> rx lowlevel nat ping
>> http://git.openafs.org/?p=openafs.git;a=commitdiff_plain;h=d24078658d183ea2e72e61c1888e9900bac0ec32
>> rx nat event connection reference
>> http://git.openafs.org/?p=openafs.git;a=commitdiff_plain;h=d9cf88428aa542d1cd304e82f02333eced0194ae
>>
>> Are these 2 patches enough?
>>     
> Yes.

After running for a while with the 2 patches applied, I noticed that
the number of keepalive packets being sent gradually increases over
the day.
It starts off with 1 packet every 20s to each server the client has
been in communication with. But after some time it looks like
additional keepalive packets are being sent to exactly the same server.

Output of tcpdump -vv running for 20 seconds after the client has been
running for half a day:

14:51:01.152574 IP (tos 0x0, ttl 64, id 270, offset 0, flags [none],
proto UDP (17), length 57) 10.10.10.123.afs3-callback >
afs3.afs3-fileserver: [udp sum ok]  rx version cid 00000000 call# 0 seq
0 ser 0 <last-pckt> (29)
14:51:01.152596 IP (tos 0x0, ttl 64, id 35821, offset 0, flags [none],
proto UDP (17), length 57) 10.10.10.123.afs3-callback >
afs2.afs3-fileserver: [udp sum ok]  rx version cid 00000000 call# 0 seq
0 ser 0 <last-pckt> (29)
14:51:01.152609 IP (tos 0x0, ttl 64, id 3326, offset 0, flags [none],
proto UDP (17), length 57) 10.10.10.123.afs3-callback >
afs5.afs3-fileserver: [udp sum ok]  rx version cid 00000000 call# 0 seq
0 ser 0 <last-pckt> (29)
14:51:01.656564 IP (tos 0x0, ttl 64, id 271, offset 0, flags [none],
proto UDP (17), length 57) 10.10.10.123.afs3-callback >
afs3.afs3-fileserver: [udp sum ok]  rx version cid 00000000 call# 0 seq
0 ser 0 <last-pckt> (29)
14:51:01.656583 IP (tos 0x0, ttl 64, id 35822, offset 0, flags [none],
proto UDP (17), length 57) 10.10.10.123.afs3-callback >
afs2.afs3-fileserver: [udp sum ok]  rx version cid 00000000 call# 0 seq
0 ser 0 <last-pckt> (29)
14:51:01.656594 IP (tos 0x0, ttl 64, id 3327, offset 0, flags [none],
proto UDP (17), length 57) 10.10.10.123.afs3-callback >
afs5.afs3-fileserver: [udp sum ok]  rx version cid 00000000 call# 0 seq
0 ser 0 <last-pckt> (29)
14:51:02.160545 IP (tos 0x0, ttl 64, id 272, offset 0, flags [none],
proto UDP (17), length 57) 10.10.10.123.afs3-callback >
afs3.afs3-fileserver: [udp sum ok]  rx version cid 00000000 call# 0 seq
0 ser 0 <last-pckt> (29)
14:51:02.160562 IP (tos 0x0, ttl 64, id 35823, offset 0, flags [none],
proto UDP (17), length 57) 10.10.10.123.afs3-callback >
afs2.afs3-fileserver: [udp sum ok]  rx version cid 00000000 call# 0 seq
0 ser 0 <last-pckt> (29)
14:51:02.160573 IP (tos 0x0, ttl 64, id 3328, offset 0, flags [none],
proto UDP (17), length 57) 10.10.10.123.afs3-callback >
afs5.afs3-fileserver: [udp sum ok]  rx version cid 00000000 call# 0 seq
0 ser 0 <last-pckt> (29)
14:51:14.256567 IP (tos 0x0, ttl 64, id 44382, offset 0, flags [none],
proto UDP (17), length 57) 10.10.10.123.afs3-callback >
afs4.afs3-fileserver: [udp sum ok]  rx version cid 00000000 call# 0 seq
0 ser 0 <last-pckt> (29)
14:51:14.256587 IP (tos 0x0, ttl 64, id 3329, offset 0, flags [none],
proto UDP (17), length 57) 10.10.10.123.afs3-callback >
afs5.afs3-fileserver: [udp sum ok]  rx version cid 00000000 call# 0 seq
0 ser 0 <last-pckt> (29)
14:51:15.264561 IP (tos 0x0, ttl 64, id 44383, offset 0, flags [none],
proto UDP (17), length 57) 10.10.10.123.afs3-callback >
afs4.afs3-fileserver: [udp sum ok]  rx version cid 00000000 call# 0 seq
0 ser 0 <last-pckt> (29)

So that's 3 keepalives to server afs3 every 20 seconds, where it started
off with just 1 every 20 seconds.
Is it expected behaviour that the client keeps sending more and more
packets to the same fileserver?
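
For anyone wanting to repeat the count on their own capture, here is a
rough sketch that tallies the "rx version" keepalives per destination
server from saved 'tcpdump -vv' text. It assumes the output looks like
the capture above (hostnames resolved, port shown as afs3-fileserver,
lines wrapped by the mailer); the function name count_keepalives and
the sample string are only illustrative.

```python
import re
from collections import Counter

# Matches the destination host of an rx "version" packet sent to the
# fileserver port, e.g. "> afs3.afs3-fileserver: [udp sum ok] rx version".
KEEPALIVE = re.compile(
    r">\s+(\S+)\.afs3-fileserver:\s+\[udp sum ok\]\s+rx\s+version"
)

def count_keepalives(tcpdump_text):
    """Return a Counter of keepalive packets per destination host."""
    # The capture is wrapped over several lines per packet, so collapse
    # all whitespace before matching.
    flat = " ".join(tcpdump_text.split())
    return Counter(KEEPALIVE.findall(flat))

# Two packets copied from the capture above, kept in their wrapped form.
SAMPLE = """
14:51:01.152574 IP (tos 0x0, ttl 64, id 270, offset 0, flags [none],
proto UDP (17), length 57) 10.10.10.123.afs3-callback >
afs3.afs3-fileserver: [udp sum ok]  rx version cid 00000000 call# 0 seq
0 ser 0 <last-pckt> (29)
14:51:01.152596 IP (tos 0x0, ttl 64, id 35821, offset 0, flags [none],
proto UDP (17), length 57) 10.10.10.123.afs3-callback >
afs2.afs3-fileserver: [udp sum ok]  rx version cid 00000000 call# 0 seq
0 ser 0 <last-pckt> (29)
"""

if __name__ == "__main__":
    for host, n in sorted(count_keepalives(SAMPLE).items()):
        print(host, n)
```

Feeding it a 20 second capture taken at different times of the day
should show whether the per-server count really grows.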

The number of entries returned by running 'rxdebug localhost 7001
-allconnections' on the client seems to grow along with the number of
packets sent every 20s to each server (see attachment; IPs replaced
with short hostnames to match the tcpdump output).

vlserver pings do seem to stop when the connection to the vlserver is
destroyed, just not the fileserver ones.

Mark Huijgen

--------------010701040103050906040709
Content-Type: text/plain;
 name="rxdebug-natping"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="rxdebug-natping"

Trying 127.0.0.1 (port 7001):
Free packets: 130, packet reclaims: 0, calls: 524, used FDs: 64
not waiting for packets.
0 calls waiting for a thread
1 threads are idle
Connection from host afs5, port 7000, Cuid 839ddfba/66d9c760
  serial 143,  natMTU 1444, flags pktCksum, security index 2, client conn
  rxkad: level crypt, flags pktCksum
  Received 22440 bytes in 70 packets
  Sent 1120 bytes in 62 packets
    call 0: # 61, state dally, mode: receiving, flags: receive_done
    call 1: # 2, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Connection from host afs4, port 7000, Cuid 839ddfba/66d9c764
  serial 750,  natMTU 1444, flags pktCksum, security index 2, client conn
  rxkad: level crypt, flags pktCksum
  Received 132296 bytes in 393 packets
  Sent 10364 bytes in 350 packets
    call 0: # 309, state dally, mode: receiving, flags: receive_done
    call 1: # 41, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Connection from host afs4, port 7000, Cuid 9a20c282/2fcba0d4
  serial 1,  natMTU 1444, security index 0, server conn
    call 0: # 9, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Connection from host afs5, port 7000, Cuid 8d6a18a9/2fcf9968
  serial 2,  natMTU 1444, security index 0, server conn
    call 0: # 11, state dally, mode: eof, flags: receive_done
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Connection from host afs3, port 7000, Cuid 839ddfba/2fd1cd54
  serial 0,  natMTU 1444, security index 0, client conn
    call 0: # 0, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Connection from host afs2, port 7000, Cuid 839ddfba/2fd1cd58
  serial 0,  natMTU 1444, security index 0, client conn
    call 0: # 0, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Connection from host afs5, port 7000, Cuid 839ddfba/2fd1cd5c
  serial 0,  natMTU 1444, security index 0, client conn
    call 0: # 0, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Connection from host afs3, port 7000, Cuid 839ddfba/2fd1cd64
  serial 0,  natMTU 1444, security index 0, client conn
    call 0: # 0, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Connection from host afs2, port 7000, Cuid 839ddfba/2fd1cd68
  serial 0,  natMTU 1444, security index 0, client conn
    call 0: # 0, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Connection from host afs5, port 7000, Cuid 839ddfba/2fd1cd6c
  serial 0,  natMTU 1444, security index 0, client conn
    call 0: # 0, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Connection from host afs4, port 7000, Cuid 839ddfba/2fd1cd70
  serial 73,  natMTU 1444, security index 0, client conn
    call 0: # 26, state dally, mode: receiving, flags: receive_done
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Connection from host afs3, port 7000, Cuid 839ddfba/2fd1cd74
  serial 0,  natMTU 1444, security index 0, client conn
    call 0: # 0, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Connection from host afs2, port 7000, Cuid 839ddfba/2fd1cd78
  serial 0,  natMTU 1444, security index 0, client conn
    call 0: # 0, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Connection from host afs5, port 7000, Cuid 839ddfba/2fd1cd7c
  serial 0,  natMTU 1444, security index 0, client conn
    call 0: # 0, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Connection from host afs2, port 7003, Cuid 839ddfba/2fd1cd80
  serial 2,  natMTU 1444, security index 0, client conn
    call 0: # 1, state dally, mode: receiving, flags: receive_done
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Done.

--------------010701040103050906040709--