[OpenAFS] client hangs through ipsec

Wes Chow wes@woahnelly.net
Wed, 1 Dec 2004 21:58:09 -0500


I have two fileservers (both Debian Sarge, OpenAFS 1.2.11), and
several clients (also Sarge, however OpenAFS 1.2.13).  The local
clients work fine.  Remote clients that are connected through IPsec
(OpenSwan 2.2.0) hang. I've heard of ssh/scp transfers of large files
through OpenSwan sometimes hanging, but scp works fine here.

This is the output of "tcpdump port 7000" on the client machine:

21:48:07.671387 IP confucius.ho.in.athenacr.com.afs3-callback > helmsley.dev.in.athenacr.com.afs3-fileserver:  rx ack first 38 serial 0 reason delay (65)
21:48:07.739039 IP helmsley.dev.in.athenacr.com.afs3-fileserver > confucius.ho.in.athenacr.com.afs3-callback:  rx data (1436)
21:48:07.739051 IP confucius.ho.in.athenacr.com.afs3-callback > helmsley.dev.in.athenacr.com.afs3-fileserver:  rx ack first 38 serial 65 reason ack requested acked 637534208 (66)
21:48:07.742036 IP helmsley.dev.in.athenacr.com.afs3-fileserver > confucius.ho.in.athenacr.com.afs3-callback:  rx data (1436)
21:48:07.742050 IP confucius.ho.in.athenacr.com.afs3-callback > helmsley.dev.in.athenacr.com.afs3-fileserver:  rx ack first 39 serial 66 reason ack requested acked 654311424 (66)
21:48:08.119058 IP helmsley.dev.in.athenacr.com.afs3-fileserver > confucius.ho.in.athenacr.com.afs3-callback:  rx data (1436)
21:48:08.119075 IP confucius.ho.in.athenacr.com.afs3-callback > helmsley.dev.in.athenacr.com.afs3-fileserver:  rx ack first 40 serial 69 reason ack requested acked 671088640 (66)
21:48:08.122055 IP helmsley.dev.in.athenacr.com.afs3-fileserver > confucius.ho.in.athenacr.com.afs3-callback:  rx data (1436)

What's going on with the "rx ack first NN serial NN reason ack
requested" packets?  I don't see those on a machine that's working
normally.
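One thing that stands out in the first trace (an observation, not a
diagnosis): every "acked" value is exactly the "first" value on the
same line with its four bytes reversed, e.g. 637534208 is 0x26000000
and 38 is 0x26.  That could be a byte-order quirk in this tcpdump
build's Rx decoder rather than a corrupted ack, though I haven't
confirmed it.  The ack serials also jump from 66 to 69, so data
packets with serials 67 and 68 are never acked in this excerpt, which
would fit packets being dropped somewhere in the tunnel.  The sketch
below checks both things against the ack lines copied from the trace;
the regex and helper names are hypothetical and only meant for these
lines.

```python
import re

# Ack lines copied from the first trace (client -> fileserver).
ACK_LINES = [
    "rx ack first 38 serial 65 reason ack requested acked 637534208 (66)",
    "rx ack first 39 serial 66 reason ack requested acked 654311424 (66)",
    "rx ack first 40 serial 69 reason ack requested acked 671088640 (66)",
]

# Hypothetical pattern, written only for the tcpdump output shown above.
ACK_RE = re.compile(r"rx ack first (\d+) serial (\d+) .*acked (\d+)")


def byteswap32(value: int) -> int:
    """Reverse the byte order of a 32-bit unsigned integer."""
    return int.from_bytes(value.to_bytes(4, "big"), "little")


def analyze(lines):
    """Return (acked-is-byteswapped flags, serials skipped between acks)."""
    serials = []
    swapped = []
    for line in lines:
        m = ACK_RE.search(line)
        if m is None:
            continue
        first, serial, acked = (int(g) for g in m.groups())
        serials.append(serial)
        # True when "acked" is just "first" printed in the other byte order.
        swapped.append(byteswap32(first) == acked)
    # Serial numbers between consecutive acks that never got acked here.
    missing = [s for a, b in zip(serials, serials[1:]) for s in range(a + 1, b)]
    return swapped, missing


if __name__ == "__main__":
    swapped, missing = analyze(ACK_LINES)
    print("acked == byteswap(first):", swapped)  # [True, True, True]
    print("serials never acked in this excerpt:", missing)  # [67, 68]
```

The byte-swap check is cheap to run on a longer capture too; if it
holds for every ack, the odd "acked" numbers are probably a display
issue and the missing serials are the more interesting clue.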

On a different client through a different, but similarly configured
OpenSwan connection, the hang is even worse:

21:56:07.912736 IP berra.ho.in.athenacr.com.afs3-callback > hippo.dev.in.athenacr.com.afs3-fileserver:  rx ack first 1 serial 0 reason ping (65)
21:56:07.915493 IP hippo.dev.in.athenacr.com.afs3-fileserver > berra.ho.in.athenacr.com.afs3-callback:  rx ack first 2 serial 8 reason ping response (65)
21:56:16.072313 IP berra.ho.in.athenacr.com.afs3-callback > hippo.dev.in.athenacr.com.afs3-fileserver:  rx ack first 1 serial 0 reason ping (65)
21:56:16.074425 IP hippo.dev.in.athenacr.com.afs3-fileserver > berra.ho.in.athenacr.com.afs3-callback:  rx ack first 2 serial 9 reason ping response (65)
21:56:21.608237 IP hippo.dev.in.athenacr.com.afs3-fileserver > berra.ho.in.athenacr.com.afs3-callback:  rx ack first 2 serial 0 reason ping (65)
21:56:21.608286 IP berra.ho.in.athenacr.com.afs3-callback > hippo.dev.in.athenacr.com.afs3-fileserver:  rx ack first 1 serial 13 reason ping response (65)
21:56:31.637632 IP hippo.dev.in.athenacr.com.afs3-fileserver > berra.ho.in.athenacr.com.afs3-callback:  rx ack first 2 serial 0 reason ping (65)
21:56:31.637651 IP berra.ho.in.athenacr.com.afs3-callback > hippo.dev.in.athenacr.com.afs3-fileserver:  rx ack first 1 serial 15 reason ping response (65)
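The second trace contains no "rx data" packets at all, only 65-byte
pings and ping responses in both directions, and each ping is answered
within a few milliseconds.  So small packets cross this tunnel fine
while nothing bigger gets through.  A small sketch that tallies packet
types and ping round trips from the timestamps makes that explicit; it
uses shortened host names and assumes each ping is answered by the
very next packet from the other host, which holds for this excerpt.

```python
from datetime import datetime

# (timestamp, sender, rx packet type) taken from the second trace.
TRACE = [
    ("21:56:07.912736", "berra", "ping"),
    ("21:56:07.915493", "hippo", "ping response"),
    ("21:56:16.072313", "berra", "ping"),
    ("21:56:16.074425", "hippo", "ping response"),
    ("21:56:21.608237", "hippo", "ping"),
    ("21:56:21.608286", "berra", "ping response"),
    ("21:56:31.637632", "hippo", "ping"),
    ("21:56:31.637651", "berra", "ping response"),
]


def parse_ts(ts: str) -> datetime:
    return datetime.strptime(ts, "%H:%M:%S.%f")


def summarize(trace):
    """Count packet types and time each ping against the reply after it."""
    counts = {}
    for _, _, kind in trace:
        counts[kind] = counts.get(kind, 0) + 1
    rtts_ms = []
    for (t1, who1, k1), (t2, who2, k2) in zip(trace, trace[1:]):
        # Assumption: a ping is answered by the very next packet from the peer.
        if k1 == "ping" and k2 == "ping response" and who1 != who2:
            delta = parse_ts(t2) - parse_ts(t1)
            rtts_ms.append(round(delta.total_seconds() * 1000, 3))
    return counts, rtts_ms


if __name__ == "__main__":
    counts, rtts_ms = summarize(TRACE)
    print(counts)  # no "data" entries at all
    print("ping round trips (ms):", rtts_ms)
```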



I've found references to possible MTU problems with OpenAFS and
OpenSwan, but nothing really useful.  I've tried various MTU sizes on
the clients and servers, but nothing seems to make a difference.  I
also found a suggestion to try running the LWP fileserver rather than
the pthreads one, but I haven't gotten around to testing that yet.
I'd like to know if there are other paths I should pursue before that
one.
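For the MTU angle, some rough arithmetic shows why the large Rx data
packets are more fragile than the pings.  Assuming the number in
parentheses in the traces is the UDP payload length, a 1436-byte Rx
data packet is a 1464-byte IP datagram, and wrapping that in an ESP
tunnel pushes it past a 1500-byte Ethernet MTU, so it needs
fragmentation somewhere along the path, while the 65-byte pings fit
easily.  The cipher block and IV sizes, the 8-byte ESP header and the
12-byte HMAC-96 ICV below are assumptions about the tunnel's proposal,
not values read from this OpenSwan setup.

```python
# Rough ESP tunnel-mode size estimate. All header sizes are assumptions:
# IPv4 without options, ESP with an explicit IV, HMAC-SHA1-96/MD5-96 ICV.
IP_HDR = 20
UDP_HDR = 8
ESP_HDR = 8        # SPI + sequence number
ESP_TRAILER = 2    # pad length + next header
ICV = 12           # truncated HMAC-96
ETHERNET_MTU = 1500


def esp_tunnel_size(udp_payload: int, block: int, iv: int) -> int:
    """Size of the outer IP packet carrying one inner UDP datagram via ESP."""
    inner = IP_HDR + UDP_HDR + udp_payload
    # Encrypted part (inner packet + trailer) is padded to the cipher block size.
    encrypted = inner + ESP_TRAILER
    encrypted += (-encrypted) % block
    return IP_HDR + ESP_HDR + iv + encrypted + ICV


if __name__ == "__main__":
    for name, block, iv in [("3des", 8, 8), ("aes128", 16, 16)]:
        for payload in (65, 1436):  # Rx ping/ack vs. Rx data, from the traces
            size = esp_tunnel_size(payload, block, iv)
            verdict = "fits" if size <= ETHERNET_MTU else "needs fragmentation"
            print(f"{name}: {payload}-byte Rx payload -> {size} bytes ({verdict})")
```

Under these assumptions the data packets come out at 1520 bytes (3DES)
or 1528 bytes (AES-128), and the pings at 144 or 152 bytes, which lines
up with small packets getting through while data stalls.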

Thanks,
Wes

-- 
http://www.woahnelly.net/~wes/          OpenPGP key = 0xA5CA6644
fingerprint = FDE5 21D8 9D8B 386F 128F  DF52 3F52 D582 A5CA 6644