[OpenAFS] Heavy performance loss on gigabit ethernet

Systems Administration sysadmin@contrailservices.com
Mon, 16 Aug 2004 15:02:11 -0600


> The fileserver tcpdump would be more useful at a higher verbosity 
> (tcpdump -s (whatever the mtu is) -vv, particularly, will tell you 
> what RPCs are happening)
>

Ok I put up a dump from the fileserver:  "tcpdump -s 1500 -vv port 7000"

Full log at:
http://cumulonimbus.contrailservices.com/ted/tcpdump_081604.txt

My unpracticed eye sees clear retransmissions from the FS to the client 
- and I think this may explain the original post on this thread - 
somehow the UDP protocol is missing frames - possibly a poor gigabit to 
100mb translation.  My evidence for this speculation is that during 
this test I saw clear 'pauses' where the command ('/bin/rm -rvf 
<dirname>') would halt reading from the AFS server and tcpdump showed 
multiple retransmissions before the process would continue, this 
happened at least twice during the test before the final 'indefinitely 
long pause' - my hangup - occurred.

  If this is happening silently during the large file copy that Enrico 
posted about then he would see apparently lowered performance due to 
these pauses during the transaction.

I hope this is the right log - I can repeat the test with any other set 
of TCP dump logs you need.

---------


14:33:03.204153 IP (tos 0x0, ttl  64, id 2593, offset 0, flags [none], 
length: 84) lightning.xxx.afs3-callback > turbine.xxx.afs3-fileserver: 
[udp sum ok]  rx data cid f224e070 call# 636 seq 1 ser 1297 
<client-init>,<last-pckt> fs call rmdir fid 536871033/801/2505 "dlcl16" 
(56)
14:33:03.206172 IP (tos 0x0, ttl  64, id 47429, offset 0, flags [DF], 
length: 164) turbine.xxx.afs3-fileserver > lightning.xxx.afs3-callback: 
[udp sum ok]  rx data cid f224e070 call# 636 seq 1 ser 672 <last-pckt> 
fs reply rmdir (136)
14:33:03.206614 IP (tos 0x0, ttl  64, id 2594, offset 0, flags [none], 
length: 93) lightning.xxx.afs3-callback > turbine.xxx.afs3-fileserver: 
[udp sum ok]  rx ack cid f224e070 call# 636 seq 0 ser 1298 
<client-init>,<slow-start> first 2 serial 0 reason delay ifmtu 5692 
maxmtu 1444 rwind 32 maxpackets 4 (65)
14:33:03.206967 IP (tos 0x0, ttl  64, id 2595, offset 0, flags [none], 
length: 80) lightning.xxx.afs3-callback > turbine.xxx.afs3-fileserver: 
[udp sum ok]  rx data cid f224e070 call# 637 seq 1 ser 1299 
<client-init>,<last-pckt> fs call fetch-data fid 536871033/821/2810 
offset 0 length 999999999 (52)
14:33:03.255000 IP (tos 0x0, ttl  64, id 43322, offset 0, flags [DF], 
length: 1468) turbine.xxx.afs3-fileserver > 
lightning.xxx.afs3-callback: [udp sum ok]  rx data cid f224e070 call# 
637 seq 1 ser 673 <req-ack> fs reply fetch-data (1440)
14:33:03.635951 IP (tos 0x0, ttl  64, id 54305, offset 0, flags [DF], 
length: 1468) turbine.xxx.afs3-fileserver > 
lightning.xxx.afs3-callback: [udp sum ok]  rx data cid f224e070 call# 
637 seq 1 ser 674 <req-ack> fs reply fetch-data (1440)
14:33:04.265943 IP (tos 0x0, ttl  64, id 48379, offset 0, flags [DF], 
length: 1468) turbine.xxx.afs3-fileserver > 
lightning.xxx.afs3-callback: [udp sum ok]  rx data cid f224e070 call# 
637 seq 1 ser 675 <req-ack> fs reply fetch-data (1440)
14:33:05.310841 IP (tos 0x0, ttl  64, id 2600, offset 0, flags [none], 
length: 80) lightning.xxx.afs3-callback > turbine.xxx.afs3-fileserver: 
[udp sum ok]  rx data cid f224e070 call# 637 seq 1 ser 1300 
<client-init>,<req-ack>,<last-pckt> fs call fetch-data fid 
536871033/821/2810 offset 0 length 999999999 (52)
14:33:05.310900 IP (tos 0x0, ttl  64, id 60175, offset 0, flags [DF], 
length: 93) turbine.xxx.afs3-fileserver > lightning.xxx.afs3-callback: 
[udp sum ok]  rx ack cid f224e070 call# 637 seq 1 ser 676 <slow-start> 
first 2 serial 1300 reason duplicate packet ifmtu 5692 maxmtu 1444 
rwind 32 maxpackets 4 (65)
14:33:05.416014 IP (tos 0x0, ttl  64, id 35162, offset 0, flags [DF], 
length: 1468) turbine.xxx.afs3-fileserver > 
lightning.xxx.afs3-callback: [udp sum ok]  rx data cid f224e070 call# 
637 seq 1 ser 677 <req-ack> fs reply fetch-data (1440)
14:33:07.586115 IP (tos 0x0, ttl  64, id 63302, offset 0, flags [DF], 
length: 1468) turbine.xxx.afs3-fileserver > 
lightning.xxx.afs3-callback: [udp sum ok]  rx data cid f224e070 call# 
637 seq 1 ser 678 <req-ack> fs reply fetch-data (1440)
14:33:11.806198 IP (tos 0x0, ttl  64, id 58239, offset 0, flags [DF], 
length: 1468) turbine.xxx.afs3-fileserver > 
lightning.xxx.afs3-callback: [udp sum ok]  rx data cid f224e070 call# 
637 seq 1 ser 679 <req-ack> fs reply fetch-data (1440)
14:33:18.206424 IP (tos 0x0, ttl  64, id 51924, offset 0, flags [DF], 
length: 93) turbine.xxx.afs3-fileserver > lightning.xxx.afs3-callback: 
[udp sum ok]  rx ack cid f224e070 call# 637 seq 0 ser 680 
<req-ack>,<slow-start> first 2 serial 0 reason ping ifmtu 5692 maxmtu 
1444 rwind 32 maxpackets 4 (65)
14:33:18.206935 IP (tos 0x0, ttl  64, id 2605, offset 0, flags [none], 
length: 93) lightning.xxx.afs3-callback > turbine.xxx.afs3-fileserver: 
[udp sum ok]  rx ack cid f224e070 call# 637 seq 0 ser 1301 
<client-init>,<slow-start> first 1 serial 680 reason ping response 
ifmtu 5692 maxmtu 1444 rwind 16 maxpackets 4 (65)
14:33:20.116481 IP (tos 0x0, ttl  64, id 48860, offset 0, flags [DF], 
length: 1468) turbine.xxx.afs3-fileserver > 
lightning.xxx.afs3-callback: [udp sum ok]  rx data cid f224e070 call# 
637 seq 1 ser 681 <req-ack> fs reply fetch-data (1440)
14:33:27.312779 IP (tos 0x0, ttl  64, id 2608, offset 0, flags [none], 
length: 93) lightning.xxx.afs3-callback > turbine.xxx.afs3-fileserver: 
[udp sum ok]  rx ack cid f224e070 call# 637 seq 0 ser 1302 
<client-init>,<req-ack>,<slow-start> first 1 serial 0 reason ping ifmtu 
5692 maxmtu 1444 rwind 16 maxpackets 4 (65)