[OpenAFS] Connection timed out?

Robbert Eggermont R.Eggermont@TUDelft.nl
Tue, 10 Mar 2009 12:53:46 +0100


Felix Frank wrote:
> The number of threads seems to be more than appropriate for 50 clients.
> It might be interesting to look at the output of "rxdebug <server> 7000"
> during a build, especially the top, where it tells you about waiting calls
> and idle threads.

The test consists of an untar, make -j2, and rm. The connection timeouts
started at about 22:05 (during the make).

rxdebug@server:
> 2009-03-09T21:15+0100: Trying 127.0.0.1 (port 7000):
> Free packets: 2891, packet reclaims: 10968, calls: 14533306, used FDs: 20
> not waiting for packets.
> 0 calls waiting for a thread
> 123 threads are idle
> 2009-03-09T21:20+0100: Trying 127.0.0.1 (port 7000):
> Free packets: 2496, packet reclaims: 10968, calls: 14806865, used FDs: 61
> not waiting for packets.
> 0 calls waiting for a thread
> 78 threads are idle
> 2009-03-09T21:25+0100: Trying 127.0.0.1 (port 7000):
> Free packets: 2067, packet reclaims: 10968, calls: 15155769, used FDs: 64
> not waiting for packets.
> 0 calls waiting for a thread
> 86 threads are idle
> 2009-03-09T21:30+0100: Trying 127.0.0.1 (port 7000):
> Free packets: 2361, packet reclaims: 10968, calls: 15451575, used FDs: 64
> not waiting for packets.
> 0 calls waiting for a thread
> 87 threads are idle
> 2009-03-09T21:35+0100: Trying 127.0.0.1 (port 7000):
> Free packets: 2361, packet reclaims: 10968, calls: 15888390, used FDs: 64
> not waiting for packets.
> 0 calls waiting for a thread
> 99 threads are idle
> 2009-03-09T21:40+0100: Trying 127.0.0.1 (port 7000):
> Free packets: 2382, packet reclaims: 10968, calls: 16312797, used FDs: 64
> not waiting for packets.
> 0 calls waiting for a thread
> 96 threads are idle
> 2009-03-09T21:45+0100: Trying 127.0.0.1 (port 7000):
> Free packets: 2551, packet reclaims: 10968, calls: 17050004, used FDs: 64
> not waiting for packets.
> 0 calls waiting for a thread
> 105 threads are idle
> 2009-03-09T21:50+0100: Trying 127.0.0.1 (port 7000):
> Free packets: 2697, packet reclaims: 10968, calls: 17827397, used FDs: 64
> not waiting for packets.
> 0 calls waiting for a thread
> 99 threads are idle
> 2009-03-09T21:55+0100: Trying 127.0.0.1 (port 7000):
> Free packets: 2574, packet reclaims: 10968, calls: 18517191, used FDs: 64
> not waiting for packets.
> 0 calls waiting for a thread
> 103 threads are idle
> 2009-03-09T22:00+0100: Trying 127.0.0.1 (port 7000):
> Free packets: 2562, packet reclaims: 10968, calls: 19140482, used FDs: 64
> not waiting for packets.
> 0 calls waiting for a thread
> 90 threads are idle
> 2009-03-09T22:05+0100: Trying 127.0.0.1 (port 7000):
> Free packets: 1466, packet reclaims: 11269, calls: 19335878, used FDs: 64
> not waiting for packets.
> 0 calls waiting for a thread
> 40 threads are idle
> 2009-03-09T22:10+0100: Trying 127.0.0.1 (port 7000):
> Free packets: 1219, packet reclaims: 12979, calls: 19414589, used FDs: 64
> not waiting for packets.
> 0 calls waiting for a thread
> 43 threads are idle
> 2009-03-09T22:15+0100: Trying 127.0.0.1 (port 7000):
> Free packets: 2484, packet reclaims: 14897, calls: 19466551, used FDs: 64
> not waiting for packets.
> 0 calls waiting for a thread
> 84 threads are idle
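
To make the samples above easier to compare, here is a minimal Python sketch that turns them into per-interval deltas. It assumes the output was captured in exactly the format shown, with a timestamp prefixed to each "Trying" line by the sampling script (plain rxdebug does not print one). The names parse_rxdebug and deltas are hypothetical helpers, not part of OpenAFS.

```python
import re

def parse_rxdebug(text):
    """Parse timestamped rxdebug samples into a list of dicts.

    Assumes each sample starts with '<timestamp>: Trying ...', where the
    timestamp was added by the sampling script (an assumption).
    """
    samples = []
    current = None
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()  # drop mail quote markers
        m = re.match(r"(\S+): Trying ", line)
        if m:
            current = {"time": m.group(1)}
            samples.append(current)
            continue
        if current is None:
            continue
        m = re.search(
            r"Free packets: (\d+), packet reclaims: (\d+), calls: (\d+)", line
        )
        if m:
            current["free_packets"] = int(m.group(1))
            current["reclaims"] = int(m.group(2))
            current["calls"] = int(m.group(3))
        m = re.search(r"(\d+) calls waiting for a thread", line)
        if m:
            current["waiting"] = int(m.group(1))
        m = re.search(r"(\d+) threads are idle", line)
        if m:
            current["idle"] = int(m.group(1))
    return samples

def deltas(samples):
    """Return (time, new calls, new packet reclaims, idle threads) per interval."""
    return [
        (cur["time"],
         cur["calls"] - prev["calls"],
         cur["reclaims"] - prev["reclaims"],
         cur["idle"])
        for prev, cur in zip(samples, samples[1:])
    ]
```

Between the 22:00 and 22:05 samples this gives 195396 calls and 301 new packet reclaims, with the idle threads dropping from 90 to 40.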

uptime@server:
>  21:20:02 up 27 days,  4:34,  9 users,  load average: 6.14, 2.46, 0.95
>  21:25:01 up 27 days,  4:39,  9 users,  load average: 3.72, 3.92, 2.05
>  21:30:01 up 27 days,  4:44,  9 users,  load average: 5.04, 3.94, 2.50
>  21:35:02 up 27 days,  4:49,  9 users,  load average: 5.72, 4.82, 3.26
>  21:40:01 up 27 days,  4:54,  9 users,  load average: 7.06, 5.53, 3.95
>  21:45:01 up 27 days,  4:59,  9 users,  load average: 10.97, 8.74, 5.73
>  21:50:02 up 27 days,  5:04, 10 users,  load average: 4.00, 7.05, 5.94
>  21:55:02 up 27 days,  5:09, 10 users,  load average: 4.29, 5.32, 5.46
>  22:00:02 up 27 days,  5:14, 10 users,  load average: 8.73, 8.09, 6.68
>  22:05:02 up 27 days,  5:19, 10 users,  load average: 2.99, 5.27, 5.89
>  22:10:02 up 27 days,  5:24, 10 users,  load average: 2.38, 3.75, 5.07
>  22:15:02 up 27 days,  5:29, 10 users,  load average: 4.29, 3.44, 4.51

The first peak is during the untar, the second during the make.
After ~10 clients timed out, the load went down a bit.

rxdebug localhost -rxstats -long (from this morning):
> Trying 127.0.0.1 (port 7000):
> Free packets: 2895, packet reclaims: 18020, calls: 22235421, used FDs: 13
> not waiting for packets.
> 0 calls waiting for a thread
> 123 threads are idle
> rx stats: free packets 2895, allocs 367120898, alloc-failures(rcv 0/0,send 0/0,ack 0)
>    greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0, selects 0, sendSelects 0
>    packets read: data 327835144 ack 33311295 busy 0 abort 3 ackall 0 challenge 1066 response 610 debug 654 params 0 unused 0 unused 0 unused 0 version 0
>    other read counters: data 327835144, ack 33311295, dup 3574 spurious 0 dally 0
>    packets sent: data 38234254 ack 206138290 busy 0 abort 3072 ackall 0 challenge 626 response 1066 debug 0 params 0 unused 0 unused 0 unused 0 version 0
>    other send counters: ack 206138290, data 76468508 (not resends), resends 18183, pushed 0, acked&ignored 15243073
>         (these should be small) sendFailed 0, fatalErrors 0
>    Average rtt is 0.007, with 9180312 samples
>    Minimum rtt is 0.000, maximum is 52.506
>    1 server connections, 128 client connections, 4 peer structs, 277 call structs, 277 free call structs
> Done.

If I read this correctly, there were some resends (18183) but no send
failures or fatal errors (sendFailed 0, fatalErrors 0).
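
To put the resend count in proportion, a small Python sketch that pulls the two counters out of the -rxstats text and divides them. It assumes the "packets sent: data N" and "resends N" fields appear as printed above; resend_ratio is a hypothetical helper name, and using the "packets sent" data counter as the denominator is my choice, not something rxdebug prescribes.

```python
import re

def resend_ratio(rxstats_text):
    """Return resends divided by data packets sent, from rxdebug -rxstats text."""
    sent = re.search(r"packets sent: data (\d+)", rxstats_text)
    # '(not resends)' is skipped because it is not followed by a number.
    resends = re.search(r"resends (\d+)", rxstats_text)
    if sent is None or resends is None:
        raise ValueError("expected rxstats counters not found")
    return int(resends.group(1)) / int(sent.group(1))
```

With the counters above (18183 resends, 38234254 data packets sent) the ratio comes out at roughly 0.05%.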

ifconfig eth0:
> eth0      Link encap:Ethernet  HWaddr 00:15:60:AD:B0:36
>           inet addr:131.180.6.37  Bcast:131.180.7.255  Mask:255.255.252.0
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:6239334116 errors:0 dropped:10655 overruns:0 frame:0
>           TX packets:4803734593 errors:1 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:5903764857844 (5.3 TiB)  TX bytes:2169088382907 (1.9 TiB)
>           Interrupt:209

Some dropped packets (10655 out of roughly 6.2 billion received), but no
physical network problems on the server side.

Regards,

Robbert

-- 
Robbert Eggermont                   Information & Communication Theory
R.Eggermont@TUDelft.nl         Electr.Eng., Mathematics & Comp.Science
+31 (15) 2783234                        Delft University of Technology