[OpenAFS] Fileserver in semi-meltdown state
Renata Maria Dart <renata@slac.stanford.edu>
Tue, 20 Jul 2004 09:27:44 -0700 (PDT)
Hi, I am currently experiencing slow or non-existent response times
from one of our fileservers, which runs OpenAFS 1.2.11 on Solaris 9.
Both vos commands and an ls of directories on that server hang.
The fileserver is set up to run as:
Command 1 is '/usr/afs/bin/fileserver -L -p 100 -cb 64000 -rxpck 2000 -udpsize 1048576 -busyat 300'
Command 2 is '/usr/afs/bin/volserver -udpsize 1048576'
A snapshot of meltdown output currently looks like:
09:08:24 0 0 0 5483 65194522 369 347166814 44716 90
09:08:44 0 0 0 5525 65194649 127 347209650 44718 91
09:09:04 0 0 0 5490 65195983 1334 347254434 44720 90
09:09:24 0 0 0 5495 65196254 271 347308182 44723 90
09:09:44 0 0 0 5488 65196384 130 347336262 44723 90
09:10:04 0 0 0 5525 65197626 1242 347383702 44723 91
09:10:24 0 0 0 5525 65198056 430 347384696 44723 91
09:10:44 0 0 0 5525 65198167 111 347388802 44724 91
09:11:04 0 0 0 5525 65199328 1161 347391306 44724 91
09:11:25 0 0 0 5525 65199394 66 347391472 44724 91
09:11:45 0 0 0 5525 65199417 23 347391532 44724 91
09:12:05 0 0 0 5525 65200448 1031 347393646 44724 91
09:12:25 0 0 0 5525 65200497 49 347393778 44724 91
Whereas it normally looks more like:
09:26:10 0 0 0 5526 218364049 0 1524633688 95534 102
09:26:30 0 0 0 5526 218364338 289 1524678144 95534 102
09:26:50 0 0 0 5526 218364458 120 1524680972 95534 102
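
For anyone comparing the two sets of samples, here is a minimal Python
sketch that parses meltdown lines and reports how much the counters grew.
The column names are an assumption: they were inferred by matching the
values against the rxdebug figures for free packets, calls, data packets
sent, resends and idle threads, and are not taken from the meltdown script
itself. The three columns that are always 0 here are left uninterpreted
(c1, c2, c3).

```python
# Column names are ASSUMED (inferred from matching rxdebug values), not official.
FIELDS = ("time", "c1", "c2", "c3", "fpack", "calls",
          "delta", "data", "resends", "idle")

def parse_meltdown(text):
    """Parse whitespace-separated meltdown samples into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # skip blank, wrapped, or otherwise malformed lines
        row = {"time": parts[0]}
        for name, value in zip(FIELDS[1:], parts[1:]):
            row[name] = int(value)
        rows.append(row)
    return rows

def summarize(rows):
    """Return counter growth between the first and last sample."""
    first, last = rows[0], rows[-1]
    return {
        "calls": last["calls"] - first["calls"],
        "data": last["data"] - first["data"],
        "resends": last["resends"] - first["resends"],
        "min_idle": min(r["idle"] for r in rows),
    }

# First three samples from the "currently looks like" listing above.
SAMPLE = """\
09:08:24 0 0 0 5483 65194522 369 347166814 44716 90
09:08:44 0 0 0 5525 65194649 127 347209650 44718 91
09:09:04 0 0 0 5490 65195983 1334 347254434 44720 90
"""

if __name__ == "__main__":
    print(summarize(parse_meltdown(SAMPLE)))
```

In these samples the seventh column equals the difference in the calls
column between consecutive lines, which is why it is labelled delta here.
Lines whose last field wrapped onto a separate line need to be rejoined
before parsing, otherwise they are skipped.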
I see the udpInOverflows count increasing over time:
renata@afs08 $ 9:04 netstat -s | grep udpInOverflow
udpInCksumErrs = 0 udpInOverflows = 2920
udpInCksumErrs = 0 udpInOverflows = 0
root@afs08 # 9:06 netstat -s | grep udpInOverflow
udpInCksumErrs = 0 udpInOverflows = 2946
udpInCksumErrs = 0 udpInOverflows = 0
root@afs08 # 9:16 netstat -s | grep udpInOverflow
udpInCksumErrs = 0 udpInOverflows = 3102
udpInCksumErrs = 0 udpInOverflows = 0
The udp_max_buf is set to 1048576.
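
To put a number on how fast udpInOverflows is growing, the three readings
above can be turned into a rough drops-per-minute rate. This is only a
sketch: the timestamps come from the shell prompt and have one-minute
resolution, so the rates are approximate, and the function name is made up
for illustration.

```python
from datetime import datetime

# (prompt time, udpInOverflows) pairs copied from the netstat output above.
SAMPLES = [("9:04", 2920), ("9:06", 2946), ("9:16", 3102)]

def overflow_rates(samples):
    """Return overflows per minute for each consecutive pair of readings."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = datetime.strptime(t1, "%H:%M") - datetime.strptime(t0, "%H:%M")
        minutes = elapsed.total_seconds() / 60
        rates.append((c1 - c0) / minutes)
    return rates

if __name__ == "__main__":
    print(overflow_rates(SAMPLES))  # [13.0, 15.6]
```

So the socket is dropping on the order of 13 to 16 datagrams per minute
during this period.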
The top command shows the fileserver to be very lightly loaded:
427 root 109 59 -5 53M 52M sleep 745:03 0.44% fileserver
Rxdebug shows:
renata@victoria $ 16:20 rxdebug -server afs08 -port 7000 -rxstats
Trying 134.79.17.78 (port 7000):
Free packets: 5525, packet reclaims: 2548, calls: 65184479, used FDs: 64
not waiting for packets.
0 calls waiting for a thread
91 threads are idle
rx stats: free packets 5525, allocs 273221822, alloc-failures(rcv 0/0,send 0/0,ack 0)
greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0, selects 0, sendSelects 0
packets read: data 81937474 ack 120271342 busy 0 abort 137706 ackall 2372 challenge 4 response 515695 debug 10906 params 0 unused 0 unused 0 unused 0 version 0
other read counters: data 81937464, ack 120271034, dup 1014 spurious 278 dally 30
packets sent: data 139147853 ack 16840498 busy 0 abort 294783 ackall 0 challenge 515697 response 4 debug 0 params 0 unused 0 unused 0 unused 0 version 0
other send counters: ack 16840498, data 347138128 (not resends), resends 44709, pushed 0, acked&ignored 134995217
(these should be small) sendFailed 0, fatalErrors 1964
Average rtt is 0.009, with 42988716 samples
Minimum rtt is 0.000, maximum is 60.996
744 server connections, 1005 client connections, 44002 peer structs, 7328 call structs, 7184 free call structs
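
As a quick sanity check on the send counters, the share of resent data
packets can be pulled out of the rxdebug text. The sketch below assumes the
"other send counters" line has the wording shown above; the regular
expression and function name are illustrative, not part of rxdebug.

```python
import re

# The "other send counters" line as printed by rxdebug above.
RXDEBUG_LINE = ("other send counters: ack 16840498, data 347138128 "
                "(not resends), resends 44709, pushed 0, "
                "acked&ignored 134995217")

# \s+ also matches a newline, so a line wrapped by a mail client still parses.
PATTERN = re.compile(r"data\s+(\d+)\s+\(not resends\),\s+resends\s+(\d+)")

def resend_ratio(text):
    """Return resends divided by first-time data packets, or None if not found."""
    match = PATTERN.search(text)
    if match is None:
        return None
    data, resends = int(match.group(1)), int(match.group(2))
    return resends / data

if __name__ == "__main__":
    print(f"{resend_ratio(RXDEBUG_LINE):.4%}")
```

For the figures above this works out to roughly 0.013% of data packets
being resent.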
Are there any suggestions about what the problem is and what I can do
to fix it? Is there any other information I should try gathering?
-Renata
Renata Dart | renata@SLAC.Stanford.edu
Stanford Linear Accelerator Center |
2575 Sand Hill Road, MS 97 | (650) 926-2848 (office)
Stanford, California 94025 | (650) 926-3329 (fax)