[OpenAFS] Fileserver in semi-meltdown state

Renata Maria Dart <renata@slac.stanford.edu>
Tue, 20 Jul 2004 09:27:44 -0700 (PDT)


Hi, I am currently seeing slow or no response from one of our
fileservers, which runs OpenAFS 1.2.11 on Solaris 9. Both vos
commands and an ls of directories on that server hang.
The fileserver is set up to run as:


    Command 1 is '/usr/afs/bin/fileserver -L -p 100 -cb 64000 -rxpck 2000 -udpsize 1048576 -busyat 300'
    Command 2 is '/usr/afs/bin/volserver -udpsize 1048576'


A snapshot of meltdown output currently looks like:

09:08:24 0     0        0      5483     65194522  369    347166814 44716    90  
09:08:44 0     0        0      5525     65194649  127    347209650 44718    91  
09:09:04 0     0        0      5490     65195983  1334   347254434 44720    90  
09:09:24 0     0        0      5495     65196254  271    347308182 44723    90  
09:09:44 0     0        0      5488     65196384  130    347336262 44723    90  
09:10:04 0     0        0      5525     65197626  1242   347383702 44723    91  
09:10:24 0     0        0      5525     65198056  430    347384696 44723    91  
09:10:44 0     0        0      5525     65198167  111    347388802 44724    91  
09:11:04 0     0        0      5525     65199328  1161   347391306 44724    91  
09:11:25 0     0        0      5525     65199394  66     347391472 44724    91  
09:11:45 0     0        0      5525     65199417  23     347391532 44724    91  
09:12:05 0     0        0      5525     65200448  1031   347393646 44724    91  
09:12:25 0     0        0      5525     65200497  49     347393778 44724    91  

Whereas it normally looks more like:

09:26:10 0     0        0      5526     218364049  0      1524633688 95534    102
09:26:30 0     0        0      5526     218364338  289    1524678144 95534    102
09:26:50 0     0        0      5526     218364458  120    1524680972 95534    102
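
The meltdown columns are not labelled in the output. Comparing the first snapshot with the rxdebug figures further down (free packets 5525, 91 idle threads, resends 44709) suggests that column 5 is free packets, column 6 is total calls, column 7 is calls since the previous sample, and columns 8 to 10 are data packets sent, resends, and idle threads. That mapping is an inference from the numbers, not something the tool states. Under that assumption, a minimal Python sketch that checks column 7 against the difference in column 6 (the field names are illustrative only):

```python
# First rows of the slow-period snapshot, copied from above.
ROWS = """\
09:08:24 0     0        0      5483     65194522  369    347166814 44716    90
09:08:44 0     0        0      5525     65194649  127    347209650 44718    91
09:09:04 0     0        0      5490     65195983  1334   347254434 44720    90
09:09:24 0     0        0      5495     65196254  271    347308182 44723    90
"""

def parse(line):
    """Split one row into named fields (the names are assumed, see above)."""
    f = line.split()
    return {
        "time": f[0],
        "free_packets": int(f[4]),
        "calls": int(f[5]),
        "delta": int(f[6]),
        "data_sent": int(f[7]),
        "resends": int(f[8]),
        "idle_threads": int(f[9]),
    }

parsed = [parse(line) for line in ROWS.splitlines()]

# For each interval, compare the reported delta with the computed one,
# and show how many data packets went out in the same 20 seconds.
for prev, cur in zip(parsed, parsed[1:]):
    computed = cur["calls"] - prev["calls"]
    sent = cur["data_sent"] - prev["data_sent"]
    print(cur["time"], "reported", cur["delta"], "computed", computed, "data sent", sent)
```

If the reported and computed values agree on every row, the column 7 reading is at least consistent with the data shown here.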


I see the udpInOverflows numbers increasing over time:

renata@afs08 $ 9:04 netstat -s | grep udpInOverflow
        udpInCksumErrs      =     0     udpInOverflows      =  2920
        udpInCksumErrs      =     0     udpInOverflows      =     0

root@afs08 # 9:06 netstat -s | grep udpInOverflow
        udpInCksumErrs      =     0     udpInOverflows      =  2946
        udpInCksumErrs      =     0     udpInOverflows      =     0

root@afs08 # 9:16 netstat -s | grep udpInOverflow
        udpInCksumErrs      =     0     udpInOverflows      =  3102
        udpInCksumErrs      =     0     udpInOverflows      =     0
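
To put a number on that growth, here is a small Python sketch that works out the overflow rate between the three samples above. The times and counts are copied from the shell prompts and the first udpInOverflows line of each netstat run; the function name `overflow_rates` is only illustrative.

```python
from datetime import datetime

# (prompt time, udpInOverflows) pairs copied from the netstat runs above.
samples = [("9:04", 2920), ("9:06", 2946), ("9:16", 3102)]

def overflow_rates(samples):
    """Return overflows per minute between consecutive samples."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        start = datetime.strptime(t0, "%H:%M")
        end = datetime.strptime(t1, "%H:%M")
        minutes = (end - start).total_seconds() / 60
        rates.append((c1 - c0) / minutes)
    return rates

print(overflow_rates(samples))
```

With only minute-resolution timestamps this is a rough figure, but it shows the counter climbing steadily rather than in a single burst.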

The udp_max_buf is set to 1048576.

The top command shows the fileserver to be very lightly loaded:

  427 root     109  59   -5   53M   52M sleep 745:03  0.44% fileserver

Rxdebug shows:

renata@victoria $ 16:20 rxdebug -server afs08 -port 7000 -rxstats
Trying 134.79.17.78 (port 7000):
Free packets: 5525, packet reclaims: 2548, calls: 65184479, used FDs: 64
not waiting for packets.
0 calls waiting for a thread
91 threads are idle
rx stats: free packets 5525, allocs 273221822, alloc-failures(rcv 0/0,send 0/0,ack 0)
   greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0, selects 0, sendSelects 0
   packets read: data 81937474 ack 120271342 busy 0 abort 137706 ackall 2372 challenge 4 response 515695 debug 10906 params 0 unused 0 unused 0 unused 0 version 0
   other read counters: data 81937464, ack 120271034, dup 1014 spurious 278 dally 30
   packets sent: data 139147853 ack 16840498 busy 0 abort 294783 ackall 0 challenge 515697 response 4 debug 0 params 0 unused 0 unused 0 unused 0 version 0
   other send counters: ack 16840498, data 347138128 (not resends), resends 44709, pushed 0, acked&ignored 134995217
    (these should be small) sendFailed 0, fatalErrors 1964
   Average rtt is 0.009, with 42988716 samples
   Minimum rtt is 0.000, maximum is 60.996
   744 server connections, 1005 client connections, 44002 peer structs, 7328 call structs, 7184 free call structs
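
For reference, the resend count can be put in proportion to the data packets sent. The following Python sketch pulls both numbers out of the "other send counters" line (rejoined onto one line here) and computes the fraction, which comes to roughly 0.013% for the figures above. The helper name `resend_fraction` is hypothetical, and the ratio only describes the send side; it says nothing about inbound packets dropped before the fileserver reads them.

```python
import re

# "other send counters" line from the rxdebug output above, on one line.
LINE = ("other send counters: ack 16840498, data 347138128 (not resends), "
        "resends 44709, pushed 0, acked&ignored 134995217")

def resend_fraction(text):
    """Return resends divided by data packets sent (not counting resends)."""
    data = int(re.search(r"data (\d+) \(not resends\)", text).group(1))
    resends = int(re.search(r"resends (\d+)", text).group(1))
    return resends / data

fraction = resend_fraction(LINE)
print(f"resend fraction: {fraction:.4%}")
```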


Are there any suggestions about what the problem is and what I can do
to fix it?  Is there any other information I should try gathering?

-Renata

 Renata Dart                         | renata@SLAC.Stanford.edu  
 Stanford Linear Accelerator Center  |    
 2575 Sand Hill Road, MS 97          | (650) 926-2848 (office)
 Stanford, California   94025        | (650) 926-3329 (fax)