[OpenAFS] afs memcache tuning... lockups in afs_cv_wait

Mike Polek mike@pictage.com
Wed, 13 Feb 2008 18:17:38 -0800


Mike Garrison wrote:
> 
> What parameters are you using for the client? What's
> rxdebug <client> -port 7001 -rxstats show?
> 
> -- 
> Mike Garrison

# cmdebug localhost -cache
Chunk files:   40960
Stat caches:   50000
Data caches:   40960
Volume caches: 512
Chunk size:    16384
Cache size:    655360 kB
Set time:      no
Cache type:    memory

Basically, the client is started with:

afsd -memcache -blocks 655360 -chunksize 14 -stat 50000 -daemons 6
      -volumes 512 -nosettime
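
As a quick sanity check, the afsd flags can be lined up against the cmdebug output above. This is only a sketch: it assumes -chunksize is the base-2 log of the chunk size in bytes and that a memory cache is split into blocks / chunk-size chunks, which is what the numbers here suggest. The variable names are made up for the example.

```python
# Values copied from the afsd command line in this message.
blocks_kb = 655360       # -blocks: cache size in 1 kB blocks
chunksize_log2 = 14      # -chunksize: log2 of the chunk size in bytes

chunk_bytes = 2 ** chunksize_log2
chunk_kb = chunk_bytes // 1024

# Assumption: the memory cache is carved into blocks_kb / chunk_kb chunks.
chunks = blocks_kb // chunk_kb

print(chunk_bytes)       # 16384, matches "Chunk size"
print(chunks)            # 40960, matches "Chunk files" and "Data caches"
print(blocks_kb / 1024)  # 640.0 (MB of cache)
```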


Kernel command line:

vmlinuz vmalloc=848M       (A little tight, I know...)

/proc/meminfo:

MemTotal:      2071764 kB
MemFree:         77012 kB
Buffers:             0 kB
Cached:        1258180 kB
Active:         701412 kB
Inactive:       599428 kB
HighTotal:     1916864 kB
HighFree:         7296 kB
LowTotal:       154900 kB
LowFree:         69716 kB
VmallocTotal:   851960 kB
VmallocUsed:    833472 kB
VmallocChunk:    16740 kB
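
To put a number on "a little tight", here is a rough sketch that computes the vmalloc headroom from the figures above. It assumes the memory cache is allocated out of vmalloc space (which is why vmalloc= was raised in the first place); parse_meminfo is a helper written for this example, not part of any AFS tool.

```python
def parse_meminfo(text):
    """Parse 'Name:   value kB' lines into a dict of integer kB values."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            fields[name.strip()] = int(rest.split()[0])
    return fields

# Vmalloc lines copied from the /proc/meminfo output in this message.
SAMPLE = """\
VmallocTotal:   851960 kB
VmallocUsed:    833472 kB
VmallocChunk:    16740 kB
"""

info = parse_meminfo(SAMPLE)
cache_kb = 655360  # afsd -blocks value

free_kb = info["VmallocTotal"] - info["VmallocUsed"]
cache_share = round(100 * cache_kb / info["VmallocTotal"], 1)

print(free_kb)                # 18488 kB of vmalloc space left
print(info["VmallocChunk"])   # 16740 kB largest free chunk
print(cache_share)            # 76.9 (% of VmallocTotal, if the assumption holds)
```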


RX stats when throttled and humming along nicely:

# rxdebug localhost -port 7001 -rxstats | head -15
Trying 192.168.11.19 (port 7001):
Free packets: 155, packet reclaims: 0, calls: 228, used FDs: 64
not waiting for packets.
0 calls waiting for a thread
1 threads are idle
rx stats: free packets 155, allocs 1954462, alloc-failures(rcv 0/0,send 266493/0,ack 0)
    greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0, selects 0, sendSelects 0
    packets read: data 16843 ack 580519 busy 0 abort 3 ackall 0 challenge 191 response 0 debug 217 params 0 unused 0 unused 0 unused 0 version 0
    other read counters: data 16843, ack 575197, dup 0 spurious 5317 dally 5
    packets sent: data 486114 ack 20105 busy 0 abort 0 ackall 0 challenge 0 response 191 debug 0 params 0 unused 0 unused 0 unused 0 version 0
    other send counters: ack 20105, data 3835588 (not resends), resends 292, pushed 0, acked&ignored 1970250
         (these should be small) sendFailed 0, fatalErrors 0
    Average rtt is 0.001, with 191368 samples
    Minimum rtt is 0.000, maximum is 39.537
    22 server connections, 198 client connections, 22 peer structs, 239 call structs, 145 free call structs


RX stats under heavy load:

Trying 192.168.11.19 (port 7001):
Free packets: 9, packet reclaims: 2, calls: 434, used FDs: 64
not waiting for packets.
0 calls waiting for a thread
1 threads are idle
rx stats: free packets 9, allocs 3892227, alloc-failures(rcv 0/0,send 281445/0,ack 0)
    greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 6, selects 0, sendSelects 0
    packets read: data 22528 ack 1169510 busy 0 abort 3 ackall 0 challenge 374 response 0 debug 414 params 0 unused 0 unused 0 unused 0 version 0
    other read counters: data 22528, ack 1158659, dup 0 spurious 10842 dally 9
    packets sent: data 970891 ack 26381 busy 0 abort 2 ackall 0 challenge 0 response 374 debug 0 params 0 unused 0 unused 0 unused 0 version 0
    other send counters: ack 26381, data 7687708 (not resends), resends 788, pushed 0, acked&ignored 4089184
         (these should be small) sendFailed 0, fatalErrors 0
    Average rtt is 0.001, with 393261 samples
    Minimum rtt is 0.000, maximum is 39.537
    17 server connections, 232 client connections, 32 peer structs, 239 call structs, 32 free call structs

RX stats after a lockup occurs:

Trying 192.168.11.19 (port 7001):
Free packets: 290, packet reclaims: 3, calls: 434, used FDs: 64
not waiting for packets.
0 calls waiting for a thread
1 threads are idle
rx stats: free packets 290, allocs 3919137, alloc-failures(rcv 0/0,send 302633/0,ack 0)
    greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 11, selects 0, sendSelects 0
    packets read: data 22611 ack 1177559 busy 0 abort 199 ackall 0 challenge 374 response 0 debug 509 params 0 unused 0 unused 0 unused 0 version 0
    other read counters: data 22611, ack 1166647, dup 0 spurious 10903 dally 9
    packets sent: data 977805 ack 26791 busy 0 abort 2 ackall 0 challenge 0 response 374 debug 0 params 0 unused 0 unused 0 unused 0 version 0
    other send counters: ack 26791, data 7740578 (not resends), resends 788, pushed 0, acked&ignored 4106054
         (these should be small) sendFailed 0, fatalErrors 0
    Average rtt is 0.001, with 395696 samples
    Minimum rtt is 0.000, maximum is 39.537
    15 server connections, 232 client connections, 32 peer structs, 239 call structs, 138 free call structs


At that point the counters pretty much grind to a halt; almost nothing moves.
A little while later it shows:

Trying 192.168.11.19 (port 7001):
Free packets: 290, packet reclaims: 3, calls: 442, used FDs: 64
not waiting for packets.
0 calls waiting for a thread
1 threads are idle
rx stats: free packets 290, allocs 3919169, alloc-failures(rcv 0/0,send 302633/0,ack 0)
    greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 11, selects 0, sendSelects 0
    packets read: data 22623 ack 1177571 busy 0 abort 199 ackall 0 challenge 374 response 0 debug 572 params 0 unused 0 unused 0 unused 0 version 0
    other read counters: data 22623, ack 1166659, dup 0 spurious 10903 dally 9
    packets sent: data 977817 ack 26799 busy 0 abort 2 ackall 0 challenge 0 response 374 debug 0 params 0 unused 0 unused 0 unused 0 version 0
    other send counters: ack 26799, data 7740602 (not resends), resends 788, pushed 0, acked&ignored 4106058
         (these should be small) sendFailed 0, fatalErrors 0
    Average rtt is 0.001, with 395696 samples
    Minimum rtt is 0.000, maximum is 39.537
    14 server connections, 183 client connections, 32 peer structs, 239 call structs, 237 free call structs

The free call structs go up and the server connections go down a bit,
but there are no major shifts that I can see. The machine was rebooted
right before the tests. I didn't reboot between the throttled test and
the overload test.
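
Since eyeballing these dumps is error-prone, here is a minimal sketch that pulls a few counters out of two rxdebug snapshots and prints the differences. The choice of counters and the regexes are just for this example, written against the output format pasted above; they are not an official list of what to watch.

```python
import re

# Counters to extract; the selection is this sketch's own choice.
PATTERNS = {
    "allocs": r"allocs (\d+)",
    "send_alloc_failures": r"send (\d+)/\d+",
    "no_buffers": r"noBuffers (\d+)",
    "aborts_read": r"packets read: data \d+ ack \d+ busy \d+ abort (\d+)",
    "data_sent": r"packets sent: data (\d+)",
    "resends": r"resends (\d+)",
}

def parse_rxstats(text):
    """Return the selected counters as ints; mail line wraps are tolerated."""
    flat = " ".join(text.split())
    return {name: int(re.search(pat, flat).group(1))
            for name, pat in PATTERNS.items()}

def delta(before, after):
    """Per-counter difference between two parsed snapshots."""
    return {name: after[name] - before[name] for name in before}

# Excerpts of the snapshots above (only the parts the patterns need).
HEAVY = """
rx stats: free packets 9, allocs 3892227, alloc-failures(rcv 0/0,send
281445/0,ack 0)
    greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 6,
    packets read: data 22528 ack 1169510 busy 0 abort 3 ackall 0
    packets sent: data 970891 ack 26381 busy 0 abort 2 ackall 0
    other send counters: ack 26381, data 7687708 (not resends), resends 788,
"""
LOCKUP = """
rx stats: free packets 290, allocs 3919137, alloc-failures(rcv 0/0,send
302633/0,ack 0)
    greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 11,
    packets read: data 22611 ack 1177559 busy 0 abort 199 ackall 0
    packets sent: data 977805 ack 26791 busy 0 abort 2 ackall 0
    other send counters: ack 26791, data 7740578 (not resends), resends 788,
"""
LATER = """
rx stats: free packets 290, allocs 3919169, alloc-failures(rcv 0/0,send
302633/0,ack 0)
    greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 11,
    packets read: data 22623 ack 1177571 busy 0 abort 199 ackall 0
    packets sent: data 977817 ack 26799 busy 0 abort 2 ackall 0
    other send counters: ack 26799, data 7740602 (not resends), resends 788,
"""

heavy, lockup, later = (parse_rxstats(s) for s in (HEAVY, LOCKUP, LATER))

print(delta(heavy, lockup))
# {'allocs': 26910, 'send_alloc_failures': 21188, 'no_buffers': 5,
#  'aborts_read': 196, 'data_sent': 6914, 'resends': 0}
print(delta(lockup, later))
# {'allocs': 32, 'send_alloc_failures': 0, 'no_buffers': 0,
#  'aborts_read': 0, 'data_sent': 12, 'resends': 0}
```

Between the heavy-load and lockup snapshots, the send-side alloc-failure counter grew by 21188 while allocs grew by 26910, and aborts read went from 3 to 199. After the lockup, only a handful of packets move.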


Anything look interesting to you?

Thanks!!
Mike






-- 
Michael Polek
Director of System Operations
1580 Francisco Street, Suite 101
Torrance, CA 90501
Phone: (310) 525-1600 ext. 628
Email: mike@pictage.com
http://www.pictage.com