[OpenAFS] afs memcache tuning... lockups in afs_cv_wait

Mike Garrison mcgarr@umich.edu
Thu, 14 Feb 2008 15:17:35 -0500


On Feb 13, 2008, at 9:17 PM, Mike Polek wrote:

> Mike Garrison wrote:
>> What parameters are you using for the client? What does rxdebug
>> <client> -port 7001 -rxstats show?
>> -- 
>> Mike Garrison
>
> # cmdebug localhost -cache
> Chunk files:   40960
> Stat caches:   50000
> Data caches:   40960
> Volume caches: 512
> Chunk size:    16384
> Cache size:    655360 kB
> Set time:      no
> Cache type:    memory
>
> Basically
>
> afsd -memcache -blocks 655360 -chunksize 14 -stat 50000 -daemons 6
>     -volumes 512 -nosettime

I'd suggest raising -rxpck above the default; it may actually help
with the issue you're running into. We use 2000 for it, though I can't
recall at this point why we picked that number.
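
As a rough sketch, assuming the rest of your options stay exactly as
quoted above, the startup line with our value added would look like
this. The 2000 is simply what we run with, not a number tuned for your
workload, and the AFSD_OPTS variable is only there for illustration:

```shell
# Options quoted earlier in this thread, plus -rxpck.
# 2000 is the value used at our site; treat it as a starting point only.
AFSD_OPTS="-memcache -blocks 655360 -chunksize 14 -stat 50000 -daemons 6 -volumes 512 -nosettime -rxpck 2000"

# Print the resulting command line rather than running it.
echo "afsd $AFSD_OPTS"
```

After restarting the client with the new value, the "Free packets"
figure in rxdebug should show whether the larger pool took effect.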

> vmlinuz vmalloc=848M       (A little tight, I know...)
> [snip]
>
> RX stats when throttled and humming along nicely:
>
> # rxdebug localhost -port 7001 -rxstats | head -15
> Trying 192.168.11.19 (port 7001):
> Free packets: 155, packet reclaims: 0, calls: 228, used FDs: 64
> not waiting for packets.
> 0 calls waiting for a thread
> 1 threads are idle
> rx stats: free packets 155, allocs 1954462, alloc-failures(rcv  
> 0/0,send 266493/0,ack 0)

266493 alloc-failures for sends? That already suggests something
isn't right.
>
>   greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers  
> 0, selects 0, sendSelects 0
>   packets read: data 16843 ack 580519 busy 0 abort 3 ackall 0  
> challenge 191 response 0 debug 217 params 0 unused 0 unused 0 unused  
> 0 version 0
>   other read counters: data 16843, ack 575197, dup 0 spurious 5317  
> dally 5
>   packets sent: data 486114 ack 20105 busy 0 abort 0 ackall 0  
> challenge 0 response 191 debug 0 params 0 unused 0 unused 0 unused 0  
> version 0
>   other send counters: ack 20105, data 3835588 (not resends),  
> resends 292, pushed 0, acked&ignored 1970250
>        (these should be small) sendFailed 0, fatalErrors 0
>   Average rtt is 0.001, with 191368 samples
>   Minimum rtt is 0.000, maximum is 39.537
>   22 server connections, 198 client connections, 22 peer structs,  
> 239 call structs, 145 free call structs
>
>
> RX stats under heavy load:
>
> Trying 192.168.11.19 (port 7001):
> Free packets: 9, packet reclaims: 2, calls: 434, used FDs: 64

Only 9 free packets. Ew!

>
> not waiting for packets.
> 0 calls waiting for a thread
> 1 threads are idle
> rx stats: free packets 9, allocs 3892227, alloc-failures(rcv  
> 0/0,send 281445/0,ack 0)
>   greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers  
> 6, selects 0, sendSelects 0

noBuffers is nonzero: you've had to discard some incoming rx packets
because you ran out of free rx packets.

>   packets read: data 22528 ack 1169510 busy 0 abort 3 ackall 0  
> challenge 374 response 0 debug 414 params 0 unused 0 unused 0 unused  
> 0 version 0
>   other read counters: data 22528, ack 1158659, dup 0 spurious 10842  
> dally 9
>   packets sent: data 970891 ack 26381 busy 0 abort 2 ackall 0  
> challenge 0 response 374 debug 0 params 0 unused 0 unused 0 unused 0  
> version 0
>   other send counters: ack 26381, data 7687708 (not resends),  
> resends 788, pushed 0, acked&ignored 4089184
>        (these should be small) sendFailed 0, fatalErrors 0
>   Average rtt is 0.001, with 393261 samples
>   Minimum rtt is 0.000, maximum is 39.537
>   17 server connections, 232 client connections, 32 peer structs,  
> 239 call structs, 32 free call structs
>
> RX stats after a lockup occurs:
>
> Trying 192.168.11.19 (port 7001):
> Free packets: 290, packet reclaims: 3, calls: 434, used FDs: 64
> not waiting for packets.
> 0 calls waiting for a thread
> 1 threads are idle
> rx stats: free packets 290, allocs 3919137, alloc-failures(rcv  
> 0/0,send 302633/0,ack 0)
>   greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers  
> 11, selects 0, sendSelects 0

Even more noBuffers, and again the alloc-failures for sends are huge.
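
To put numbers on that, here is the plain arithmetic between the
"heavy load" snapshot and the "after a lockup" snapshot. The counter
values are copied from your output; the variable names are just mine:

```shell
# Counters copied from the two rxdebug snapshots quoted in this message.
heavy_send_fail=281445;  lockup_send_fail=302633
heavy_nobuf=6;           lockup_nobuf=11
heavy_allocs=3892227;    lockup_allocs=3919137

# Growth of each counter between the two snapshots.
echo "send alloc-failures grew by $((lockup_send_fail - heavy_send_fail))"
echo "noBuffers grew by $((lockup_nobuf - heavy_nobuf))"
echo "allocs grew by $((lockup_allocs - heavy_allocs))"
```

That works out to 21188 more send alloc-failures and 5 more noBuffers
while allocs went up by 26910. I wouldn't read too much into the exact
relationship between those counters, but the failures are clearly
still climbing.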

>   packets read: data 22611 ack 1177559 busy 0 abort 199 ackall 0  
> challenge 374 response 0 debug 509 params 0 unused 0 unused 0 unused  
> 0 version 0
>   other read counters: data 22611, ack 1166647, dup 0 spurious 10903  
> dally 9
>   packets sent: data 977805 ack 26791 busy 0 abort 2 ackall 0  
> challenge 0 response 374 debug 0 params 0 unused 0 unused 0 unused 0  
> version 0
>   other send counters: ack 26791, data 7740578 (not resends),  
> resends 788, pushed 0, acked&ignored 4106054
>        (these should be small) sendFailed 0, fatalErrors 0
>   Average rtt is 0.001, with 395696 samples
>   Minimum rtt is 0.000, maximum is 39.537
>   15 server connections, 232 client connections, 32 peer structs,  
> 239 call structs, 138 free call structs
>

The only thing that really sticks out to me is the low number of free
rx packets. I'd try increasing -rxpck and seeing if that helps.
Unfortunately, I don't have much experience with the memcache, but I
have a strong feeling that you shouldn't be seeing such a high number
of alloc-failures for sends.
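
If you want to keep an eye on just those three counters while you
experiment, something like the following might do. It is only a sketch:
rx_summary is a helper name I made up, and it assumes rxdebug prints
each stats line unwrapped (the line breaks in the quoted output above
come from the mail client):

```shell
# Hypothetical helper: reads "rxdebug ... -rxstats" output on stdin and
# prints free packets, send alloc-failures and noBuffers on one line.
rx_summary() {
    awk '
        /^Free packets:/ {
            # Third field looks like "155,"; strip the comma.
            gsub(/,/, "", $3); free = $3
        }
        /alloc-failures\(rcv/ {
            # The send counter appears as "send 266493/0".
            if (match($0, /send [0-9]+/))
                sendfail = substr($0, RSTART + 5, RLENGTH - 5)
        }
        /noBuffers/ {
            if (match($0, /noBuffers [0-9]+/))
                nobuf = substr($0, RSTART + 10, RLENGTH - 10)
        }
        END {
            printf "free=%s send_alloc_failures=%s noBuffers=%s\n",
                   free, sendfail, nobuf
        }
    '
}
```

You would feed it the same command you already ran, e.g.
rxdebug localhost -port 7001 -rxstats | rx_summary, before and after
changing -rxpck.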

--
Mike Garrison