[OpenAFS] afs memcache tuning... lockups in afs_cv_wait

Mike Polek mike@pictage.com
Wed, 13 Feb 2008 16:50:18 -0800

1) Cool... I'll build a 1.4.5 client and reproduce the issue, and hopefully
    get more info.
2) 100 or so threads. Not a serious problem. I'm *trying* to find the
    breaking points/limits so I know where they are and can set the
    server thresholds below that. And ideally, if I can find out why
    things lock up instead of degrading, that would, I think, be useful.

In particular, after a lockup, I quickly shut down the ftp server software.
Everything mellowed out. All the afs_cv_wait stuff went away.
I started it back up, and started testing again, but everything immediately
went into afs_cv_wait without the usual ramp-up period... as though there
were a resource leak of some sort. So I'm wondering if, under heavy load,
rx packets are somehow not getting released back into the pool of free
packets... or if some other resource is encroaching on the same memory
that rx is trying to use to allocate its sendpackets. I'm just guessing
and thinking out loud for the most part, but if I get some useful
information, that may turn into an enhancement report at some point...
who knows.
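
To make the leak theory concrete, here is a toy model of what I mean. This
is NOT the rx code and not how OpenAFS actually allocates packets; PacketPool,
run_load, and leak_every are names I made up, and the pool size and leak rate
are arbitrary. It only shows that if a fixed pool loses a packet now and then
under load, a fresh burst of requests after a "restart" stalls right away,
with no ramp-up, because the pool outlives the application that was using it.

```python
import threading


class PacketPool:
    """Toy fixed-size pool; alloc() waits on a condition variable when empty."""

    def __init__(self, size):
        self.free = size
        self.cond = threading.Condition()

    def alloc(self, timeout):
        # Wait until a packet is free or the timeout expires.
        with self.cond:
            if not self.cond.wait_for(lambda: self.free > 0, timeout=timeout):
                return False  # a real client would just keep sleeping here
            self.free -= 1
            return True

    def release(self):
        with self.cond:
            self.free += 1
            self.cond.notify()


def run_load(pool, requests, leak_every):
    """Send `requests` packets; every `leak_every`-th one is never released."""
    stalled = 0
    for i in range(1, requests + 1):
        if not pool.alloc(timeout=0.01):
            stalled += 1
            continue
        if i % leak_every != 0:
            pool.release()  # normal path: packet goes back to the pool
        # else: hypothetical leak, the packet is simply lost
    return stalled


pool = PacketPool(size=4)

# First test run: the pool drains slowly, so stalls only show up late.
first_run = run_load(pool, requests=20, leak_every=4)

# The application is stopped and restarted, but the pool is shared state
# that is not reset, so the second run stalls from the very first request.
second_run = run_load(pool, requests=5, leak_every=4)

print(first_run, second_run, pool.free)
```

In this sketch the first run only starts stalling once the last packet has
leaked, while the second run stalls on every request, which is the same shape
as what I saw. It says nothing about whether rx really leaks; it is only what
a leak would look like if there were one.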

Thanks for the help,

Derrick Brashear wrote:
> 1) get better backtraces (see the personal reply)
> 2) either you have more threads than that, or you have more serious problems...
>>   From what I can tell, RX wants to allocate a packet
>>to send data on the network, but there are none available,
>>so it decides to just hang out and wait, assuming that
>>a response will come back from the file server eventually,
>>somebody will free up a packet, and things will continue.
>>In the previous posts, I believe it was stated that packets
>>did come back from the file server, but perhaps were not
>>processed by the client. I'm wondering if there is some
>>sort of race condition where when things get busy, the
>>client can't process the returning packets, and everything
>>just deadlocks.