[OpenAFS-port-freebsd] Client deadlock?
Benjamin Kaduk
kaduk@MIT.EDU
Wed, 30 Mar 2011 21:18:05 -0400 (EDT)
On Wed, 30 Mar 2011, Garrett Wollman wrote:
> <<On Wed, 30 Mar 2011 17:42:52 -0400 (EDT), Benjamin Kaduk <kaduk@MIT.EDU> said:
>
>> A couple quick checks before really digging in: how
>> big is the cache, and how much data is bonnie++ trying to slug around?
>
> cacheinfo:
>
> /afs:/var/cache/openafs:1500000
cmdebug -cache would be more authoritative (the last time I tripped the
"small cache" behavior was when I was passing enough arguments to afsd
that the memcache size was determined by them and the cacheinfo
specification was ignored).
>
> (of course, it's really using memcache). I believe bonnie++ when left
> to its own devices uses twice the system memory, so that would be 24
> GB. But I'm not sure how far it's actually getting; on the server
> side, Bonnie's temporary file appears to be zero-length.
>
It sure sounds like things are hanging up very quickly.
Since you don't have this machine for long, I will probably end up seeing
whether bonnie++ reproduces the hang for me.
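
For a sense of scale, here is a rough sketch of the cache size versus the
bonnie++ working set. It assumes the third cacheinfo field is in 1 KiB
blocks and reads "24 GB" as 24 GiB; the figures are just the ones quoted
above, and if afsd arguments override cacheinfo the real cache size may
differ.

```python
# Rough comparison of the configured cache size and the bonnie++ data size.
# Assumption: the third cacheinfo field counts 1 KiB blocks.
cacheinfo = "/afs:/var/cache/openafs:1500000"
mount_point, cache_dir, blocks = cacheinfo.split(":")

cache_bytes = int(blocks) * 1024
# Assumption: "24 GB" (twice system memory) is taken as 24 GiB.
workload_bytes = 24 * 1024**3

ratio = workload_bytes / cache_bytes
print(f"cache is about {cache_bytes / 1024**3:.2f} GiB")
print(f"workload is about {ratio:.1f} times the cache")
```

So the working set would be well over ten times the cache, if cacheinfo is
what is actually in effect.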
>> I also assume you don't have inaccurate entries for the realm in question
>> in your CellServDB, but active confirmation is good.
>
> On the same file server I've run postmark and a simple file
> creation/deletion microbenchmark multiple times with no issues (other
> than the abysmally slow read performance which seems to be common to
> all clients). postmark loads up the server quite nicely, so I'm
> puzzled as to what bonnie++ is doing that kills it. I haven't tried
> doing something trivial like "dd".
dd with a range of blocksizes might be interesting in its own right, but
is probably not going to help track down the bug here.
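
If someone does want the blocksize sweep, something along these lines
would do as a stand-in for a dd loop. This is only a sketch: the function
names, the 1 MiB default size, and the list of blocksizes are made up for
illustration, and the directory should be pointed at a path in AFS rather
than the temp directory used in the demo.

```python
import os
import tempfile
import time


def write_with_blocksize(path, total_bytes, block_size):
    """Write total_bytes of zeros to path in block_size chunks.

    Returns the elapsed wall-clock time in seconds, including the fsync.
    """
    block = b"\0" * block_size
    start = time.monotonic()
    with open(path, "wb", buffering=0) as f:
        written = 0
        while written < total_bytes:
            # The final chunk may be shorter than block_size.
            chunk = block[: total_bytes - written]
            written += f.write(chunk)
        os.fsync(f.fileno())
    return time.monotonic() - start


def sweep(directory, total_bytes=1 << 20,
          block_sizes=(512, 4096, 65536, 1 << 20)):
    """Time one write per blocksize; returns {block_size: seconds}."""
    results = {}
    for bs in block_sizes:
        fd, path = tempfile.mkstemp(dir=directory)
        os.close(fd)
        try:
            results[bs] = write_with_blocksize(path, total_bytes, bs)
        finally:
            os.unlink(path)
    return results


if __name__ == "__main__":
    # Hypothetical target: replace with a directory in AFS.
    for bs, secs in sweep(tempfile.gettempdir()).items():
        print(f"bs={bs:>8} bytes  {secs:.4f} s")
```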
If it is a proper deadlock, it's probably easiest to have the kernel
debugger show which locks are held, and then examine the dump to see which
threads are sleeping where and on what. I have a decent setup for doing
this, when I have time ...
-Ben Kaduk