[OpenAFS-port-freebsd] Client deadlock?

Benjamin Kaduk kaduk@MIT.EDU
Wed, 30 Mar 2011 21:18:05 -0400 (EDT)


On Wed, 30 Mar 2011, Garrett Wollman wrote:

> <<On Wed, 30 Mar 2011 17:42:52 -0400 (EDT), Benjamin Kaduk <kaduk@MIT.EDU> said:
>
>> A couple quick checks before really digging in: how
>> big is the cache, and how much data is bonnie++ trying to slug around?
>
> cacheinfo:
>
> /afs:/var/cache/openafs:1500000

cmdebug -cache would be more authoritative (the last time I tripped the 
"small cache" behavior was when I was passing enough arguments to afsd 
that the memcache size was determined by them and the cacheinfo 
specification was ignored).
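
For a sense of scale, here is a minimal sketch that reads the cacheinfo 
line quoted above, assuming the usual three colon-separated fields (mount 
point, cache directory, size in 1 KB blocks).  The parse_cacheinfo name 
and the CacheInfo type are made up for illustration, and the 24 GB figure 
is only the estimate from this thread; what afsd is actually using may 
differ, which is why cmdebug is the better source.

```python
from typing import NamedTuple


class CacheInfo(NamedTuple):
    mount_point: str
    cache_dir: str
    size_kb: int  # assumption: the third field counts 1 KB blocks


def parse_cacheinfo(line: str) -> CacheInfo:
    """Split a 'mountpoint:cachedir:blocks' line into its three fields."""
    mount_point, cache_dir, blocks = line.strip().split(":")
    return CacheInfo(mount_point, cache_dir, int(blocks))


info = parse_cacheinfo("/afs:/var/cache/openafs:1500000")

# Rough bonnie++ working set from the thread (about 24 GB), expressed in KB.
workload_kb = 24 * 1024 * 1024

print(f"configured cache: {info.size_kb / 1024 ** 2:.2f} GiB")
print(f"workload is about {workload_kb / info.size_kb:.1f}x the cache size")
```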

>
> (of course, it's really using memcache).  I believe bonnie++ when left
> to its own devices uses twice the system memory, so that would be 24
> GB.  But I'm not sure how far it's actually getting; on the server
> side, Bonnie's temporary file appears to be zero-length.
>

It sure sounds like things are hanging up very quickly.
Since you don't have this machine for long, I will probably end up seeing 
if bonnie++ will reproduce for me.

>> I also assume you don't have inaccurate entries for the realm in question
>> in your CellServDB, but active confirmation is good.
>
> On the same file server I've run postmark and a simple file
> creation/deletion microbenchmark multiple times with no issues (other
> than the abysmally slow read performance which seems to be common to
> all clients).  postmark loads up the server quite nicely, so I'm
> puzzled as to what bonnie++ is doing that kills it.  I haven't tried
> doing something trivial like "dd".

dd with a range of blocksizes might be interesting in its own right, but 
is probably not going to help track down the bug here.
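
If anyone does want to try the blocksize sweep, something along these 
lines would do as a rough stand-in for a series of dd runs.  It is only a 
sketch: timed_write is a hypothetical helper, the block sizes and the 
1 MiB total are arbitrary, and it writes to a temporary directory, so the 
target directory would need to be changed to a scratch path in AFS to 
exercise the client at all.

```python
import os
import tempfile
import time


def timed_write(path: str, total_bytes: int, block_size: int) -> float:
    """Write total_bytes of zeros to path in block_size chunks; return seconds."""
    block = bytes(block_size)
    start = time.monotonic()
    with open(path, "wb") as f:
        remaining = total_bytes
        while remaining > 0:
            n = min(block_size, remaining)
            f.write(block[:n])
            remaining -= n
        # Push the data out of the process so the timing includes the flush.
        f.flush()
        os.fsync(f.fileno())
    return time.monotonic() - start


TOTAL_BYTES = 1024 * 1024  # arbitrary small size; scale up for a real run

# Placeholder target: swap in a scratch directory under /afs to test the client.
with tempfile.TemporaryDirectory() as target_dir:
    path = os.path.join(target_dir, "blocksize-test.dat")
    for block_size in (512, 4096, 65536, 1024 * 1024):
        elapsed = timed_write(path, TOTAL_BYTES, block_size)
        print(f"bs={block_size:>8}  {elapsed:.4f} s")
```

Each pass rewrites the same file from scratch, so the numbers are only 
comparable across block sizes, not with bonnie++ or postmark results.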

If it is a proper deadlock, it's probably easiest to have the kernel 
debugger show which locks are held, and then examine the dump to see which 
threads are sleeping where and on what.  I have a decent setup for doing 
this, when I have time ...

-Ben Kaduk