[OpenAFS] Re: Solaris 10 deadlock issue

Andrew Deason adeason@sinenomine.net
Wed, 15 Jun 2011 23:18:46 -0500


On Tue, 14 Jun 2011 22:39:55 -0500
Andrew Deason <adeason@sinenomine.net> wrote:

> echo "::walk thread | ::findstack" | mdb -k unix.N vmcore.N > foo.out
> 
> Then ideally edit foo.out and remove anything that doesn't mention
> "afs" in the stack trace. But if this is as easily reproducible as it
> looks, then we can probably get our own soon enough.
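
For the "edit foo.out" step quoted above, here is a rough sketch of doing
the filtering with awk instead of by hand. It assumes each thread's block
in the ::findstack output begins with a line starting with "stack pointer
for thread", and that matching the plain string "afs" anywhere in a block
is good enough; the function name filter_afs_stacks is just made up for
illustration.

```shell
# Keep only the per-thread stack blocks that mention "afs".
# Assumption: every block starts with a "stack pointer for thread" line.
filter_afs_stacks() {
    awk '
        /^stack pointer for thread/ {
            # A new block starts: flush the previous one if it matched.
            if (keep) printf "%s", block
            block = ""
            keep = 0
        }
        # Accumulate every line into the current block.
        { block = block $0 "\n" }
        # Mark the block for output if any line mentions afs.
        /afs/ { keep = 1 }
        END { if (keep) printf "%s", block }
    '
}
```

Something like "filter_afs_stacks < foo.out > foo.afs.out" should then
leave just the interesting threads, though it is worth a quick look at
the result in case the output format differs on your release.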

I talked about this a bit at the Workshop today, but to get it on the
list as well: I do have this replicated locally, and I sorta know what
the problem is. Something has changed with how mmapped data is
retrieved, and as a result our local bookkeeping on dcache entries is
preventing us from kicking out the dcache entries for the file you're
reading when the cache gets too full.

I haven't had time to look at why we avoid kicking out dcache entries
like this yet, but I think we have enough data to know what's going on.
I'd like to reproduce this on an old OpenSolaris VM I have, to see if it
happens on an OS for which I can look at the source.

-- 
Andrew Deason
adeason@sinenomine.net