[OpenAFS] Re: Solaris 10 deadlock issue

Andrew Deason adeason@sinenomine.net
Wed, 29 Jun 2011 15:16:12 -0500

On Tue, 14 Jun 2011 17:56:44 -0400
Aaron Knister <aaronk@umbc.edu> wrote:

> Good afternoon!
> I'm writing to report a deadlock issue I'm seeing on Solaris 10.

This issue should be fixed by this: <http://gerrit.openafs.org/4896>
which you can get the current version of in patch form here:
(Comments on that are welcome, too, for anyone familiar with the Solaris
VM system)

That should apply to a recent 1.6 and possibly 1.5. If it does in fact
stop the system from hanging, you can verify you're actually hitting the
problematic condition by running something like this:

$ dtrace -n 'fbt::osi_VM_MultiPageConflict:return { @["conflict"] = quantize(arg1); }'

Run that before the copy, and after the copy completes, ctrl-C the
dtrace process and it should spit something like this out at you:

           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 353
               1 |                                         0

which shows that osi_VM_MultiPageConflict returned '0' 353 times. You
may also see some return values of 1:

           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    344
               1 |@@@                                      31
               2 |                                         0

But I could only get that to happen if I somewhat forced the client to
choose the "wrong" entry to evict from the cache. If all of the 'count's
are zero, you didn't trigger the condition that was causing the original
problem.
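
If exact per-value counts are easier to read than the power-of-two
histogram, the same probe can be aggregated with count() keyed on the
return value. The following is only a sketch under a few assumptions:
the wrapper name trace_conflicts is made up for illustration, the
OpenAFS kernel module is already loaded so the fbt probe exists, and
you have the privileges dtrace needs. It uses dtrace's -c option to run
the copy itself, so dtrace exits and prints the counts when the copy
finishes instead of waiting for ctrl-C.

```shell
# Hypothetical helper (not part of OpenAFS): run a command under dtrace and
# tally each distinct return value of osi_VM_MultiPageConflict.
# In an fbt return probe, arg1 holds the function's return value.
trace_conflicts() {
    # "$*" joins the arguments into one command string for -c, so this
    # simple form does not handle paths containing spaces.
    dtrace -n 'fbt::osi_VM_MultiPageConflict:return { @[arg1] = count(); }' \
        -c "$*"
}
```

You would call it with the copy that normally triggers the hang, for
example "trace_conflicts cp SRC DST" with your own source and
destination paths. As with the one-liner, the probe fires system-wide,
not only for the copy process.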

Can you let us know if that fixes the problem for you, or changes
anything about it?

Andrew Deason