[OpenAFS] Re: Solaris 10 deadlock issue

Aaron Knister aaronk@umbc.edu
Thu, 7 Jul 2011 15:15:05 -0400



Hi Andrew,

The patch seems to work: I can no longer reproduce the hang as described
initially.

I tested using the head of the openafs-stable-1_6_x branch as of
commit 2b2b647e3299c2dfeb30d2986290e1121d6cb5f3 with your patch applied.
Applying the patch to 1.6.0pre6 caused the machine to kernel panic.

Thanks!

-Aaron

On Wed, Jun 29, 2011 at 9:51 PM, Aaron Knister <aaronk@umbc.edu> wrote:

> That's great Andrew, thank you! I'll try it out and report back.
>
>
> On Wed, Jun 29, 2011 at 4:16 PM, Andrew Deason <adeason@sinenomine.net> wrote:
>
>> On Tue, 14 Jun 2011 17:56:44 -0400
>> Aaron Knister <aaronk@umbc.edu> wrote:
>>
>> > Good afternoon!
>> >
>> > I'm writing to report a deadlock issue I'm seeing on Solaris 10.
>>
>> This issue should be fixed by this: <http://gerrit.openafs.org/4896>
>> which you can get the current version of in patch form here:
>> <http://git.openafs.org/?p=openafs.git;a=commitdiff_plain;h=94483f566ff624a8d7fd7455359703b4525ec05a>
>> (Comments on that are welcome, too, for anyone familiar with the Solaris
>> VM system)
>>
>> That should apply to a recent 1.6 and possibly 1.5. If it does in fact
>> cause the system to not hang, you can verify you're actually hitting the
>> problematic condition by running something like this:
>>
>> $ dtrace -n 'fbt::osi_VM_MultiPageConflict:return { @["conflict"] = quantize(arg1); }'
>>
>> Run that before the copy, and after the copy completes, ctrl-C the
>> dtrace process and it should spit something like this out at you:
>>
>>  conflict
>>           value  ------------- Distribution ------------- count
>>              -1 |                                         0
>>               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 353
>>               1 |                                         0
>>
>> which shows that osi_VM_MultiPageConflict returned '0' 353 times. You
>> may get some 1 return values that show up:
>>
>>  conflict
>>           value  ------------- Distribution ------------- count
>>              -1 |                                         0
>>               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    344
>>               1 |@@@                                      31
>>               2 |                                         0
>>
>> But I could only get that to happen if I somewhat forced the client to
>> choose the "wrong" entry to evict from the cache. If all of the 'count's
>> are zero, you didn't trigger the condition that was causing the original
>> problem.
>>
>> Can you let us know if that fixes the problem for you, or changes
>> anything about it?
>>
>> --
>> Andrew Deason
>> adeason@sinenomine.net
>>
>> _______________________________________________
>> OpenAFS-info mailing list
>> OpenAFS-info@openafs.org
>> https://lists.openafs.org/mailman/listinfo/openafs-info
>>
>
>
>
> --
> Aaron Knister
> Systems Administrator
> Division of Information Technology
> University of Maryland, Baltimore County
> aaronk@umbc.edu
>



-- 
Aaron Knister
Systems Administrator
Division of Information Technology
University of Maryland, Baltimore County
aaronk@umbc.edu
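
For anyone repeating the check Andrew describes, here is a rough sketch of a
helper that totals the counts in saved quantize() output, so it is easy to see
whether osi_VM_MultiPageConflict ran at all and how often it returned something
other than 0. The function name summarize_conflicts and the idea of saving the
dtrace output to a file are my own assumptions, not part of the patch or of
OpenAFS. On Solaris 10 the default awk may be too old for this; nawk or
/usr/xpg4/bin/awk is probably needed.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenAFS): sum the "count" column of
# dtrace quantize() output read from the named files or from stdin.
summarize_conflicts() {
    # Data rows look like "<value> |<bar> <count>"; the header has no '|'.
    awk -F'|' '
        NF == 2 {
            value = $1
            gsub(/[ \t]/, "", value)          # strip padding around the value
            rest = $2
            sub(/[ \t\r]+$/, "", rest)        # drop trailing whitespace
            n = split(rest, parts, /[ \t]+/)  # the last field is the count
            count = parts[n]
            if (value ~ /^-?[0-9]+$/ && count ~ /^[0-9]+$/) {
                total += count
                if (value + 0 != 0) nonzero += count
            }
        }
        END { printf "calls=%d nonzero_returns=%d\n", total, nonzero }
    ' "$@"
}
```

Fed the second histogram from Andrew's message, this should print
"calls=375 nonzero_returns=31". A result of calls=0 would correspond to the
case he mentions where the original condition was never triggered.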
