[OpenAFS] Solaris 10 deadlock issue

Aaron Knister aaronk@umbc.edu
Tue, 14 Jun 2011 19:09:35 -0400


--20cf3071d0224deff204a5b422ab
Content-Type: text/plain; charset=ISO-8859-1

I'm working on figuring out how to do that. I've got a crash dump now; I just
need to figure out how to get the backtrace.
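
For reference, one common way to pull a backtrace out of a Solaris crash dump
is to feed a few dcmds to mdb. The sketch below is only an illustration under
stated assumptions: the dump is assumed to have been saved by savecore as a
unix.N / vmcore.N pair, the helper names (mdb_argv, mdb_script,
collect_backtrace) are hypothetical, and the script needs Python 3.7 or later
on a host where mdb is installed. The dcmd list is a starting point, not a
complete diagnosis recipe.

```python
import subprocess
import sys

# dcmds piped to mdb on stdin. Both are standard Solaris mdb kernel dcmds.
DCMDS = [
    "::status",  # summary of the dump (OS release, panic message if any)
    # Stacks of every thread in processes whose name matches "cp".
    # ::pgrep takes a regular expression, so other names containing
    # "cp" will match as well.
    "::pgrep cp | ::walk thread | ::findstack -v",
]


def mdb_argv(unix_path, vmcore_path):
    """Build the mdb command line for a saved crash dump pair."""
    return ["mdb", unix_path, vmcore_path]


def mdb_script(dcmds=DCMDS):
    """Join dcmds into the newline-terminated text sent to mdb's stdin."""
    return "\n".join(dcmds) + "\n"


def collect_backtrace(unix_path, vmcore_path):
    """Run mdb non-interactively and return everything it printed."""
    result = subprocess.run(
        mdb_argv(unix_path, vmcore_path),
        input=mdb_script(),
        capture_output=True,
        text=True,
        check=True,  # raise if mdb exits with a non-zero status
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 backtrace.py unix.0 vmcore.0
    print(collect_backtrace(sys.argv[1], sys.argv[2]))
```

Run from the directory holding the dump (often /var/crash/<hostname>), this
prints the dump status followed by the kernel stacks of the matching threads,
which can then be attached to the report.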

On Tue, Jun 14, 2011 at 6:17 PM, Derrick Brashear <shadow@gmail.com> wrote:

> the backtrace from a kernel dump would be far more useful, if you have
> a way to collect one.
>
> On Tue, Jun 14, 2011 at 5:56 PM, Aaron Knister <aaronk@umbc.edu> wrote:
> > Good afternoon!
> > I'm writing to report a deadlock issue I'm seeing on Solaris 10.
> > What I've observed is that when a file larger than the configured size of
> > the cache is copied out of AFS the cache manager deadlocks and all access
> > to /afs on the affected system hangs until the system is rebooted. The
> > issue occurs with a memory cache as well as a disk cache.
> >
> > The issue can be mitigated if the cache size is raised to the value of
> > roughly half of the physical memory in the given system. The issue
> > appeared somewhere between Solaris 10 "u8" and "u9."
> > I've reproduced the problem using OpenAFS 1.4.14.1, 1.5.78 and 1.6.0pre6
> > and a Solaris 10 "u8" system with all of the latest patches applied.
> > I've put together a tar file containing:
> > - An fstrace dump starting a few seconds before I initiated the copy
> > - A stack trace of the hung cp command
> > - The output of cmdebug -long -server localhost run after AFS hangs
> > The individual files as well as a tar file of them can be found here:
> > http://userpages.umbc.edu/~aaronk/afs/solaris10-deadlock-issue.
> > Any help would be greatly appreciated.
> > Best,
> > Aaron
> > --
> > Aaron Knister
> > Systems Administrator
> > Division of Information Technology
> > University of Maryland, Baltimore County
> > aaronk@umbc.edu
> >
>
>
>
> --
> Derrick
>



-- 
Aaron Knister
Systems Administrator
Division of Information Technology
University of Maryland, Baltimore County
aaronk@umbc.edu

--20cf3071d0224deff204a5b422ab--