[OpenAFS] Solaris 10 deadlock issue
Aaron Knister
aaronk@umbc.edu
Tue, 14 Jun 2011 19:09:35 -0400
I'm working on figuring out how to do that. I've got a crash dump now; I just
need to figure out how to get the backtrace from it.
On Tue, Jun 14, 2011 at 6:17 PM, Derrick Brashear <shadow@gmail.com> wrote:
> the backtrace from a kernel dump would be far more useful, if you have
> a way to collect one.
>
> On Tue, Jun 14, 2011 at 5:56 PM, Aaron Knister <aaronk@umbc.edu> wrote:
> > Good afternoon!
> > I'm writing to report a deadlock issue I'm seeing on Solaris 10.
> > What I've observed is that when a file larger than the configured size of
> > the cache is copied out of AFS the cache manager deadlocks and all access to
> > /afs on the affected system hangs until the system is rebooted. The issue
> > occurs with a memory cache as well as a disk cache.
> >
> > The issue can be mitigated if the cache size is raised to the value of
> > roughly half of the physical memory in the given system. The issue appeared
> > somewhere between Solaris 10 "u8" and "u9."
> > I've reproduced the problem using OpenAFS 1.4.14.1, 1.5.78 and 1.6.0pre6 and
> > a Solaris 10 "u8" system with all of the latest patches applied.
> > I've put together a tar file containing:
> > - An fstrace dump starting a few seconds before I initiated the copy
> > - A stack trace of the hung cp command
> > - The output of cmdebug -long -server localhost run after AFS hangs
> > The individual files as well as a tar file of them can be found here:
> > http://userpages.umbc.edu/~aaronk/afs/solaris10-deadlock-issue.
> > Any help would be greatly appreciated.
> > Best,
> > Aaron
> > --
> > Aaron Knister
> > Systems Administrator
> > Division of Information Technology
> > University of Maryland, Baltimore County
> > aaronk@umbc.edu
> >
>
>
>
> --
> Derrick
>
--
Aaron Knister
Systems Administrator
Division of Information Technology
University of Maryland, Baltimore County
aaronk@umbc.edu
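
For anyone trying to reproduce the condition from the original report (a file
larger than the configured cache being copied out of AFS), here is a rough
sketch that checks whether a given file would exceed the cache size. It
assumes the cache size comes from a cacheinfo file in the usual
"mountpoint:cachedir:size-in-1K-blocks" form and that the file lives at
/usr/vice/etc/cacheinfo; both the path and the helper names are assumptions
for illustration, and a size passed to afsd on the command line would not be
seen by this check.

```python
import os


def parse_cacheinfo(line):
    """Split a 'mountpoint:cachedir:blocks' line; blocks are 1 KiB units."""
    mount, cache_dir, blocks = line.strip().split(":")
    return mount, cache_dir, int(blocks)


def exceeds_cache(file_size_bytes, cache_blocks):
    """Return True if the file is larger than the configured cache."""
    return file_size_bytes > cache_blocks * 1024


def file_exceeds_cache(path, cacheinfo_path="/usr/vice/etc/cacheinfo"):
    """Compare a file's size to the cache size read from cacheinfo.

    The default cacheinfo_path is an assumption; adjust it for your install.
    """
    with open(cacheinfo_path) as f:
        _, _, blocks = parse_cacheinfo(f.readline())
    return exceeds_cache(os.path.getsize(path), blocks)
```

Calling file_exceeds_cache() on the file you plan to copy tells you whether
the copy should fall into the case described above.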