[OpenAFS] Re: Solaris 10 deadlock issue

Aaron Knister aaronk@umbc.edu
Mon, 20 Jun 2011 18:12:00 -0400



It sounds like the problem might go back as far as 118855-36, or further.
I've tried a stock u3 box (118855-33) with OpenAFS 1.4.4, and it
experiences the hang as originally reported. Same stack trace on the cp
command, too.

On Fri, Jun 17, 2011 at 3:25 PM, <omalleys@msu.edu> wrote:

> Quoting Andrew Deason <adeason@sinenomine.net>:
>
>> On Fri, 17 Jun 2011 13:21:59 -0400 (EDT)
>> Andy Cobaugh <phalenor@gmail.com> wrote:
>>
>>> Can someone summarise which kernel versions / solaris updates and
>>> openafs versions are affected?
>>>
>>
>> Someone at the conference mentioned some specific patch levels... I
>> think they're in Tom Kula's notes. But those were guesses, I believe; we
>> haven't gone through all of the patches and seen where it starts
>> occurring.
>>
> 118855-36 and openafs 1.4.10 is where I started to see issues. However, a
> coworker was also screwing around on the systems and another was remotely
> putting them to sleep.
>
> I also saw issues on Solaris 8/sparc with 1.4.12 and ended up reverting to
> 1.4.8 which was the previous stable and I didn't see these issues.
>
> I was seeing a variety of other issues, where the afs access wasn't locking
> up, but inetd was locked up like it was attacked (and it may have been),
> however this happened on solaris containers, and it only affected the single
> container instance not the host where the client actually lives.
>
> Off the top of my head, some of the kernel changes affected the security
> policy (unable to even disable it), the name services (dns/username lookups)
> and the built-in kerberos mechanism.
>
> I'm actually kind of wondering if there is a memory leak in there somewhere.
>



-- 
Aaron Knister
Systems Administrator
Division of Information Technology
University of Maryland, Baltimore County
aaronk@umbc.edu
