[OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_64 kernel lock up

Kodiak Firesmith kfiresmith@gmail.com
Fri, 2 Feb 2018 18:04:56 -0500


Thanks Stephan,
I'm relatively new to handling OpenAFS.  Are these problems part of a
normal "kernel release, then OpenAFS update" cycle, and am I perhaps getting
snagged just by being too early an adopter?  I wanted to raise the alarm
on this and see if anything else was needed from me as the reporter of the
issue, but perhaps that's an overreaction to what is just a normal
process I haven't been tuned into in prior RHEL release cycles?

Should I try to get an account set up at http://rt.central.org and file a
bug?

Thanks!
 - Kodiak

On Fri, Feb 2, 2018 at 4:36 PM, Stephan Wiesand <stephan.wiesand@desy.de>
wrote:

> While additional data points are obviously most welcome, there is no
> expectation that this issue is fixed with 1.6.22.x or 1.8.x right now. Some
> serious work will be required to adapt OpenAFS to the changes in this
> kernel (series), though there's some hope that it won't be quite as hard to
> fix as the 7.4 getcwd issue.
>
> - Stephan
>
> > On 02.Feb 2018, at 22:20, Kodiak Firesmith <kfiresmith@gmail.com> wrote:
> >
> > Not much else to report today: I expanded my test base to a few more
> > RHEL 7.5 beta hosts and re-rolled the 1.6.22.1-1 SRPM again, and I am
> > still seeing the same results universally.  Every host fails to boot due
> > to a kernel panic when it tries to load the openafs DKMS kernel module.
> >
> > My next move on Monday will be to try an actual kernel-specific kmod
> > instead of DKMS.  If that works I'll be kind of sad since we've had great
> > luck with DKMS until now.
> >
> >  - Kodiak
> >
> > On Thu, Feb 1, 2018 at 3:26 PM, Kodiak Firesmith <kfiresmith@gmail.com>
> > wrote:
> > I just rebuilt off-the-shelf RPMs based on
> > http://www.openafs.org/dl/openafs/1.6.22.1/openafs-1.6.22.1-1.src.rpm
> > thinking maybe we had some historical patch in our build area that might
> > be causing the problem, but alas, even the off-the-shelf RPMs cause a full
> > wedge and reboot when openafs-client.service starts up.
> >
> >  - Kodiak
> >
> > On Thu, Feb 1, 2018 at 1:23 PM, Kodiak Firesmith <kfiresmith@gmail.com>
> > wrote:
> > Hello Rich!
> > It's a Dell Optiplex 7020 with an Intel i7-4790.
> >
> > Thanks!
> >  - Kodiak
> >
> > On Thu, Feb 1, 2018 at 1:20 PM, Rich Sudlow <rich@nd.edu> wrote:
> > On 01/31/2018 09:43 AM, Kodiak Firesmith wrote:
> > https://photos.app.goo.gl/WgPsSUCLK5ojxIuH3
> >
> > Greetings
> >
> > What processor, etc. is in this machine?
> >
> > Rich
> >
> >
> >
> >
> > On Wed, Jan 31, 2018 at 9:41 AM, Kodiak Firesmith <kfiresmith@gmail.com>
> > wrote:
> >
> >     Folks, re-sending this because the first try never hit the list -
> >     perhaps mail with attachments is silently dropped or held for manual
> >     moderation?  I'd originally attached an image of the stack trace.
> >     I'll host it and reply to this with a URL link in case that would
> >     also result in a drop or moderation.
> >
> >
> >
> >     Anyhow:
> >
> >     In testing the new RHEL 7.5 beta, we've discovered that hosts using
> >     AFS fail to boot after the upgrade, with OpenAFS 1.6.22.1 installed.
> >
> >     We are wondering if some of the non-guaranteed kernel ABIs that
> >     OpenAFS uses might have changed with the latest kernel provided in
> >     RHEL 7.
> >
> >     I've attached a picture of the trace.
> >
> >     Anyone else kicking the tires on the new RHEL yet?
> >
> >     Thanks!
>
>
