[OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_64 kernel lock up

Kodiak Firesmith kfiresmith@gmail.com
Wed, 7 Feb 2018 11:46:28 -0500



Hello again All,

As part of continued testing, I've been able to confirm that the systemd
double-service startup problem only happens on my hosts when upgrading from
RHEL 7.4 to RHEL 7.5beta.  On a test host installed directly as RHEL 7.5beta,
I get a bit farther with 1.6.18.22, in that I get to the point where OpenAFS
"kind of" works.

What I'm observing is that the OpenAFS client kernel module (built by DKMS)
loads fine, and as long as you know where you need to go in /afs, you can
get there, read and write files, and use the OpenAFS 'fs' command.  But
running 'ls' on /afs or any path underneath it fails with
"ls: reading directory /afs/: Not a directory".

I ran strace on 'ls /afs' on a good RHEL 7.4 host and on a RHEL 7.5beta
host, and have created pastebins of both traces, as well as a diff of the
two.
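
For anyone who wants to check for the same symptom without a full strace,
here is a minimal Python sketch.  The helper name probe_directory is made up
for illustration and is not part of OpenAFS.  It only checks whether stat()
reports a directory while listing that directory fails, which is my reading
of the "Not a directory" error above.

```python
import errno
import os
import stat


def probe_directory(path):
    """Return (is_dir, listing_error) for path.

    is_dir is True when stat() says the path is a directory.
    listing_error is None when os.listdir() succeeds, otherwise the
    symbolic errno name (for example "ENOTDIR") of the failure.
    """
    is_dir = stat.S_ISDIR(os.stat(path).st_mode)
    try:
        os.listdir(path)
    except OSError as exc:
        # Map the numeric errno to its name; fall back to the number.
        return is_dir, errno.errorcode.get(exc.errno, str(exc.errno))
    return is_dir, None
```

Calling probe_directory("/afs") on a healthy client should give (True, None).
If my assumption about the failure is right, an affected host would report a
directory from stat() together with a listing error, but I have not verified
that on every configuration.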

All can be seen at the following locations:

works
https://paste.fedoraproject.org/paste/Hiojt2~Be3wgez47bKNucQ

fails
https://paste.fedoraproject.org/paste/13ZXBfJIOMsuEJFwFShBfg


diff
https://paste.fedoraproject.org/paste/FJKRwep1fWJogIDbLnkn8A
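
For comparing two strace logs like the ones above, a raw diff tends to be
noisy because addresses and PIDs differ between runs.  The following is a
rough sketch of one way to reduce that noise; it is not how the diff above
was produced.  The function names and the normalization rules (masking hex
values, dropping a leading PID column as written by 'strace -f') are my own
assumptions.

```python
import difflib
import re

# Hex values (pointers, mmap addresses) differ from run to run.
_HEX = re.compile(r"0x[0-9a-fA-F]+")
# 'strace -f' prefixes each line with a PID; drop it if present.
_PID = re.compile(r"^\d+\s+")


def normalize(line):
    """Strip a leading PID column and mask hex values in one strace line."""
    line = _PID.sub("", line.rstrip("\n"))
    return _HEX.sub("0xADDR", line)


def strace_diff(good_lines, bad_lines):
    """Return a unified diff (list of lines) of two normalized strace logs."""
    good = [normalize(line) for line in good_lines]
    bad = [normalize(line) for line in bad_lines]
    return list(
        difflib.unified_diff(good, bad, fromfile="works", tofile="fails", lineterm="")
    )
```

Feeding it the lines of the "works" and "fails" traces should leave mostly
the system calls whose arguments or return values really differ.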

Hopefully this helps the OpenAFS devs, or someone may recognize what is
breaking on every RHEL 7.5 beta host.  It is consistent with what other
7.5 beta users have reported seeing from OpenAFS.

Thanks!
 - Kodiak

On Mon, Feb 5, 2018 at 12:31 PM, Stephan Wiesand <stephan.wiesand@desy.de>
wrote:

>
> > On 04.Feb 2018, at 02:11, Jeffrey Altman <jaltman@auristor.com> wrote:
> >
> > On 2/2/2018 6:04 PM, Kodiak Firesmith wrote:
> >> I'm relatively new to handling OpenAFS.  Are these problems part of a
> >> normal "kernel release; openafs update" cycle and perhaps I'm getting
> >> snagged just by being too early of an adopter?  I wanted to raise the
> >> alarm on this and see if anything else was needed from me as the
> >> reporter of the issue, but perhaps that's an overreaction to what is
> >> just part of a normal process I just haven't been tuned into in prior
> >> RHEL release cycles?
> >
> >
> > Kodiak,
> >
> > On RHEL, DKMS is safe to use for kernel modules that restrict themselves
> > to using the restricted set of kernel interfaces (the RHEL KABI) that
> > Red Hat has designated will be supported across the lifespan of the RHEL
> > major version number.  OpenAFS is not such a kernel module.  As a result
> > it is vulnerable to breakage each and every time a new kernel is shipped.
>
> Jeffrey,
>
> the usual way to use DKMS is to either have it build a module for a newly
> installed kernel or install a prebuilt module for that kernel. It may be
> possible to abuse it for providing a module built for another kernel, but
> I think that won't happen accidentally.
>
> You may be confusing DKMS with RHEL's "KABI tracking kmods". Those should
> be safe to use within a RHEL minor release (and the SL packaging has been
> using them like this since EL6.4), but aren't across minor releases (and
> that's why the SL packaging modifies the kmod handling to require a build
> for the minor release in question).
>
> > There are two types of failures that can occur:
> >
> > 1. a change results in failure to build the OpenAFS kernel module
> >    for the new kernel
> >
> > 2. a change results in the OpenAFS kernel module building and
> >    successfully loading but failing to operate correctly
>
> The latter shouldn't happen within a minor release, but can across
> minor releases.
>
> > It is the second of these possibilities that has taken place with the
> > release of the 3.10.0-830.el7 kernel shipped as part of the RHEL 7.5 beta.
> >
> > Are you an early adopter of RHEL 7.5 beta?  Absolutely, it's a beta
> > release and as such you should expect that there will be bugs and that
> > third party kernel modules that do not adhere to the KABI functionality
> > might have compatibility issues.
>
> The -830 kernel can break 3rd-party modules using non-whitelisted ABIs,
> whether or not they adhere to the "KABI functionality".
>
> > There was a compatibility issue with RHEL 7.4 kernel
> > (3.10.0-693.1.1.el7) as well that was only fixed in the OpenAFS 1.6
> > release series this past week as part of 1.6.22.2:
> >
> >  http://www.openafs.org/dl/openafs/1.6.22.2/RELNOTES-1.6.22.2
>
> Yes, and this one was hard to fix. Thanks are due to Mark Vitale for
> developing the fix and all those who reviewed and tested it.
>
> > Jeffrey Altman
> > AuriStor, Inc.
> >
> > P.S. - Welcome to the community.
>
> Seconded. In particular, the problem report regarding the EL7.5beta
> kernel was absolutely appropriate.
>
> --
> Stephan Wiesand
> DESY - DV -
> Platanenallee 6
> 15738 Zeuthen, Germany
