[OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_64 kernel lock up

Kodiak Firesmith kfiresmith@gmail.com
Fri, 23 Mar 2018 07:27:15 -0400


I've also tested gsgatlin's 7.5beta RPMs and they work great.  Any chance
we'll see the rh75enotdir patch integrated into a 1.6.22.3 release soon?
If it isn't slated to be merged and released soon, I'm wondering whether
it's worth manually applying that patch to a rebuild of the official
OpenAFS RPMs, but I don't want to spend the time on a re-roll if a fixed
official release is forthcoming.
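
For anyone who wants to check quickly whether a given client build shows
the symptom reported earlier in this thread (the path stats as a directory,
but listing it fails with "Not a directory"), here is a minimal Python
sketch.  It is only an illustration: the function name diagnose_listing and
its return strings are made up for this example, and it assumes the failure
reaches userspace as ENOTDIR from the directory read, as the quoted ls
error suggests.

```python
import errno
import os
import stat
import sys


def diagnose_listing(path):
    """Classify what happens when `path` is listed (hypothetical helper).

    Returns one of:
      "ok"               - path is a directory and listing it succeeded
      "not-a-directory"  - path exists but really is not a directory
      "enotdir-on-dir"   - stat() says directory, yet listing fails with ENOTDIR
      "error:<ERRNO>"    - any other failure (for example "error:ENOENT")
    """
    try:
        mode = os.stat(path).st_mode
    except OSError as exc:
        return "error:" + errno.errorcode.get(exc.errno, str(exc.errno))
    if not stat.S_ISDIR(mode):
        return "not-a-directory"
    try:
        os.listdir(path)
    except OSError as exc:
        # A directory that cannot be read as one is the symptom of interest.
        if exc.errno == errno.ENOTDIR:
            return "enotdir-on-dir"
        return "error:" + errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"


if __name__ == "__main__":
    # Defaults to /afs; pass another path as the first argument if needed.
    target = sys.argv[1] if len(sys.argv) > 1 else "/afs"
    print(target, diagnose_listing(target))
```

On a client, a result of "enotdir-on-dir" for /afs would match the
behaviour described in the thread, while "ok" would suggest that build is
not affected.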

Thanks!
 - Kodiak


On Fri, Mar 2, 2018 at 3:47 AM, Anders Nordin <anders.j.nordin@ltu.se>
wrote:

> Hello,
>
> Is there any progress on this issue? Can we expect a stable release for
> RHEL 7.5?
>
> MVH
> Anders
>
> -----Original Message-----
> From: openafs-info-admin@openafs.org [mailto:openafs-info-admin@openafs.org] On Behalf Of Benjamin Kaduk
> Sent: den 9 februari 2018 01:02
> To: Kodiak Firesmith <kfiresmith@gmail.com>
> Cc: openafs-info <openafs-info@openafs.org>
> Subject: Re: [OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_64 kernel lock up
>
> On Wed, Feb 07, 2018 at 11:46:28AM -0500, Kodiak Firesmith wrote:
> > Hello again All,
> >
> > As part of continued testing, I've been able to confirm that the
> > SystemD double-service startup thing only happens to my hosts when
> > going from RHEL 7.4 to RHEL 7.5beta.  On a test host installed
> > directly as RHEL 7.5beta, I get a bit farther with 1.6.18.22, in that
> > I get to the point where OpenAFS "kind of" works.
>
> Thanks for tracking this down.  The rpm packaging maintainers may want to
> try to track down why the double-start happens in the upgrade scenario, as
> that's pretty nasty behavior.
>
> > What I'm observing is that the openafs client kernel module (built by
> > DKMS) loads fine, and just so long as you know where you need to go in
> > /afs, you can get there, and you can read and write files and the
> > OpenAFS 'fs' command works.  But doing an 'ls' of /afs or any path
> > underneath results in
> > "ls: reading directory /afs/: Not a directory".
> >
> > I ran an strace of a good RHEL 7.4 host running ls on /afs, and a RHEL
> > 7.5beta host running ls on /afs and have created pastebins of both, as
> > well as an inline diff.
> >
> > All can be seen at the following locations:
> >
> > works
> > https://paste.fedoraproject.org/paste/Hiojt2~Be3wgez47bKNucQ
> >
> > fails
> > https://paste.fedoraproject.org/paste/13ZXBfJIOMsuEJFwFShBfg
> >
> >
> > diff
> > https://paste.fedoraproject.org/paste/FJKRwep1fWJogIDbLnkn8A
> >
> > Hopefully this might help the OpenAFS devs, or someone might know what
> > might be borking on every RHEL 7.5 beta host.  It does fit with what
> > other 7.5 beta users have observed OpenAFS doing.
>
> Yes, now it seems like all our reports are consistent, and we just have to
> wait for a developer to get a better look at what Red Hat changed in the
> kernel that we need to adapt to.
>
> -Ben
>
> > Thanks!
> >  - Kodiak
> >
> > On Mon, Feb 5, 2018 at 12:31 PM, Stephan Wiesand
> > <stephan.wiesand@desy.de>
> > wrote:
> >
> > >
> > > > On 04.Feb 2018, at 02:11, Jeffrey Altman <jaltman@auristor.com> wrote:
> > > >
> > > > On 2/2/2018 6:04 PM, Kodiak Firesmith wrote:
> > > >> I'm relatively new to handling OpenAFS.  Are these problems part
> > > >> of a normal "kernel release; openafs update" cycle and perhaps
> > > >> I'm getting snagged just by being too early of an adopter?  I
> > > >> wanted to raise the alarm on this and see if anything else was
> > > >> needed from me as the reporter of the issue, but perhaps that's
> > > >> an overreaction to what is just part of a normal process I just
> > > >> haven't been tuned into in prior RHEL release cycles?
> > > >
> > > >
> > > > Kodiak,
> > > >
> > > > On RHEL, DKMS is safe to use for kernel modules that restrict
> > > > themselves to using the restricted set of kernel interfaces (the
> > > > RHEL KABI) that Red Hat has designated will be supported across
> > > > the lifespan of the RHEL major version number.  OpenAFS is not
> > > > such a kernel module.  As a result it is vulnerable to breakage each
> > > > and every time a new kernel is shipped.
> > >
> > > Jeffrey,
> > >
> > > the usual way to use DKMS is to either have it build a module for a
> > > newly installed kernel or install a prebuilt module for that kernel.
> > > It may be possible to abuse it for providing a module built for
> > > another kernel, but I think that won't happen accidentally.
> > >
> > > You may be confusing DKMS with RHEL's "KABI tracking kmods". Those
> > > should be safe to use within a RHEL minor release (and the SL
> > > packaging has been using them like this since EL6.4), but aren't
> > > across minor releases (and that's why the SL packaging modifies the
> > > kmod handling to require a build for the minor release in question).
> > >
> > > > There are two types of failures that can occur:
> > > >
> > > > 1. a change results in failure to build the OpenAFS kernel module
> > > >    for the new kernel
> > > >
> > > > 2. a change results in the OpenAFS kernel module building and
> > > >    successfully loading but failing to operate correctly
> > >
> > > The latter shouldn't happen within a minor release, but can across
> > > minor releases.
> > >
> > > > It is the second of these possibilities that has taken place with
> > > > the release of the 3.10.0-830.el7 kernel shipped as part of the
> > > > RHEL 7.5 beta.
> > > >
> > > > Are you an early adopter of RHEL 7.5 beta?  Absolutely, it's a beta
> > > > release and as such you should expect that there will be bugs and
> > > > that third party kernel modules that do not adhere to the KABI
> > > > functionality might have compatibility issues.
> > >
> > > The -830 kernel can break 3rd-party modules using non-whitelisted
> > > ABIs, whether or not they adhere to the "KABI functionality".
> > >
> > > > There was a compatibility issue with RHEL 7.4 kernel
> > > > (3.10.0-693.1.1.el7) as well that was only fixed in the OpenAFS
> > > > 1.6 release series this past week as part of 1.6.22.2:
> > > >
> > > >  http://www.openafs.org/dl/openafs/1.6.22.2/RELNOTES-1.6.22.2
> > >
> > > Yes, and this one was hard to fix. Thanks are due to Mark Vitale for
> > > developing the fix and all those who reviewed and tested it.
> > >
> > > > Jeffrey Altman
> > > > AuriStor, Inc.
> > > >
> > > > P.S. - Welcome to the community.
> > >
> > > Seconded. In particular, the problem report regarding the EL7.5beta
> > > kernel was absolutely appropriate.
> > >
> > > --
> > > Stephan Wiesand
> > > DESY - DV -
> > > Platanenallee 6
> > > 15738 Zeuthen, Germany
> > >
> > >
> > >
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
>
