[OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_64 kernel lock up
Fri, 2 Mar 2018 08:47:26 +0000
Is there any progress on this issue? Can we expect a stable release for RHE=
From: firstname.lastname@example.org [mailto:email@example.com] On Behalf Of Benjamin Kaduk
Sent: 9 February 2018 01:02
To: Kodiak Firesmith <firstname.lastname@example.org>
Cc: openafs-info <email@example.com>
Subject: Re: [OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_64 kernel lock up
On Wed, Feb 07, 2018 at 11:46:28AM -0500, Kodiak Firesmith wrote:
> Hello again All,
> As part of continued testing, I've been able to confirm that the
> SystemD double-service startup thing only happens to my hosts when
> going from RHEL 7.4 to RHEL 7.5beta. On a test host installed directly
> as RHEL 7.5beta, I get a bit farther with 22.214.171.124, in that I get
> to the point where OpenAFS "kind of" works.
Thanks for tracking this down. The rpm packaging maintainers may want to
try to track down why the double-start happens in the upgrade scenario,
as that's pretty nasty behavior.
> What I'm observing is that the openafs client kernel module (built by
> DKMS) loads fine, and just so long as you know where you need to go in
> /afs, you can get there, and you can read and write files and the
> OpenAFS command works. But doing an 'ls' of /afs or any path underneath
> results in
> "ls: reading directory /afs/: Not a directory".
> I ran an strace of a good RHEL 7.4 host running ls on /afs, and a RHEL
> 7.5beta host running ls on /afs and have created pastebins of both, as
> well as an inline diff.
> All can be seen at the following locations:
> Hopefully this might help the OpenAFS devs, or someone might know what
> might be borking on every RHEL 7.5 beta host. It does fit with what
> 7.5 beta users have observed OpenAFS doing.
Yes, now it seems like all our reports are consistent, and we just have
to wait for a developer to get a better look at what Red Hat changed in
the kernel that we need to adapt to.
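
For anyone who wants to check a host for the symptom described above
without reading strace output, here is a minimal sketch in Python. It is
not taken from OpenAFS or from the pastebins; the function name
probe_directory and the result keys are made up for illustration. It
only assumes that the "Not a directory" message from ls corresponds to
the ENOTDIR error code, which is the standard text for that errno.

```python
import errno
import os


def probe_directory(path):
    """Report whether `path` can be stat'ed and whether it can be listed.

    Hypothetical helper: separates "the path is reachable" from
    "the directory contents can be read", which is the split seen
    in the report (known paths work, listing fails).
    """
    result = {"stat_ok": False, "list_ok": False, "list_errno": None}
    try:
        os.stat(path)
        result["stat_ok"] = True
    except OSError:
        # Path is not reachable at all; nothing more to probe.
        return result
    try:
        os.listdir(path)
        result["list_ok"] = True
    except OSError as exc:
        # Record the symbolic errno name, e.g. "ENOTDIR".
        result["list_errno"] = errno.errorcode.get(exc.errno, str(exc.errno))
    return result


if __name__ == "__main__":
    print(probe_directory("/afs"))
```

On a host showing the reported behaviour one would expect stat_ok to be
true and list_ok to be false, but that is an inference from the error
message quoted above, not something verified against the -830 kernel.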
> - Kodiak
> On Mon, Feb 5, 2018 at 12:31 PM, Stephan Wiesand
> > > On 04.Feb 2018, at 02:11, Jeffrey Altman <firstname.lastname@example.org> wrote:
> > >
> > > On 2/2/2018 6:04 PM, Kodiak Firesmith wrote:
> > >> I'm relatively new to handling OpenAFS. Are these problems part
> > >> of a normal "kernel release; openafs update" cycle and perhaps
> > >> I'm getting snagged just by being too early of an adopter? I
> > >> wanted to raise the alarm on this and see if anything else was
> > >> needed from me as the reporter of the issue, but perhaps that's
> > >> an overreaction to what is just part of a normal process I just
> > >> haven't been tuned into in prior RHEL release cycles?
> > >
> > >
> > > Kodiak,
> > >
> > > On RHEL, DKMS is safe to use for kernel modules that restrict
> > > themselves to using the restricted set of kernel interfaces (the
> > > RHEL KABI) that Red Hat has designated will be supported across
> > > the lifespan of the RHEL major version number. OpenAFS is not
> > > such a kernel module. As a result it is vulnerable to breakage
> > > each and every time a new kernel is shipped.
> > Jeffrey,
> > the usual way to use DKMS is to either have it build a module for a
> > newly installed kernel or install a prebuilt module for that kernel.
> > It may be possible to abuse it for providing a module built for
> > another kernel, but I think that won't happen accidentally.
> > You may be confusing DKMS with RHEL's "KABI tracking kmods". Those
> > should be safe to use within a RHEL minor release (and the SL
> > packaging has been using them like this since EL6.4), but aren't
> > across minor releases (and that's why the SL packaging modifies the
> > kmod handling to require a build for the minor release in question).
> > > There are two types of failures that can occur:
> > >
> > > 1. a change results in failure to build the OpenAFS kernel module
> > > for the new kernel
> > >
> > > 2. a change results in the OpenAFS kernel module building and
> > > successfully loading but failing to operate correctly
> > The latter shouldn't happen within a minor release, but can across
> > minor releases.
> > > It is the second of these possibilities that has taken place with
> > > the release of the 3.10.0-830.el7 kernel shipped as part of the
> > > RHEL 7.5 beta.
> > >
> > > Are you an early adopter of RHEL 7.5 beta? Absolutely, it's a beta
> > > release and as such you should expect that there will be bugs and
> > > that third party kernel modules that do not adhere to the KABI
> > > functionality might have compatibility issues.
> > The -830 kernel can break 3rd-party modules using non-whitelisted
> > ABIs, whether or not they adhere to the "KABI functionality".
> > > There was a compatibility issue with the RHEL 7.4 kernel
> > > (3.10.0-693.1.1.el7) as well that was only fixed in the OpenAFS
> > > 1.6 release series this past week as part of 126.96.36.199:
> > >
> > > http://www.openafs.org/dl/openafs/188.8.131.52/RELNOTES-184.108.40.206
> > Yes, and this one was hard to fix. Thanks are due to Mark Vitale for
> > developing the fix and all those who reviewed and tested it.
> > > Jeffrey Altman
> > > AuriStor, Inc.
> > >
> > > P.S. - Welcome to the community.
> > Seconded. In particular, the problem report regarding the EL7.5beta
> > kernel was absolutely appropriate.
> > --
> > Stephan Wiesand
> > DESY - DV -
> > Platanenallee 6
> > 15738 Zeuthen, Germany
OpenAFS-info mailing list