[OpenAFS] getcwd() error for RHEL 7.4 kernel

Matt Vander Werf mvanderw@nd.edu
Wed, 20 Dec 2017 11:09:12 -0500


Hi Mark,

Thanks for the info.

Can you elaborate on what exactly would result if it "unsafely continue its
“walk” of the d_alias list after dropping the i_lock"? Kernel panic/crash?
Segfault? Data corruption?

We've been running the current 1.6.x patch (12796 with 1.6.22) on a
production system (one where we've seen the getcwd issue) since this past
Sunday morning and haven't encountered any getcwd or other issues resulting
from it so far (at least as far as we've been able to see).

Do you (or anyone else) have any kind of ETA as to when the master patches
may be ready to be merged and a proper 1.6.x backport can be created (or
better yet, when a 1.6.x release with that backport will be released)? Are
we talking weeks? Sometime early/mid/late next month? The month after? Any
ideas would be helpful.

Thanks!

--
Matt Vander Werf
HPC System Administrator
University of Notre Dame
Center for Research Computing - Union Station
506 W. South Street
South Bend, IN 46601
Phone: (574) 631-0692

On Tue, Dec 19, 2017 at 11:55 AM, Mark Vitale <mvitale@sinenomine.net>
wrote:

>
> > On Dec 5, 2017, at 11:28 AM, Matt Vander Werf <mvanderw@nd.edu> wrote:
> >
> > I've created RPMs using the source (1.6.21.1) with this patch and have
> > installed it on several systems running the latest RHEL 7.4 kernel. I
> > haven’t noticed any issues from the fixes (can't say my testing has been
> > exhaustive though), but these also aren't very busy systems and I also
> > haven't ever seen the getcwd issues on these systems either.
>
> Thank you for doing this testing.  I did not experience any problems with
> the 1.6.x patch (https://gerrit.openafs.org/#/c/12796/1) in my testing
> either.  However, after further work on the master patches, I no longer
> recommend 12796 because it may unsafely continue its “walk” of the d_alias
> list after dropping the i_lock.
>
> Since we are getting closer on the master patches, I don’t plan to produce
> another 1.6.x “emergency” patch.  Instead, I’ll wait until the master
> patches are merged, then produce a proper 1.6.x backport from that.
>
> Regards,
> --
> Mark Vitale
>
>
>
