[OpenAFS] getcwd() error for RHEL 7.4 kernel

Matt Vander Werf mvanderw@nd.edu
Fri, 1 Dec 2017 13:48:36 -0500


I noticed you added your patch(es) to gerrit for the RHEL 7.4 getcwd issue
(Thanks!).

Responding to your comment on the latest commit ("I can submit an
equivalent, but simpler, 'emergency' 1.6.x backport of just this top commit
on request."): we would definitely prefer this! It would allow us to test
just the getcwd patch on the 1.6.x branch, which is what we use. Once the
backport is available, I can test it in our setup to confirm that it fixes
the getcwd issue for us as well.

Thanks!

--
Matt Vander Werf
HPC System Administrator
University of Notre Dame
Center for Research Computing - Union Station
506 W. South Street
South Bend, IN 46601
Phone: (574) 631-0692

On Sun, Nov 19, 2017 at 3:41 PM, Mark Vitale <mvitale@sinenomine.net> wrote:

>
> > On Nov 16, 2017, at 12:26 PM, Stephan Wiesand <stephan.wiesand@desy.de> wrote:
> >
> >
> > On Nov 16, 2017, at 07:06 , Benjamin Kaduk wrote:
> >
> >> On Wed, Nov 15, 2017 at 01:02:15PM -0500, Matt Vander Werf wrote:
> >>> Hello,
> >>>
> >>> Are there any updates or progress on a potential fix for this issue?
> >>> Anything we can do to help figure things out?
> >>
> >> This topic was on the agenda for our release-team meeting yesterday.
> >
> > Well, it has been for the last couple of weeks.
> >
> >> If I remember correctly, multiple developers have gotten fairly
> >> reliable ways to reproduce the issue locally.
> >> It also seems that as a workaround, reverting
> >> https://gerrit.openafs.org/#/c/12451/ is likely to reduce the
> >> likelihood of triggering events.
> >
> > Yes, but there’s at least one known client configuration (small stat
> > cache, -disable-dynamic-vcaches) for which reverting that change actually
> > makes things worse.
>
> The root cause is that the semantics of Linux d_invalidate() changed between
> 3.10.0-514 (RH/CentOS 7.3) and 3.10.0-693 (RH/CentOS 7.4).
> The former would return -EBUSY if you attempted to invalidate the
> current working directory.  The latter will invalidate (unhash)
> the current working directory’s dentry without a second thought.
> OpenAFS code in afs_ShakeLooseVCaches() currently relies on the former behavior
> to prevent the getcwd() ENOENT problem.
>
> I am working on a patch and will submit it to gerrit when it passes my
> tests.
>
> Thank you to everyone who shared debugging and test results.
> I will post here again when the patch is available in gerrit, so that anyone
> who wishes may test it in their setup.
>
> Regards,
> --
> Mark Vitale
> Sine Nomine Associates
>
>
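
For anyone who wants to see what the symptom looks like from a process's point of view, here is a minimal sketch in Python. It does not reproduce the AFS bug itself: it assumes a Linux host and triggers the same errno on a local filesystem by removing the current working directory, which is a different cause from the dentry unhashing described above. The function name and the scratch directory are illustrative only.

```python
import errno
import os
import tempfile


def getcwd_errno_after_cwd_removed():
    """Return the errno raised by os.getcwd() after the cwd is removed.

    Returns None if os.getcwd() unexpectedly succeeds. Assumes a platform
    (such as Linux) that allows removing the current working directory.
    """
    original = os.getcwd()
    scratch = tempfile.mkdtemp()
    try:
        os.chdir(scratch)
        # The process still holds the directory as its cwd, but the
        # directory no longer has a name in the filesystem.
        os.rmdir(scratch)
        try:
            os.getcwd()
        except OSError as exc:
            return exc.errno
        return None
    finally:
        # Always return to a valid directory and clean up if rmdir failed.
        os.chdir(original)
        if os.path.isdir(scratch):
            os.rmdir(scratch)


if __name__ == "__main__":
    code = getcwd_errno_after_cwd_removed()
    print(errno.errorcode.get(code, code))
```

On Linux this is expected to report ENOENT, the same error an affected AFS client returns from getcwd() even though the directory still exists in AFS.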
