[OpenAFS] getcwd() error for RHEL 7.4 kernel

Jacob Bonek jbonek@nd.edu
Tue, 17 Oct 2017 11:55:27 -0400



Hello,

We're having some strange issues with OpenAFS lately.

It started in August, after we installed the base RHEL 7.4 kernel,
3.10.0-693.el7.x86_64, with what was then the latest OpenAFS client,
1.6.21. We have since tried the current latest version, 1.6.21.1, and
still see the same issues. This happens with all subsequent RHEL 7.4
kernels as well, including the latest, 3.10.0-693.2.2.el7.x86_64.

When a user logs in they sometimes get a message similar to this:

shell-init: error retrieving current directory: getcwd: cannot access
parent directories: No such file or directory
tcsh: No such file or directory
tcsh: Trying to start from "<user AFS home directory>"


This doesn't happen for every user and seems to be transient. We have not
been able to reproduce it reliably internally. Affected users are still
able to access their files just fine afterwards.

After that, seemingly random applications fail with an error message of
the form '<application name>: getcwd() failed'. This has happened often
with the qsub command that we use to submit jobs to our batch system. An
example message would be:

qsub: getcwd() failed


We've also seen it with other applications, including git.
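
To check the failing call in isolation, something like the following
could be run from an affected user's AFS home directory right after login.
This is only a minimal sketch, assuming Python is available on the client;
the helper name check_getcwd is hypothetical and is not part of OpenAFS,
qsub, or git. It calls getcwd() once and reports the errno name on failure.

```python
import errno
import os


def check_getcwd():
    """Call getcwd() once; return (path, None) on success or (None, errno name)."""
    try:
        return os.getcwd(), None
    except OSError as exc:
        # Map the numeric errno (e.g. ENOENT) to its symbolic name.
        return None, errno.errorcode.get(exc.errno, str(exc.errno))


if __name__ == "__main__":
    path, err = check_getcwd()
    if err is None:
        print("getcwd() ok:", path)
    else:
        print("getcwd() failed:", err)
```

A result of ENOENT here would match the "No such file or directory" text
in the shell message above.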

This is a major issue for us: it has forced us to stay on the latest
pre-RHEL 7.4 kernel for a long time now. It may be related to previous
issues with getcwd(), but something in the RHEL 7.4 kernel seems to have
made it much worse. Neither rebooting a system nor clearing the AFS cache
fixes it.

Has anyone else experienced this issue with RHEL 7.4? Is there anything
that we can do to narrow down what is causing this?

Thank you in advance for any assistance!
