[OpenAFS] OpenAFS 1.8.0 on Linux, getcwd failure

William D. Hamblen William.D.Hamblen@dartmouth.edu
Wed, 18 Jul 2018 19:48:02 +0000


Hi,


Richard Brittain (co-worker of mine) created a bug report (#134572) about
this issue a few weeks ago. Until that sees some action, I'm wondering if
anyone has ideas for a workaround.


Essentially the problem is this. With the new 1.8 Linux client on Redhat
7.5, a user logs into a system with AFS homes and gets an error as part of
the login process. For bash the error is:


shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory


For tcsh I think it is:


tcsh: No such file or directory
tcsh: Trying to start from /afs/path/to/home


Regardless, the user has a valid token and has normal access to their home
directory, but some applications generate the same error. Presumably it's
apps that use getcwd, since I can replicate it with a trivial C program
using the getcwd system call.


What's weird is that it is client specific and only affects a few specific
directories and only on that client. In other words, user1 logging into
machineA will always get the error. user2 logging into machineA never does.
user1 logging into machineB never does either.


A reboot fixed it but it is very difficult to schedule a reboot (or any
interruption of AFS) on these systems. I have tried fs flush, fs flushvolume,
and fs flushmount using an affected home directory as the path argument and
there is no change. I am relying on user feedback for that but it's from a
user I trust.


At the next scheduled reboot we could go back to the 1.6 series where this
issue was first reported (Nov/Dec 2017), and then fixed, but... well that
kind of stinks.  :-)


Any ideas to fix it on a running system? Anything useful I could do to
flesh out the bug report? Is the issue already well enough understood that
we are just waiting for the next point release in the 1.8 series?


 - Bill


--
William D. Hamblen
Research Systems Engineer
Research Computing, Dartmouth College

