[OpenAFS] OpenAFS 1.8.0 on Linux, getcwd failure
William D. Hamblen
William.D.Hamblen@dartmouth.edu
Wed, 18 Jul 2018 19:48:02 +0000
Hi,
Richard Brittain (co-worker of mine) created a bug report (#134572) about this issue a few weeks ago. Until that sees some action, I'm wondering if anyone has ideas for a workaround.
Essentially the problem is this. With the new 1.8 Linux client on Redhat 7.5, a user logs into a system with AFS homes and gets an error as part of the login process. For bash the error is:
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
For tcsh I think it is:
tcsh: No such file or directory
tcsh: Trying to start from /afs/path/to/home
Regardless, the user has a valid token and has normal access to their home directory but some applications generate the same error. Presumably it's apps that use getcwd since I can replicate it with a trivial C program using the getcwd system call.
What's weird is that it is client specific and only affects a few specific directories and only on that client. In other words, user1 logging into machineA will always get the error. user2 logging into machineA never does. user1 logging into machineB never does either.
A reboot fixed it but it is very difficult to schedule a reboot (or any interruption of AFS) on these systems. I have tried fs flush, fs flushvolume, and fs flushmount using an affected home directory as the path argument and there is no change. I am relying on user feedback for that but it's from a user I trust.
At the next scheduled reboot we could go back to the 1.6 series where this issue was first reported (Nov/Dec 2017), and then fixed, but... well that kind of stinks. :-)
Any ideas to fix it on a running system? Anything useful I could do to flesh out the bug report? Is the issue already well enough understood that we are just waiting for the next point release in the 1.8 series?
- Bill
--
William D. Hamblen
Research Systems Engineer
Research Computing, Dartmouth College