[OpenAFS] getcwd() error for RHEL 7.4 kernel

Garance A Drosehn drosih@rpi.edu
Fri, 17 Nov 2017 17:35:25 -0500

On 18 Oct 2017, at 19:21, Benjamin Kaduk wrote:

> On Tue, Oct 17, 2017 at 11:55:27AM -0400, Jacob Bonek wrote:
>> This is a major issue that has caused us to have to stay at the latest
>> pre-RHEL 7.4 kernel for a long time now while this issue has existed.
>> This may be related to previous issues with getcwd() but something in
>> the RHEL 7.4 kernel seems to have made it much worse.

>> Has anyone else experienced this issue with RHEL 7.4? Is there anything
>> that we can do to narrow down what is causing this?
> I think we've seen another report or two, but it's always been hard to
> reproduce.  That said, with the specifics you've offered about the
> kernel version that introduced the issue, we've got a couple folks
> trying to reproduce in a controlled environment.

I'm seeing this (a little), but haven't had time to look into it.  But
here's some thoughts/observations:

I have three RHEL systems, all currently running:

kernel.x86_64             3.10.0-693.el7

They're all running the exact same build of OpenAFS, because I built
it on a different system, created RPMs, and installed the exact same
RPMs on all three systems.

kmod-openafs.x86_64       1.6.21-
openafs.x86_64            1.6.21-1.el7
openafs-client.x86_64     1.6.21-1.el7
openafs-docs.x86_64       1.6.21-1.el7
openafs-krb5.x86_64       1.6.21-1.el7

These are three remote-access machines for RPI users, so the intent is
that they should be exactly the same.  I'm sure there are some minor
changes, but at least for the kernel and openafs modules they are
definitely the same.

On one of them, if I log in to my userid and 'sudo bash', I get a lot
of messages like:

shell-init: error retrieving current directory: \
             getcwd: cannot access parent directories: No such file or directory
job-working-directory: error retrieving current directory: \
             getcwd: cannot access parent directories: No such file or directory

I've only seen this if the active working directory is my home directory.
It won't happen if I 'cd' into some sub-directory under my home directory
before I do the 'sudo bash'.
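The bash messages above indicate that a getcwd() call is failing with ENOENT. One way to check whether getcwd() itself fails, independent of bash's startup, is a tiny diagnostic run first from the home directory and then from a sub-directory. This is a minimal sketch, not part of OpenAFS; the script name `check_cwd.py` and the function `check_cwd` are hypothetical, and it assumes a Python 3 interpreter is available on the RHEL 7 host. It relies on Python's `os.getcwd()`, which wraps the C library getcwd() and raises an `OSError` (a `FileNotFoundError` for ENOENT) when that call fails.

```python
# check_cwd.py -- hypothetical diagnostic sketch, not an OpenAFS tool.
import errno
import os
from typing import Tuple


def check_cwd() -> Tuple[bool, str]:
    """Call getcwd() via os.getcwd() and report the outcome.

    Returns (True, path) on success, or (False, message) when getcwd()
    fails, e.g. with ENOENT like the bash 'shell-init' error above.
    """
    try:
        return True, os.getcwd()
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. "ENOENT".
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return False, "getcwd failed: {} ({})".format(name, exc.strerror)


if __name__ == "__main__":
    ok, info = check_cwd()
    print(("OK: " if ok else "FAIL: ") + info)
```

For example, comparing `cd ~ && sudo python3 check_cwd.py` against the same command run from any sub-directory of the home directory would show whether the failure tracks the working directory the same way the bash errors do.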

This seems to always happen on one of the three machines.  It never happens
on a second machine, and it *sometimes* happens on the third machine.  By
"sometimes", I mean that some days it never happens, but other days it seems
to happen all the time.  I have not seen the problem right at login, but it
can happen if I do a 'sudo bash' while in my home directory at any time
after I have logged in.

Once I have done the 'sudo bash', I can then 'cd' into the home directory
of my original userid and there are no error messages.

These machines are used by maybe 100 different people.  I have not heard
from anyone who has seen these error messages when they log in, but we do
have some users who never report errors as long as they can get their work
done.  And of course, I'm the only one who would be doing 'sudo' commands
on these machines.

I wonder if it has to do with the home directory being an AFS mount point
(as opposed to a standard directory somewhere inside an AFS volume), but I
have not had the time to do any tests of that idea.

The fact that I don't see the same behavior on all three machines makes me
wonder if it has to do with how much the other users have been doing.
Perhaps they've used up more of the local AFS cache on some machines than
on others.  I haven't had the chance to reboot any of these machines in a
while, but I hope to do that over the long Thanksgiving weekend.  Given the
errors seen at some other sites, I probably won't upgrade the kernel or the
version of OpenAFS until the semester break.

Garance Alistair Drosehn                =     drosih@rpi.edu
Senior Systems Programmer               or   gad@FreeBSD.org
Rensselaer Polytechnic Institute;             Troy, NY;  USA