[OpenAFS] Linux client, AFS homes, getcwd() failures, apparently deleted home
directories
Richard Brittain
Richard.Brittain@dartmouth.edu
Tue, 25 Jun 2013 16:02:00 -0400 (EDT)
Hi,
We have a strange problem on a large RHEL6 system with AFS home
directories. I'm not even sure if the problem is in the AFS cache manager
or the kernel. What happens is that for a small number of users, the
system seems to think that their home directory has been deleted.
Examining /proc/self/cwd for any process started from the home directory
will show a broken symlink, but the directory seems to be completely
normal if examined, and the same directory from every other client looks
normal.
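
For anyone who wants to check for the same symptom, here is a minimal
Python sketch of the test described above. It assumes a Linux /proc, and
it assumes the kernel marks the link target with a trailing " (deleted)"
as it does for a directory that really has been unlinked. The function
name check_cwd is only illustrative and is not part of any AFS tool.

```python
import os


def check_cwd(proc_link="/proc/self/cwd"):
    """Report how the kernel and getcwd() see the current directory."""
    result = {
        "link_target": None,      # what /proc/self/cwd points at
        "marked_deleted": False,  # True if the kernel thinks cwd is gone
        "getcwd": None,           # path returned by getcwd(), if any
        "getcwd_errno": None,     # errno from a failed getcwd(), if any
    }
    try:
        target = os.readlink(proc_link)
        result["link_target"] = target
        # Assumption: an unlinked cwd shows up as "<path> (deleted)".
        result["marked_deleted"] = target.endswith(" (deleted)")
    except OSError as exc:
        result["link_target"] = "unreadable: %s" % exc
    try:
        result["getcwd"] = os.getcwd()
    except OSError as exc:
        # Typically ENOENT when the kernel considers the directory deleted.
        result["getcwd_errno"] = exc.errno
    return result


if __name__ == "__main__":
    for key, value in check_cwd().items():
        print("%-15s %s" % (key, value))
```

Run from an affected home directory, this would be expected to show
marked_deleted as True and a getcwd_errno value, while the same script
run from an unaffected directory should show a normal path.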
The net effect is that programs started from the home directory that
call getcwd() may simply quit when the call fails. Luckily, interactive
bash and tcsh shells seem to hit the same failure and then fall back to
an implicit 'cd $HOME'. This works, even though it ends up in the same
directory.
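
The fallback the shells appear to use could be sketched roughly as
follows. This is only an illustration of the behaviour as observed, not
the actual bash or tcsh code; getcwd_or_home and its home parameter are
hypothetical names.

```python
import os


def getcwd_or_home(home=None):
    """Return the cwd, falling back to 'cd $HOME' if getcwd() fails."""
    try:
        return os.getcwd()
    except OSError:
        # getcwd() failed (e.g. ENOENT because the kernel believes the
        # directory was deleted), so re-enter the home directory by path.
        if home is None:
            home = os.environ.get("HOME", "/")
        os.chdir(home)
        return os.getcwd()
```

A program that tolerates the failure this way keeps running, whereas one
that treats a getcwd() error as fatal would exit as described above.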
The problem went away after the last reboot, but only briefly. I've
tried all the 'fs flush' variants, but nothing changed. Our other
RHEL6/AFS-home machines don't do this, and it has only affected a small
number of users so far.
Kernel: 2.6.32-358.6.2.el6.x86_64
openafs 1.6.2
Unfortunately the affected machine isn't one we can tinker with easily.
Richard
--
Richard Brittain, Research Computing Group,
Computing Services, 37 Dewey Field Road, HB6219
Dartmouth College, Hanover NH 03755
Richard.Brittain@dartmouth.edu 6-2085