[OpenAFS] Linux client, AFS homes, getcwd() failures, apparently deleted home directories

Richard Brittain Richard.Brittain@dartmouth.edu
Tue, 25 Jun 2013 16:02:00 -0400 (EDT)


Hi
  We have a strange problem on a large RHEL6 system with AFS home 
directories.  I'm not even sure whether the problem is in the AFS cache 
manager or the kernel.  What happens is that, for a small number of users, 
the system seems to think that their home directory has been deleted. 
Examining /proc/<pid>/cwd for any process started from the home directory 
shows a broken symlink, yet the directory itself looks completely normal 
when examined, and the same directory looks normal from every other 
client.

The net effect is that programs started from the home directory that call 
getcwd() may simply exit when getcwd() fails. 
Luckily, interactive bash and tcsh shells seem to detect the failure and 
fall back to an implicit 'cd $HOME'.  This works, even though it ends up in 
the same directory.

The problem went away after the last reboot, but only temporarily.  I've 
tried all the 'fs flush' variants, but nothing changed.  Our other 
RHEL6/AFS-home machines don't do this, and so far it has only affected a 
small number of users.

Kernel: 2.6.32-358.6.2.el6.x86_64
openafs 1.6.2

Unfortunately the affected machine isn't one we can tinker with easily.

Richard
-- 
Richard Brittain,  Research Computing Group,
                    Computing Services, 37 Dewey Field Road, HB6219
                    Dartmouth College, Hanover NH 03755
Richard.Brittain@dartmouth.edu 6-2085