[OpenAFS] Re: Linux client, AFS homes, getcwd() failures, apparently deleted home directories

Andrew Deason adeason@sinenomine.net
Tue, 25 Jun 2013 15:28:49 -0500


On Tue, 25 Jun 2013 16:02:00 -0400 (EDT)
Richard Brittain <Richard.Brittain@dartmouth.edu> wrote:

> we have a strange problem on a large RHEL6 system with AFS home
> directories.  I'm not even sure if the problem is in the AFS cache
> manager or the kernel.

It's (almost certainly) us. Anders noted a similar thing in jabber a
week or so ago, and it is most likely due to the games we have to play
with Linux dentries.

> The problem went away after the last reboot, but very temporarily.
> I've tried all the 'fs flush' variants, but nothing changed.  Our
> other RHEL6/AFS-home machines don't do this, and it has only affected
> a small number of users so far.

I'm not sure our 'fs flush' commands clear the relevant kernel state
(the dentry cache) here; you can try running
'echo 3 > /proc/sys/vm/drop_caches' as root, which _might_ help clear
it up.
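
If you want to script that and then check whether getcwd() has
recovered, something like the following might do. This is only a rough
sketch: the helper names drop_caches() and check_getcwd() are made up
for illustration, the write needs root when aimed at the real /proc
file, and there is no guarantee that dropping caches actually clears
the bad state.

```python
import errno
import os


def drop_caches(proc_file="/proc/sys/vm/drop_caches", level="3"):
    """Ask the kernel to drop clean caches.

    Level 3 means page cache plus dentries and inodes.
    Needs root when pointed at the real /proc file.
    """
    os.sync()  # write out dirty data first so more entries can be dropped
    with open(proc_file, "w") as f:
        f.write(level + "\n")


def check_getcwd(directory):
    """Return (True, cwd) if getcwd() works inside directory,
    otherwise (False, errno name)."""
    saved = os.open(".", os.O_RDONLY)  # remember the current directory
    try:
        os.chdir(directory)
        try:
            return True, os.getcwd()
        except OSError as e:
            return False, errno.errorcode.get(e.errno, str(e.errno))
    finally:
        os.fchdir(saved)  # go back to where we started
        os.close(saved)
```

Run check_getcwd() on an affected home directory before and after
drop_caches(); a result like (False, 'ENOENT') means the kernel still
considers that directory deleted.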

> Unfortunately the affected machine isn't one we can tinker with
> easily.

Well, the only way to get more useful information out of this is to
generate a vmcore of the machine while you're experiencing the problem,
or to run the 'crash' command on the live system and examine the
relevant in-memory structures with some specific commands. If you want
to do something like that, please say so, and I or someone else can come
up with the necessary info.

Or, if you happen to find a particular access or directory pattern that
triggers this situation, that would help. I would assume that it is
possible to reach those directories via different paths / by traversing
different mountpoints, which is what may be causing the confusion.
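
One way to look for such aliases from userspace is to stat() the
candidate paths and group them by device and inode number. The sketch
below is only illustrative: the function names are made up, and it
assumes the client reports the same (st_dev, st_ino) pair for a
directory no matter which path was used to reach it, which I have not
verified for this case.

```python
import os
from collections import defaultdict


def group_by_identity(paths):
    """Group paths by the (st_dev, st_ino) pair that stat() reports."""
    groups = defaultdict(list)
    for p in paths:
        st = os.stat(p)  # follows symlinks, so links count as aliases too
        groups[(st.st_dev, st.st_ino)].append(p)
    return dict(groups)


def aliased(paths):
    """Return only the groups where several paths reach the same object."""
    return [g for g in group_by_identity(paths).values() if len(g) > 1]
```

Feeding it the various paths by which an affected home directory can be
reached would show which of them the kernel sees as the same directory.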

-- 
Andrew Deason
adeason@sinenomine.net