[OpenAFS] strange failure mode in Linux when servers are unreachable

Noah Meyerhans noahm@csail.mit.edu
Tue, 27 Jul 2004 18:47:15 -0400

Hi all.  I've observed some behavior that is causing one of my users to
get rather grumpy and that has me a bit puzzled.

It started yesterday when one of our OpenAFS file servers crashed.  Even
after the server was brought back to life, one of the users whose home
directory was served by this system complained that a number of things
were not working correctly.  We noticed that her home directory had been
marked deleted by the OS.  In her xterms, the user had no trouble cd'ing
back to her home directory.  But for longer running processes (including
e.g. her X session), the directory appeared to no longer exist.  We were
able to confirm this with 'ls -l /proc/<pid_of_windowmanager>', which
indicated that the cwd link pointed to a deleted directory.  This caused
programs started from window manager menus to fail to start, or to start
with a CWD of /.  At this point, it seems that the only way to get her
environment back to a sane state is for her to terminate her X session
and begin a new one.
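
For anyone wanting to check a whole machine rather than one pid at a
time, the manual 'ls -l' check above can be sketched roughly as
follows.  This is only a minimal sketch, assuming a Linux /proc where
the kernel appends " (deleted)" to the target of the cwd link for an
unlinked directory; the function names are made up for illustration,
and reading other users' cwd links generally requires root.

```python
import os

# Suffix the Linux kernel appends to the cwd link target when the
# directory has been unlinked (assumption: Linux-style /proc).
DELETED_SUFFIX = " (deleted)"


def is_deleted_target(target):
    """Return True if a readlink() result looks like a deleted directory.

    Note: a directory whose real name ends in " (deleted)" would be
    reported as a false positive.
    """
    return target.endswith(DELETED_SUFFIX)


def find_deleted_cwds(proc_root="/proc"):
    """Return (pid, target) pairs for processes whose cwd is marked deleted."""
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            target = os.readlink(os.path.join(proc_root, entry, "cwd"))
        except OSError:
            continue  # process exited, or we lack permission to read it
        if is_deleted_target(target):
            results.append((int(entry), target))
    return results


if __name__ == "__main__":
    for pid, target in find_deleted_cwds():
        print(pid, target)
```

Run as root on the affected client, this should list every process
(window manager included) that is still sitting in the stale
directory.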

I find it strange that the user's home directory is marked deleted after
the fileserver is unreachable for some period of time (how long?).  Is this
by design?  If so, why?  Is it possible to change this behavior?  It may
not be a big deal in itself, but since the NFS environment that we're
transitioning away from didn't have this problem, it makes the users
feel that we're forcing them into a less usable environment, and that's
never any fun...


Noah Meyerhans                         System Administrator
MIT Computer Science and Artificial Intelligence Laboratory