[OpenAFS] hung directory

Turbo Fredriksson turbo@bayour.com
25 Oct 2002 14:54:45 +0200


I'm running my web server (Roxen 2.1.265) in AFS space (v1.2.6 with
v1.2.3final2 kernel module).

Today I noticed that the web server wasn't publishing pages at all,
so I restarted it and thought that would do it (I'm doing kinit/aklog
from the init script). It didn't. When I did an 'ls' on the directory,
the command hung!

This has happened before, a couple of months ago
(see https://lists.openafs.org/pipermail/openafs-info/2002-August/005509.html).

The machine has NOT been rebooted (it's now been up for 83 days)
so I'm quite satisfied with AFS. BUT... What's happening here?

----- s n i p -----
[papadoc.pts/24]$ ps | egrep ' D | Z '
27529 ?        Z      0:00 [gzip <defunct>]
27530 ?        Z      0:00 [find <defunct>]
31406 ?        D      0:00 find . -name *.o -exec rm -f {} ;
31842 ?        D      0:00 find . -name *.o -exec rm -f {} ;
31846 ?        D      0:00 find
31889 ?        D      0:00 /bin/ls -CF idn/
 1207 ?        D      0:00 /bin/ls -CF
 6331 ?        D      0:00 /USR/SBIN/CRON
 6332 ?        D      0:00 /USR/SBIN/CRON
 6333 ?        D      0:00 /USR/SBIN/CRON
10569 ?        D      0:00 rm -Rf build-tree stampdir stampdir/patch
10574 ?        D      0:00 /bin/ls -CF build-tree stampdir
 3647 ?        Z      0:01 [pike <defunct>]
  937 ?        D      0:00 /bin/bash
14176 ?        D      0:00 mv swenet swenet.old
14208 ?        D      0:00 /bin/ls -CF
14255 ?        D      0:00 /bin/ls -CF web/bayour/
----- s n i p -----

In addition to these, there are also some processes in sleep state
(probably waiting for the parent process to finish). They can't be
killed, and the directory can't be accessed.
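
For reference, a stricter way to pick out the stuck processes than the
egrep used for the listing, as a rough sketch only. It assumes ps prints
the process state in the third column (PID TTY STAT TIME COMMAND), as in
the output shown here; the function name stuck_procs is made up for
illustration.

```shell
# Read ps-style lines on stdin and print the PID and state of every
# process in uninterruptible sleep (D) or zombie (Z) state.
# Assumption: the state is the third whitespace-separated field.
stuck_procs() {
    awk '$3 ~ /^[DZ]/ { print $1, $3 }'
}

# Usage (assumed invocation): ps ax | stuck_procs
```

Matching on the state column itself avoids false hits when a command
line happens to contain a lone "D" or "Z" surrounded by spaces.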

There's nothing in the logs about what could have happened either.

The most recent processes to hang were pids 14255 and 14176...