[OpenAFS] Zombie processes on AFS
Turbo Fredriksson
turbo@bayour.com
09 Aug 2002 15:03:18 +0200
Some of my processes hung (most in uninterruptible sleep, state D, and
one as a zombie) when entering a directory on AFS...
----- s n i p -----
[papadoc.pts/0]$ ps | egrep 'find|/ls|dpkg'
26817 ? S 0:03 /usr/bin/perl /usr/bin/dpkg-source -b bind9-9.2.1
27530 ? Z 0:00 [find <defunct>]
29750 ? S 0:00 find . -name *.o -exec rm -f {} ;
31406 ? D 0:00 find . -name *.o -exec rm -f {} ;
31842 ? D 0:00 find . -name *.o -exec rm -f {} ;
31846 ? D 0:00 find
31889 ? D 0:00 /bin/ls -CF idn/
1207 ? D 0:00 /bin/ls -CF
[papadoc.pts/0]$ ll /proc/31406/cwd
lrwxrwxrwx 1 turbo turbo 0 Aug 9 14:48 /proc/31406/cwd -> /afs/bayour.com/user/fredriksson/turbo/src/DEBIAN/Misc/LDAPv3-ALL/bind9-9.2.1/contrib/
----- s n i p -----
Actually it's '.../contrib/idn/'...
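For reference, roughly the same information (state, parent and cwd of
every process stuck in D or Z) can be collected straight from /proc. This
is only a sketch, assuming a Linux-style /proc with 'stat' and 'cwd'
entries per PID; the function names parse_stat and stuck_processes are
made up for illustration.

```python
import os


def parse_stat(text):
    """Split the start of a /proc/<pid>/stat line into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left, _, rest = text.partition("(")
    comm, _, tail = rest.rpartition(")")
    fields = tail.split()
    return int(left), comm, fields[0], int(fields[1])


def stuck_processes(proc="/proc"):
    """Yield (pid, comm, state, ppid, cwd) for processes in state D or Z."""
    if not os.path.isdir(proc):
        return  # assumption: no Linux-style /proc on this system
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except OSError:
            continue  # process went away while we were looking
        if state not in ("D", "Z"):
            continue
        try:
            cwd = os.readlink(os.path.join(proc, entry, "cwd"))
        except OSError:
            cwd = "?"  # not permitted, or nothing left to read
        yield pid, comm, state, ppid, cwd


if __name__ == "__main__":
    for pid, comm, state, ppid, cwd in stuck_processes():
        print(pid, state, ppid, comm, cwd)
```

This only shows WHERE the processes are stuck, not why; it does not touch
AFS itself.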
The volume in question 'user.turbo' (/vicepb) is replicated to another
partition on the same server (-> 'user.turbo.readonly' on /vicepc).
I first thought that it was a problem with replication when I saw:
----- s n i p -----
Fri Aug 9 14:45:05 2002 trans 178 on volume 536870958 is older than 2010 seconds
Fri Aug 9 14:45:35 2002 trans 178 on volume 536870958 is older than 2040 seconds
Fri Aug 9 14:45:54 2002 1 Volser: Delete: volume 536870958 deleted
----- s n i p -----
and a BUNCH more entries just like these... I never saw the last line,
so I ran 'vos release user.turbo'. It took about 40 minutes! I thought
that replication happened in real time (!?).
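The 'older than' lines quoted above can be summarised to see how long
each transaction has been open. A minimal sketch, assuming the log lines
always have the shape shown in the snippet; the regular expression and
the name oldest_transactions are my own, not anything from OpenAFS.

```python
import re

# Matches lines such as:
#   "... trans 178 on volume 536870958 is older than 2010 seconds"
TRANS_RE = re.compile(r"trans (\d+) on volume (\d+) is older than (\d+) seconds")


def oldest_transactions(lines):
    """Map (transaction id, volume id) to the largest age reported, in seconds."""
    ages = {}
    for line in lines:
        m = TRANS_RE.search(line)
        if m:
            key = (int(m.group(1)), int(m.group(2)))
            ages[key] = max(ages.get(key, 0), int(m.group(3)))
    return ages


if __name__ == "__main__":
    sample = [
        "Fri Aug 9 14:45:05 2002 trans 178 on volume 536870958 is older than 2010 seconds",
        "Fri Aug 9 14:45:35 2002 trans 178 on volume 536870958 is older than 2040 seconds",
        "Fri Aug 9 14:45:54 2002 1 Volser: Delete: volume 536870958 deleted",
    ]
    for (trans, volume), age in oldest_transactions(sample).items():
        print("trans", trans, "volume", volume, "age", age, "seconds")
```

For the three lines quoted above this reports 2040 seconds (34 minutes)
for trans 178.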
The other question is 'can I somehow figure out WHY these processes hung'
(the only parent of these processes is init, and I don't want to kill
that :)?
PS. No other directory has problems, to my knowledge...
--
Ortega FBI cryptographic jihad Serbian SDI quiche Kennedy SEAL Team 6
CIA 747 DES iodine bomb Clinton
[See http://www.aclu.org/echelonwatch/index.html for more about this]