[OpenAFS] Zombie processes on AFS

Turbo Fredriksson turbo@bayour.com
09 Aug 2002 15:03:18 +0200


I have some processes that became zombies when they entered a
directory on AFS...

----- s n i p -----
[papadoc.pts/0]$ ps | egrep 'find|/ls|dpkg'
26817 ?        S      0:03 /usr/bin/perl /usr/bin/dpkg-source -b bind9-9.2.1
27530 ?        Z      0:00 [find <defunct>]
29750 ?        S      0:00 find . -name *.o -exec rm -f {} ;
31406 ?        D      0:00 find . -name *.o -exec rm -f {} ;
31842 ?        D      0:00 find . -name *.o -exec rm -f {} ;
31846 ?        D      0:00 find
31889 ?        D      0:00 /bin/ls -CF idn/
 1207 ?        D      0:00 /bin/ls -CF
[papadoc.pts/0]$ ll /proc/31406/cwd
lrwxrwxrwx    1 turbo    turbo           0 Aug  9 14:48 /proc/31406/cwd -> /afs/bayour.com/user/fredriksson/turbo/src/DEBIAN/Misc/LDAPv3-ALL/bind9-9.2.1/contrib/
----- s n i p -----

Actually, the directory is '.../contrib/idn/'...
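For reference, the two manual steps in the snippet above (filtering `ps` output, then reading `/proc/PID/cwd`) can be combined into one pass. This is a minimal sketch that assumes a Linux `/proc` filesystem and procps `ps`. The helper name `list_stuck` is hypothetical and not part of OpenAFS. It prints every process in state D (uninterruptible sleep, like most of the processes above) or Z (zombie), together with its working directory.

```shell
#!/bin/sh
# Hypothetical helper (not an OpenAFS tool): reads "PID STAT ARGS" lines,
# as produced by 'ps -eo pid=,stat=,args=', and prints only processes in
# uninterruptible sleep (D) or zombie (Z) state, plus their cwd.
# Assumes a Linux-style /proc filesystem.
list_stuck() {
    while read -r pid stat args; do
        case "$stat" in
            D*|Z*)
                # /proc/PID/cwd cannot be resolved for zombies or for other
                # users' processes; print '?' in that case.
                cwd=$(readlink "/proc/$pid/cwd" 2>/dev/null) || cwd='?'
                printf '%s\t%s\t%s\t%s\n' "$pid" "$stat" "$cwd" "$args"
                ;;
        esac
    done
}

# Same information as the 'ps | egrep' and 'll /proc/PID/cwd' steps above.
ps -eo pid=,stat=,args= | list_stuck
```

To resolve the cwd column you need to run it as the owner of the processes (or as root); otherwise that column shows '?'.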

The volume in question, 'user.turbo' (on /vicepb), is replicated to another
partition on the same server (as 'user.turbo.readonly' on /vicepc).

I first thought that it was a problem with replication when I saw:

----- s n i p -----
Fri Aug  9 14:45:05 2002 trans 178 on volume 536870958 is older than 2010 seconds
Fri Aug  9 14:45:35 2002 trans 178 on volume 536870958 is older than 2040 seconds
Fri Aug  9 14:45:54 2002 1 Volser: Delete: volume 536870958 deleted
----- s n i p -----

and a BUNCH of identical entries. I never saw the last line, so I
ran 'vos release user.turbo'. It took about 40 minutes! I thought
that replication was done in real time (!?).

My other question: can I somehow figure out WHY these processes hung?
(The only parent of these processes is init, and I don't want to kill
that :)


PS. As far as I know, no other directory has problems...