[OpenAFS] Zombie processes on AFS

Dave McMurtrie dgm+@pitt.edu
Fri, 09 Aug 2002 09:13:08 -0400 (EDT)


What does your dpkg-source Perl script do?  Is it forking kids but not
calling wait or waitpid?
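
To illustrate what I mean (a minimal sketch only, in Python rather than
Perl, and not taken from dpkg-source): a child that has exited stays in
state Z until its parent calls wait or waitpid on it. This assumes Linux
with /proc mounted; the names child_state and spawn_and_reap are made up
for the example.

```python
import os


def child_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state field is the first field after the ")" closing the command name.
    return data[data.rindex(")") + 2]


def spawn_and_reap():
    """Fork a child, observe it as a zombie, then reap it.

    Returns (state_before_reap, proc_entry_gone_after_reap).
    """
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately without running any cleanup handlers.
        os._exit(0)

    # Parent: block until the child has exited, but leave it unreaped
    # (WNOWAIT) so that it is still visible in the process table.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    state_before = child_state(pid)

    # Reaping the child is what removes the zombie entry.
    os.waitpid(pid, 0)
    gone = not os.path.exists(f"/proc/{pid}")
    return state_before, gone


if __name__ == "__main__":
    before, gone = spawn_and_reap()
    print("state before waitpid:", before)
    print("entry gone after waitpid:", gone)
```

Note that this only concerns the "[find <defunct>]" entry (state Z). The
entries in state D are in uninterruptible sleep, which is a different
situation, and reaping would not apply to them.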

-Dave
--
Dave McMurtrie, Systems Programmer
University of Pittsburgh
Computing Services and Systems Development,
Development Services -- UNIX and VMS Services
717P Cathedral of Learning
(412)-624-6413

On Fri, 9 Aug 2002, Turbo Fredriksson wrote:

> I have some processes that became zombies on entering a
> directory on AFS...
>
> ----- s n i p -----
> [papadoc.pts/0]$ ps | egrep 'find|/ls|dpkg'
> 26817 ?        S      0:03 /usr/bin/perl /usr/bin/dpkg-source -b bind9-9.2.1
> 27530 ?        Z      0:00 [find <defunct>]
> 29750 ?        S      0:00 find . -name *.o -exec rm -f {} ;
> 31406 ?        D      0:00 find . -name *.o -exec rm -f {} ;
> 31842 ?        D      0:00 find . -name *.o -exec rm -f {} ;
> 31846 ?        D      0:00 find
> 31889 ?        D      0:00 /bin/ls -CF idn/
>  1207 ?        D      0:00 /bin/ls -CF
> [papadoc.pts/0]$ ll /proc/31406/cwd
> lrwxrwxrwx    1 turbo    turbo           0 Aug  9 14:48 /proc/31406/cwd -> /afs/bayour.com/user/fredriksson/turbo/src/DEBIAN/Misc/LDAPv3-ALL/bind9-9.2.1/contrib/
> ----- s n i p -----
>
> Actually it's '.../contrib/idn/'...
>
> The volume in question 'user.turbo' (/vicepb) is replicated to another
> partition on the same server (-> 'user.turbo.readonly' on /vicepc).
>
> I first thought that it was a problem with replication when I saw:
>
> ----- s n i p -----
> Fri Aug  9 14:45:05 2002 trans 178 on volume 536870958 is older than 2010 seconds
> Fri Aug  9 14:45:35 2002 trans 178 on volume 536870958 is older than 2040 seconds
> Fri Aug  9 14:45:54 2002 1 Volser: Delete: volume 536870958 deleted
> ----- s n i p -----
>
> and a BUNCH of exactly the same entries... I never saw the last line, so I
> did 'vos release user.turbo'. It took about 40 minutes! I thought that
> the replication is/was in realtime (!?).
>
> The other question is 'can I somehow figure out WHY these processes hung'
> (the only parent of these processes is init, and I don't want to kill
> that :)?
>
>
> PS. No other dir has problems, to my knowledge...
> --
> Ortega FBI cryptographic jihad Serbian SDI quiche Kennedy SEAL Team 6
> CIA 747 DES iodine bomb Clinton
> [See http://www.aclu.org/echelonwatch/index.html for more about this]