[OpenAFS] OpenAFS 1.3.87 and 1.4.0-rc6 stability issues on Solaris 10

chas williams - CONTRACTOR chas@cmf.nrl.navy.mil
Thu, 13 Oct 2005 16:15:25 -0400


In message <20051012003336.GA4896@ccali22.in2p3.fr>,Loic Tortay writes:
>"svcs -p" seems to be the tip of the iceberg, the machine also panics
>with "ctstat -v" (whether AFS was started automatically or not).

"don't do that"

It looks like this might be a bug in Solaris 10 when handling contracts
of exiting children that have created kernel threads. On Solaris the
rxlistener is a kernel thread, and the child process that starts the
kernel thread returns and exits.

Try this patch.

It reaps the exited child process and seems to help things: the listener
thread seems to join/attach to PID 0, which now shows up among the member
processes of contract 57 below.

  UID   PID  PPID  CTID COMMAND
    0   409     1    57 /usr/vice/etc/afsd -verbose -memcache -chunksize 15 -stat 2800 -dcache 2000 -da
    0   408     1    57 /usr/vice/etc/afsd -verbose -memcache -chunksize 15 -stat 2800 -dcache 2000 -da
    0   410     1    57 /usr/vice/etc/afsd -verbose -memcache -chunksize 15 -stat 2800 -dcache 2000 -da
    0   411     1    57 /usr/vice/etc/afsd -verbose -memcache -chunksize 15 -stat 2800 -dcache 2000 -da
    0   412     1    57 /usr/vice/etc/afsd -verbose -memcache -chunksize 15 -stat 2800 -dcache 2000 -da
    0   413     1    57 /usr/vice/etc/afsd -verbose -memcache -chunksize 15 -stat 2800 -dcache 2000 -da
    0   414     1    57 /usr/vice/etc/afsd -verbose -memcache -chunksize 15 -stat 2800 -dcache 2000 -da
    0   415     1    57 /usr/vice/etc/afsd -verbose -memcache -chunksize 15 -stat 2800 -dcache 2000 -da

57      0       process orphan  -       0       -       -       
        cookie:                0
        informative event set: core signal
        critical event set:    hwerr empty
        fatal event set:       hwerr
        parameter set:         none
        member processes:      0 408 409 410 411 412 413 414 415
        inherited contracts:   none


Index: src/afsd/afsd.c
===================================================================
RCS file: /cvs/openafs/src/afsd/afsd.c,v
retrieving revision 1.43.2.10
diff -u -u -r1.43.2.10 afsd.c
--- src/afsd/afsd.c	21 Jun 2005 20:13:52 -0000	1.43.2.10
+++ src/afsd/afsd.c	13 Oct 2005 19:54:42 -0000
@@ -78,6 +78,7 @@
 #include <errno.h>
 #include <sys/time.h>
 #include <dirent.h>
+#include <sys/wait.h>
 
 
 #ifdef HAVE_SYS_PARAM_H
@@ -1747,6 +1748,9 @@
 		     enable_process_stats);
 	exit(1);
     }
+#ifdef AFS_SUN510_ENV
+    waitpid((pid_t) -1, NULL, 0);
+#endif
 #endif
     if (afsd_verbose)
 	printf("%s: Forking rx callback listener.\n", rn);
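For readers unfamiliar with the call: `waitpid((pid_t) -1, NULL, 0)` blocks until any child of the caller terminates and then reaps it, so an exited child no longer lingers as a zombie. Below is a minimal standalone POSIX sketch of that pattern outside afsd. `fork_short_lived_child()` and `reap_any_child()` are hypothetical helpers, not OpenAFS code. The `EINTR` retry loop is an addition that the patch itself does not have.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not afsd code): fork a child that exits at once,
 * mimicking the afsd child that returns after starting its kernel thread.
 * Returns the child's PID in the parent, or -1 if fork() failed. */
static pid_t fork_short_lived_child(void)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);              /* child: nothing left to do */
    return pid;
}

/* Hypothetical helper: reap any one terminated child, as the patch's
 * waitpid((pid_t) -1, NULL, 0) does, but retry if a signal interrupts
 * the wait. Returns the reaped PID, or -1 with errno set (ECHILD when
 * the caller has no children left). status may be NULL. */
static pid_t reap_any_child(int *status)
{
    pid_t r;
    do {
        r = waitpid((pid_t) -1, status, 0);
    } while (r == -1 && errno == EINTR);
    return r;
}
```

Calling `fork_short_lived_child()` and then `reap_any_child(&status)` should return the child's PID with `WIFEXITED(status)` true. Once no children remain, a further call returns -1 with `errno` set to `ECHILD`.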