[OpenAFS] OpenAFS 1.3.87 and 1.4.0-rc6 stability issues on Solaris 10
chas williams - CONTRACTOR
chas@cmf.nrl.navy.mil
Thu, 13 Oct 2005 16:15:25 -0400
In message <20051012003336.GA4896@ccali22.in2p3.fr>,Loic Tortay writes:
>"svcs -p" seems to be the tip of the iceberg, the machine also panics
>with "ctstat -v" (whether AFS was started automatically or not).
"dont do that"
it seems like this might be a bug in solaris 10 when handling contracts
of exiting children that have created kernel threads. the rxlistener is
a kernel thread on solaris, and the child that starts the kernel thread
returns and exits.
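try this patch.
it has the parent wait for the exited child (waitpid) so the child gets
cleaned up, and that seems to help things (the listener thread seems to
join/attach to pid 0; see the member processes of contract 57 below).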
UID   PID  PPID  CTID COMMAND
  0   409     1    57 /usr/vice/etc/afsd -verbose -memcache -chunksize 15 -stat 2800 -dcache 2000 -da
  0   408     1    57 /usr/vice/etc/afsd -verbose -memcache -chunksize 15 -stat 2800 -dcache 2000 -da
  0   410     1    57 /usr/vice/etc/afsd -verbose -memcache -chunksize 15 -stat 2800 -dcache 2000 -da
  0   411     1    57 /usr/vice/etc/afsd -verbose -memcache -chunksize 15 -stat 2800 -dcache 2000 -da
  0   412     1    57 /usr/vice/etc/afsd -verbose -memcache -chunksize 15 -stat 2800 -dcache 2000 -da
  0   413     1    57 /usr/vice/etc/afsd -verbose -memcache -chunksize 15 -stat 2800 -dcache 2000 -da
  0   414     1    57 /usr/vice/etc/afsd -verbose -memcache -chunksize 15 -stat 2800 -dcache 2000 -da
  0   415     1    57 /usr/vice/etc/afsd -verbose -memcache -chunksize 15 -stat 2800 -dcache 2000 -da
57      0       process orphan  -       0       -       -
        cookie:                0
        informative event set: core signal
        critical event set:    hwerr empty
        fatal event set:       hwerr
        parameter set:         none
        member processes:      0 408 409 410 411 412 413 414 415
        inherited contracts:   none
Index: src/afsd/afsd.c
===================================================================
RCS file: /cvs/openafs/src/afsd/afsd.c,v
retrieving revision 1.43.2.10
diff -u -u -r1.43.2.10 afsd.c
--- src/afsd/afsd.c 21 Jun 2005 20:13:52 -0000 1.43.2.10
+++ src/afsd/afsd.c 13 Oct 2005 19:54:42 -0000
@@ -78,6 +78,7 @@
#include <errno.h>
#include <sys/time.h>
#include <dirent.h>
+#include <sys/wait.h>
#ifdef HAVE_SYS_PARAM_H
@@ -1747,6 +1748,9 @@
enable_process_stats);
exit(1);
}
+#ifdef AFS_SUN510_ENV
+ waitpid((pid_t) -1, NULL, 0);
+#endif
#endif
if (afsd_verbose)
printf("%s: Forking rx callback listener.\n", rn);
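
for anyone who wants to see the idea outside of afsd: the sketch below is a standalone illustration of the fork-then-reap pattern the patch relies on, not code from afsd.c. spawn_and_reap() is a hypothetical helper name, and the child here just exits where the real afsd child would start the kernel listener first. unlike the patch, which calls waitpid((pid_t) -1, NULL, 0) to wait for any child and ignores the status, this version waits for the specific pid and returns the exit code so the result can be checked.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of afsd): fork a short-lived child and
 * reap it in the parent. Returns the child's exit code, or -1 on error
 * or if the child did not exit normally.
 */
int spawn_and_reap(int child_exit_code)
{
    int status = 0;
    pid_t pid = fork();
    pid_t reaped;

    if (pid < 0)
        return -1;                  /* fork failed */

    if (pid == 0) {
        /* Child: afsd would start the kernel listener here, then exit. */
        _exit(child_exit_code);
    }

    /* Parent: wait for the child, retrying if a signal interrupts us. */
    do {
        reaped = waitpid(pid, &status, 0);
    } while (reaped < 0 && errno == EINTR);

    if (reaped != pid)
        return -1;                  /* waitpid failed */

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

calling spawn_and_reap(0) from a small main() and then looking at the process list should show no leftover child entry. whether reaping alone is enough to avoid the contract problem on solaris 10 is only what the patch above suggests, not something this sketch demonstrates.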