[OpenAFS] OpenAFS 1.3.87 and 1.4.0-rc6 stability issues on Solaris 10

Loic Tortay tortay@cc.in2p3.fr
Thu, 13 Oct 2005 22:54:52 +0200


According to chas williams - CONTRACTOR:
> In message <20051012003336.GA4896@ccali22.in2p3.fr>,Loic Tortay writes:
> >"svcs -p" seems to be the tip of the iceberg, the machine also panics
> >with "ctstat -v" (whether AFS was started automatically or not).
>
> "dont do that"
>
Easier said than done. :-)

Both "svcs" and "ctstat" are non-setuid binaries, and while it's easy
to prevent users from executing these two programs, there are cases where
at least "svcs -p" is useful, especially for "root" or developers.

I guess there are probably other commands triggering the panic.

It's only a matter of time before some user tries one of these commands.

>
> it seems like this might be a bug in solaris10 when handling contracts
> of exiting children who have created kernel threads.  the rxlistener is
> a kernel thread on solaris and the child that starts the kernel_thread
> returns and exits.
>
> try this patch.
>
> it cleans up the child process and seems to help things (the listener
> thread seems to join/attach to pid 0).
>
I applied your patch to 1.4.0-rc7 and it solves the problem. I have run
both "svcs -p" and "ctstat -v" one after the other about 11000 times.
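
A minimal sketch of such a repeated run, assuming a POSIX shell. The
helper name "run_repeatedly" is hypothetical and this is not the exact
script that was used. Exit statuses are ignored, since the failure being
tested for is a kernel panic, not a command error:

```shell
#!/bin/sh
# Hypothetical helper: run each given command line in turn, COUNT times.
# Usage: run_repeatedly COUNT "command 1" "command 2" ...
run_repeatedly() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        for cmd in "$@"; do
            # Discard the output; the exit status is deliberately ignored.
            sh -c "$cmd" >/dev/null 2>&1
        done
        i=$((i + 1))
    done
    echo "completed $count iterations"
}

# Example for a Solaris 10 host (commented out so the sketch has no
# side effects when sourced):
# run_repeatedly 11000 "svcs -p" "ctstat -v"
```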

Thank you so much.

I submitted a bug report for this problem this afternoon; it's
ticket #22317.


Loïc.
-- 
| Loïc Tortay <tortay@cc.in2p3.fr> -     IN2P3 Computing Centre     |