[OpenAFS-devel] 1.2.9: Pthreads and signals combination is broken.

Derrick J Brashear shadow@dementia.org
Wed, 7 May 2003 16:34:03 -0400 (EDT)


On Tue, 6 May 2003, Harald Barth wrote:

> I believe the combination of signals and the pthreaded fileserver
> (tviced/fileserver) as shipped in 1.2.9 is broken. It might work on
> Linux but that might be a result of two defects canceling each other.

It might also work on Solaris. In fact it has a problem on Linux.

> bos restart. Nice core files are left to examine. When unwinding the
> asserts in softsig.c and turning off optimization, you can see that you
> get EINTR.
>
>   [EINTR]        The wait was interrupted by an unblocked, caught signal.
>
> If you compile softsig.c with -DTEST standalone you get the same
> effect: core dumped.

yay.

> On what architectures have you tested patch
> STABLE12-better-signal-thread-support-for-fileserver-20030113
> ?

"Not enough", apparently.

> I think the general idea was having one handler thread that triggers
> on SIGUSR1 only and all others triggering on all the other signals and
> then handing over control to the signal handling thread? If that is
> the case, softsig_thread() must be shielded from SIGINT, SIGXCPU ...

No, actually, the idea was that (except for fatal non-blockable signals)
only the softsig thread would take any.

> I think /afs/pdc.kth.se/home/h/haba/Public/openafs-pthread-signal.patch
> is a start, there might be some more signal blocking needed for the
> threads started from main(), but I'm not sure about that.