[OpenAFS-devel] 1.2.9: Pthreads and signals combination is broken.

Harald Barth haba@pdc.kth.se
Tue, 06 May 2003 02:57:25 +0200 (CEST)


I believe the combination of signals and the pthreaded fileserver
(tviced/fileserver) as shipped in 1.2.9 is broken. It might work on
Linux but that might be a result of two defects canceling each other.
When running on dux (nowadays pronounced "HoPe tru64" :-), the
fileserver asserts in softsig.c:softsig_thread() each time you do a
bos restart. Nice core files are left to examine. If you unwrap the
asserts in softsig.c and turn off optimization, you can see that the
wait returns EINTR.

  [EINTR]        The wait was interrupted by an unblocked, caught signal.

If you compile softsig.c with -DTEST standalone you get the same
effect: core dumped.

On what architectures have you tested patch
STABLE12-better-signal-thread-support-for-fileserver-20030113
?
Besides the above issue, there might be compile problems on AIX 4.2,
too. I think there is no pthread_sigmask() there, only sigthreadmask().

I think the general idea was to have one handler thread that triggers
on SIGUSR1 only, with all other threads triggering on all the other
signals and then handing over control to the signal handling thread?
If that is the case, softsig_thread() must be shielded from SIGINT,
SIGXCPU ...

I think /afs/pdc.kth.se/home/h/haba/Public/openafs-pthread-signal.patch
is a start. There might be some more signal blocking needed for the
threads started from main(), but I'm not sure about that.

Harald.