[OpenAFS-devel] 1.2.9: Pthreads and signals combination is

Nickolai Zeldovich kolya@MIT.EDU
Fri, 9 May 2003 18:34:26 -0400 (EDT)


Sorry it's taken me a while to catch up on this thread.  The original
reason for using sigwait() in softsig_thread was actually portability, as
you could use this same mechanism on both Solaris and Linux (the two test
systems I was working with at the time).  It looks like non-portability to
DUX is just a lack of POSIX support that can be easily fixed, as Haba has
pointed out (dropping the assert around sigwait, or checking for EINTR).
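For concreteness, the EINTR check could look something like the sketch below.  It assumes the POSIX convention that sigwait() returns the error number directly instead of setting errno; whether DUX follows that convention isn't established here, and demo_sigwait_retry is a made-up name, not something in the tree.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>

/*
 * Hypothetical helper: retry sigwait() when it is interrupted rather
 * than asserting that it succeeded.  Returns 0 and stores the signal
 * number in *signo on success, or another error number on failure.
 */
static int demo_sigwait_retry(const sigset_t *set, int *signo)
{
    int rc;

    do {
        rc = sigwait(set, signo);   /* POSIX: error number is the return value */
    } while (rc == EINTR);

    return rc;
}
```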

Note that it's not necessary to block SIGUSR1 in threads other than the
softsig_thread.  When a softsig signal is handled by some thread, it calls
softsig_handler(), which registers the event in softsig_sigs[] and then
sends SIGUSR1 to softsig_thread directly: note the use of pthread_kill()
as opposed to kill().
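
Roughly, the forwarding scheme described above could be sketched like this.  This is not the actual softsig code: the demo_* names, the table size and the choice of SIGHUP as the example signal are all assumptions for illustration, and the demo thread handles a single wakeup and returns (the real one would loop forever) so that the example terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>

#define DEMO_NSIG 64    /* hypothetical table size */

static volatile sig_atomic_t demo_pending[DEMO_NSIG]; /* stands in for softsig_sigs[] */
static pthread_t demo_softsig_tid;                    /* stands in for softsig_thread */

/* Runs in whichever thread the signal happens to be delivered to. */
static void demo_handler(int signo)
{
    if (signo > 0 && signo < DEMO_NSIG)
        demo_pending[signo] = 1;
    /* pthread_kill() targets one specific thread, unlike kill(). */
    pthread_kill(demo_softsig_tid, SIGUSR1);
}

/* Waits for one SIGUSR1 wakeup, then counts and clears the recorded events. */
static void *demo_softsig_thread(void *arg)
{
    int *handled = arg;
    sigset_t set;
    int signo, i;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);   /* already blocked: mask inherited from creator */

    while (sigwait(&set, &signo) != 0)
        ;                       /* retry on error such as EINTR */
    assert(signo == SIGUSR1);

    for (i = 1; i < DEMO_NSIG; i++) {
        if (demo_pending[i]) {
            demo_pending[i] = 0;
            (*handled)++;       /* real code would run the registered callback */
        }
    }
    return NULL;
}

/* Returns the number of events the demo thread saw, or -1 on setup failure. */
static int demo_run(void)
{
    sigset_t usr1, hup, old;
    struct sigaction sa;
    int handled = 0;

    /* Block SIGUSR1 only around pthread_create() so the new thread
     * inherits it blocked; the creating thread's mask is then restored. */
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &usr1, &old);
    if (pthread_create(&demo_softsig_tid, NULL, demo_softsig_thread, &handled) != 0) {
        pthread_sigmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    pthread_sigmask(SIG_SETMASK, &old, NULL);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = demo_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGHUP, &sa, NULL) != 0)
        return -1;

    /* Make sure SIGHUP can actually be delivered to this thread. */
    sigemptyset(&hup);
    sigaddset(&hup, SIGHUP);
    pthread_sigmask(SIG_UNBLOCK, &hup, NULL);

    raise(SIGHUP);              /* delivered to the calling thread */
    pthread_join(demo_softsig_tid, NULL);
    return handled;
}
```

Calling demo_run() raises one SIGHUP in the calling thread; the handler records it and wakes the demo thread, which is the only thread that ever has SIGUSR1 blocked once the creator's mask is restored.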

One problem that I didn't expect is that sigwait() blocks SIGSEGV et al,
so on Linux the softsig_thread doesn't die when the rest of the process
crashes.  This is kind-of unfortunate.  Another problem is that, again on
Linux, you can't send signals to the PID of the softsig_thread itself;
you have to use the PID of some other thread.

The latter problem I explicitly punted on when implementing
softsig:  the bosserver will know the main thread's PID, and will send
signals there, and using killall will send the signal to all threads, so
that's OK too.  Picking an arbitrary thread out of `ps ax` output and
sending the signal there became kind-of "unsupported".

I think these two problems can actually be solved in a fairly simple way.
Basically, pass a full sigset into sigwait(), so that sigwait will return
on any signal.  After sigwait() returns, we post the received signal into
the softsig_sigs[] array, if that signal was registered, and continue.
This means of course that we'll need to block all signals in that thread,
and possibly also manually panic on receiving SIGSEGV et al.
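
A rough sketch of what that loop might look like is below.  None of this is existing code: the demo_* names, the list of signals treated as fatal, and the use of abort() as the "panic" are assumptions, and the loop is bounded by a count only so that the example terminates (the real thread would loop forever).

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdlib.h>

#define DEMO_NSIG 64    /* hypothetical table size */

static int demo_registered[DEMO_NSIG]; /* which signals someone asked for */
static int demo_posted[DEMO_NSIG];     /* stands in for softsig_sigs[] */

/* Assumed set of "SIGSEGV et al" that should take the process down. */
static int demo_is_fatal(int signo)
{
    return signo == SIGSEGV || signo == SIGBUS ||
           signo == SIGILL  || signo == SIGFPE;
}

/* Consumes *arg signals via sigwait() on a full set, then returns. */
static void *demo_wait_all_thread(void *arg)
{
    int remaining = *(int *)arg;
    sigset_t all;
    int signo;

    sigfillset(&all);   /* everything is already blocked: mask inherited */

    while (remaining > 0) {
        if (sigwait(&all, &signo) != 0)
            continue;                   /* e.g. EINTR: just retry */
        remaining--;
        if (demo_is_fatal(signo))
            abort();                    /* manual panic */
        if (signo > 0 && signo < DEMO_NSIG && demo_registered[signo])
            demo_posted[signo] = 1;     /* post only registered signals */
    }
    return NULL;
}

/* Returns 1 if the registered signal was posted and the other one
 * was not, 0 otherwise, or -1 on setup failure. */
static int demo_run_wait_all(void)
{
    sigset_t all, old;
    pthread_t tid;
    int want = 2;

    demo_registered[SIGHUP] = 1;        /* SIGUSR2 is left unregistered */

    /* Block everything around pthread_create() so the new thread starts
     * with all signals blocked, then restore the creator's mask. */
    sigfillset(&all);
    pthread_sigmask(SIG_BLOCK, &all, &old);
    if (pthread_create(&tid, NULL, demo_wait_all_thread, &want) != 0) {
        pthread_sigmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    pthread_sigmask(SIG_SETMASK, &old, NULL);

    pthread_kill(tid, SIGHUP);          /* registered: should be posted */
    pthread_kill(tid, SIGUSR2);         /* unregistered: should be dropped */
    pthread_join(tid, NULL);

    return demo_posted[SIGHUP] == 1 && demo_posted[SIGUSR2] == 0;
}
```

The signals are sent with pthread_kill() here purely to keep the demo deterministic; the sketch says nothing about how process-directed signals get routed among threads on any particular platform.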

-- kolya