[OpenAFS-devel] 1.2.9: Pthreads and signals combination is broken.

Harald Barth haba@pdc.kth.se
Thu, 08 May 2003 00:24:59 +0200 (CEST)


> > I think the general idea was having one handler thread that triggers
> > on SIGUSR1 only and all others triggering on all the other signals and
> > then handing over control to the signal handling thread? If that is
> > the case, softsig_thread() must be shielded from SIGINT, SIGXCPU ...
> 
> No, actually, the idea was that (except for fatal non-blockable signals)
> only the softsig thread would take any.

Now you've got me really confused :-) My feeling is that the 1.2.9 code
was written with the intent that all threads take signals (to please
the Linux thread model). The 1.2.6 code seems to have been written with
the POSIX model in mind, where any thread can take a signal.

I think we have two models here:

1) The POSIX model, where a signal sent to the process is delivered to
one thread in which the signal is not blocked. That thread then takes
care of the signal and executes the action that was registered earlier
with signal().

2) The current Linux model (*), where a signal is delivered to whichever
thread happens to be active at that moment, and only then is the blocking
status checked to decide whether the action registered with signal() is
performed. If that thread has the signal in question blocked, the signal
will not be delivered.
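
Model 1) can be checked with a small test program. What follows is only
a sketch, assuming a threads library that follows the POSIX model; the
m1_* names are made up for this example and are not OpenAFS code. The
caller blocks SIGUSR2, a worker opens it up inside sigsuspend(), and a
handler records which thread ended up taking the process-directed
signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static pthread_t m1_worker_self;            /* set by the worker itself */
static volatile sig_atomic_t m1_done;       /* the handler has run */
static volatile sig_atomic_t m1_in_worker;  /* the handler ran in the worker */

static void m1_check(int rc) { assert(rc == 0); (void)rc; }

/* Asynchronous handler: only records which thread it is running in. */
static void m1_handler(int signo)
{
    (void)signo;
    m1_in_worker = pthread_equal(pthread_self(), m1_worker_self) != 0;
    m1_done = 1;
}

static void *m1_worker(void *arg)
{
    sigset_t open;
    (void)arg;
    m1_worker_self = pthread_self();
    /* Inherited mask minus SIGUSR2: only this thread ever unblocks it. */
    m1_check(pthread_sigmask(SIG_SETMASK, NULL, &open));
    sigdelset(&open, SIGUSR2);
    while (!m1_done)
        sigsuspend(&open);   /* atomically unblock SIGUSR2 and wait */
    return NULL;
}

/* Returns 1 if the signal was taken by the thread that had it unblocked. */
int m1_demo(void)
{
    struct sigaction sa;
    sigset_t set;
    pthread_t tid;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = m1_handler;
    sigemptyset(&sa.sa_mask);
    m1_check(sigaction(SIGUSR2, &sa, NULL));

    sigemptyset(&set);
    sigaddset(&set, SIGUSR2);
    /* Block in the caller; the worker inherits this mask. */
    m1_check(pthread_sigmask(SIG_BLOCK, &set, NULL));
    m1_check(pthread_create(&tid, NULL, m1_worker, NULL));
    m1_check(kill(getpid(), SIGUSR2));   /* process-directed signal */
    m1_check(pthread_join(tid, NULL));
    return m1_in_worker;
}
```

Under model 1) m1_demo() should return 1, because the worker is the only
thread that ever has SIGUSR2 unblocked. Under model 2) the outcome would
depend on which thread the signal happens to be handed to.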

Then we have the OpenAFS fileserver versions handling these scenarios.

1.2.6: Performs well under model 1) but might have concurrency issues
with interrupted system calls. Under 2) signals are probably lost.

1.2.9: Asserts under both models if a signal other than SIGUSR1
happens to be delivered to the signal thread. Whether that happens
depends on timing in model 1) and on the internal signal-to-thread
delivery in model 2).

1.2.9+habapatch: Should perform well under model 1) and might perform
acceptably under model 2) if the internal signal-to-thread delivery
does not decide to deliver to the signal thread.

Only-softsig-thread-takes-signals: Should solve concurrency problems
under model 1). Will probably have problems with missed signals
under 2). Can be implemented much more simply than the current "send
signal 2 times" implementation.
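
To make the last variant concrete, here is a minimal sketch of "only one
thread takes signals" under model 1). It is not the OpenAFS softsig
code; the only_* names are invented for illustration. The signal is
blocked before any thread is created, so every thread inherits the
blocked mask, and the dedicated thread picks the signal up synchronously
with sigwait().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

static sigset_t only_set;   /* signals reserved for the dedicated thread */
static int only_taken;      /* signal number the dedicated thread took */

static void only_check(int rc) { assert(rc == 0); (void)rc; }

/* Dedicated thread: no asynchronous handler runs, sigwait() returns
 * the signal number in ordinary thread context. */
static void *only_thread(void *arg)
{
    int signo = 0;
    (void)arg;
    only_check(sigwait(&only_set, &signo));
    only_taken = signo;
    return NULL;
}

/* Block signo, start the dedicated thread, send signo to the process
 * and return the signal number that the dedicated thread saw. */
int only_demo(int signo)
{
    pthread_t tid;

    sigemptyset(&only_set);
    sigaddset(&only_set, signo);
    /* Block before pthread_create(): new threads inherit the mask. */
    only_check(pthread_sigmask(SIG_BLOCK, &only_set, NULL));
    only_check(pthread_create(&tid, NULL, only_thread, NULL));
    only_check(kill(getpid(), signo));   /* process-directed signal */
    only_check(pthread_join(tid, NULL));
    return only_taken;
}
```

Calling only_demo(SIGUSR2) should hand SIGUSR2 to the dedicated thread
even if the signal is sent before that thread reaches sigwait(), since a
signal that is blocked everywhere stays pending. That assumption is
exactly what model 2) does not guarantee.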

So what about taking 1.2.9+habapatch, where all threads but the signal
thread take signals, and extending it so that the signal thread takes
the other signals, too? To do that, softsig_signal() must, after
registering the signal, do a pthread_kill(softsig_tid, SIGUSR1); on
that, the signal thread wakes up and updates the signal mask it waits
on. Which bits to wait for in addition to SIGUSR1 can be deduced from
the softsig_sigs[signo].handler entries. Then sigwait() puts the thread
to sleep again. When a "real" signal (not SIGUSR1) arrives, on exit
from sigwait() softsig_sigs[i].pending is set, and that will be taken
care of in the next turn of the while(1) loop.
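
A rough sketch of the signal-thread side of this proposal, to show the
control flow only. The sk_* names are hypothetical stand-ins, not the
real softsig_* functions, and the sketch makes a simplifying assumption
that differs from 1.2.9+habapatch: all signals are blocked in every
thread, so only sigwait() ever takes one. The table is limited to
signals 1..31, which is also an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

#define SK_NSIG 32   /* assumption: classic signals 1..31 only */

static struct { void (*handler)(int); int pending; } sk_sigs[SK_NSIG];
static pthread_mutex_t sk_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t sk_cond = PTHREAD_COND_INITIALIZER;
static pthread_t sk_tid;
static int sk_handled;     /* demo only: number of handlers run so far */
static int sk_demo_seen;   /* demo only: signal seen by the demo handler */

static void sk_check(int rc) { assert(rc == 0); (void)rc; }

static void *sk_thread(void *arg)
{
    (void)arg;
    for (;;) {
        sigset_t waitset;
        int i, signo = 0;

        sigemptyset(&waitset);
        sigaddset(&waitset, SIGUSR1);   /* wake-up: "table has changed" */
        pthread_mutex_lock(&sk_lock);
        for (i = 1; i < SK_NSIG; i++) {
            void (*h)(int) = sk_sigs[i].handler;
            if (h == NULL)
                continue;
            sigaddset(&waitset, i);     /* wait for every registered signal */
            if (sk_sigs[i].pending) {   /* noted on the previous turn */
                sk_sigs[i].pending = 0;
                pthread_mutex_unlock(&sk_lock);
                h(i);                   /* runs in ordinary thread context */
                pthread_mutex_lock(&sk_lock);
                sk_handled++;
                pthread_cond_broadcast(&sk_cond);
            }
        }
        pthread_mutex_unlock(&sk_lock);

        if (sigwait(&waitset, &signo) != 0)
            continue;
        if (signo != SIGUSR1) {         /* a "real" signal: mark it pending */
            pthread_mutex_lock(&sk_lock);
            sk_sigs[signo].pending = 1;
            pthread_mutex_unlock(&sk_lock);
        }
    }
    return NULL;
}

/* Assumption: called before any other thread exists, so that every
 * thread inherits a mask with all signals blocked. */
void sk_init(void)
{
    sigset_t all;
    sigfillset(&all);
    sk_check(pthread_sigmask(SIG_BLOCK, &all, NULL));
    sk_check(pthread_create(&sk_tid, NULL, sk_thread, NULL));
}

/* Register a handler, then poke the signal thread with SIGUSR1 so it
 * rebuilds the set it waits on. */
void sk_signal(int signo, void (*handler)(int))
{
    assert(signo > 0 && signo < SK_NSIG);
    pthread_mutex_lock(&sk_lock);
    sk_sigs[signo].handler = handler;
    pthread_mutex_unlock(&sk_lock);
    sk_check(pthread_kill(sk_tid, SIGUSR1));
}

static void sk_demo_handler(int signo) { sk_demo_seen = signo; }

/* Demo driver: register SIGUSR2, send it, wait until the handler ran. */
int sk_demo(void)
{
    sk_init();
    sk_signal(SIGUSR2, sk_demo_handler);
    sk_check(kill(getpid(), SIGUSR2));
    pthread_mutex_lock(&sk_lock);
    while (sk_handled == 0)
        pthread_cond_wait(&sk_cond, &sk_lock);
    pthread_mutex_unlock(&sk_lock);
    return sk_demo_seen;
}
```

The handler runs one loop turn after sigwait() returns, as described
above. Whether this also behaves under model 2) is exactly the open
question, so treat the sketch as a model 1) illustration only.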

Whatdoyouthink?

Harald.

(*) There is work going on to make Linux pthreads more like POSIX pthreads.
I heard that the Linux threads in 2.5 are more POSIXish, and there is NGPT:
http://www-124.ibm.com/developerworks/oss/pthreads/ Unfortunately, NGPT needs
a kernel patch to work, and even though it has crossed my mind, the thought of
requiring a kernel patch as a prerequisite is not appealing at all.