[OpenAFS-devel] volserver hangs possible fix

Mon, 18 Apr 2005 08:43:55 -0400

On 4/12/2005 12:04, Jeffrey Hutzelman wrote:
> On Tuesday, April 12, 2005 11:11:29 AM -0400 Ted Anderson 
> <TedAnderson@mindspring.com> wrote:
> 
>> Just to make sure I understand this problem, let me try providing a
>> specific scenario:
>>      thread A        thread B
>>      --------        --------
>>   1:                 lock M;
>>   2:                 check queue;
>>   3:                 cond_wait(C,M);     // queue is empty
>>   4:                  -> kernel_scheduler {
>>   5:                         unlock M;
>>   6: lock M;                 ...
>>   7: add item;               ...
>>   8: unlock M;               ...
>>   9: broadcast on C;         ...         // noop, nobody blocked on C
>> 10:                         put B on C;
>> 11:                         desched B;
>> 12:          DEADLOCK
>>
>> Is this a problem that can occur with a corrently operating
>> implementation of POSIX threads, i.e. an example of correct but
>> non-deterministic scheduling?  I've expanded the operation of cond_wait
>> as I understand it.  I observe that M correctly protects the queue
>> itself.  Treating "broadcast C" and "put B on C" as black boxes, however,
>> it doesn't seem to help to move broadcast(C) inside M, i.e. by swapping 8
>> & 9.  On the other hand, tt would seem a simple matter to reverse the
>> sequence of 5 & 10, but perhaps that is inconvenient for some
>> multi-processor schedulers.
>>
>> As I understand your comment, you are saying that the broadcast must be
>> protected by the mutex or race conditions are possible.  What am I
>> missing here?
> 
> 
> What you're missing here is that cond_wait() is atomic.  The lines you 
> list as 5 and 10 happen at the same time, so it's not possible for 
> thread A to acquire the lock before thread B is on the wait-list for the 
> CV.  This is why it is necessary to hold the lock through the broadcast 
> in order to prevent a race -- otherwise, the broadcast could indeed 
> occur between lines 2 and 10, and thread B would miss the wakeup.

Well, I always thought cond_wait() was atomic.  But if that is true then 
I don't see why broadcast() needs to be covered by the mutex.  In the 
above example (ignoring the kernel_scheduler stuff), in thread A's 
execution the broadcast (9) follows the locked item addition (6-8).  So, 
either thread B goes first, gets added to the CV's queue and then gets 
awoken, or thread A goes first and there the item is added to the queue 
and B won't sleep.  How does moving broadcast() inside the mutex change 
anything?

What I was responding to way this:
On 4/11/2005 16:26, Horst Birthelmer wrote:
 > On Apr 11, 2005, at 9:21 PM, Tom Keiser wrote:
 >> It is a race condition.  pthread_cond_wait() is not an atomic
 >> operation; it relies on the mutex to synchronize a set of operations
 >> that enqueue the calling thread onto the list of threads blocked on
 >> the cond var, drop the associated lock, and become blocked by the
 >> kernel scheduler.  If you call signal/broadcast without holding the
 >> lock, you can race against these bookkeeping operations in
 >> pthread_cond_wait().  Thus, a thread in the process of blocking will
 >> never find out about that signal/broadcast, because cond vars are not
 >> stateful like semaphores.
 >
 > The mutex passed to a condition variable isn't for guarding the
 > bookkeeping, but that's another story, I guess.

It seemed that both Horst and Tom agreed that broadcast() should be 
protected, though neither was clear on the reason.  The quotes from 
POSIX didn't clarify it for me.  So what I tried to do was break out the 
sub-operations that must compose cond_wait() to try to see how 
protecting broadcast() with the mutex would help, but I couldn't 
construct a failing case.

Maybe one of you could outline a scenario where broadcast() must be 
protected to avoid a race condition?

Ted Anderson