[OpenAFS-devel] volserver hangs possible fix

Horst Birthelmer horst@riback.net
Mon, 18 Apr 2005 14:58:58 +0200


On Apr 18, 2005, at 2:43 PM, Ted Anderson wrote:

> On 4/12/2005 12:04, Jeffrey Hutzelman wrote:
>> On Tuesday, April 12, 2005 11:11:29 AM -0400 Ted Anderson 
>> <TedAnderson@mindspring.com> wrote:
>>> Just to make sure I understand this problem, let me try providing a
>>> specific scenario:
>>>      thread A        thread B
>>>      --------        --------
>>>   1:                 lock M;
>>>   2:                 check queue;
>>>   3:                 cond_wait(C,M);     // queue is empty
>>>   4:                  -> kernel_scheduler {
>>>   5:                         unlock M;
>>>   6: lock M;                 ...
>>>   7: add item;               ...
>>>   8: unlock M;               ...
>>>   9: broadcast on C;         ...         // noop, nobody blocked on C
>>> 10:                         put B on C;
>>> 11:                         desched B;
>>> 12:          DEADLOCK
>>>
>>> Is this a problem that can occur with a corrently operating
>>> implementation of POSIX threads, i.e. an example of correct but
>>> non-deterministic scheduling?  I've expanded the operation of 
>>> cond_wait
>>> as I understand it.  I observe that M correctly protects the queue
>>> itself.  Treating "broadcast C" and "put B on C" as black boxes, 
>>> however,
>>> it doesn't seem to help to move broadcast(C) inside M, i.e. by 
>>> swapping 8
>>> & 9.  On the other hand, tt would seem a simple matter to reverse the
>>> sequence of 5 & 10, but perhaps that is inconvenient for some
>>> multi-processor schedulers.
>>>
>>> As I understand your comment, you are saying that the broadcast must 
>>> be
>>> protected by the mutex or race conditions are possible.  What am I
>>> missing here?
>> What you're missing here is that cond_wait() is atomic.  The lines 
>> you list as 5 and 10 happen at the same time, so it's not possible 
>> for thread A to acquire the lock before thread B is on the wait-list 
>> for the CV.  This is why it is necessary to hold the lock through the 
>> broadcast in order to prevent a race -- otherwise, the broadcast 
>> could indeed occur between lines 2 and 10, and thread B would miss 
>> the wakeup.
>
> Well, I always thought cond_wait() was atomic.  But if that is true 
> then I don't see why broadcast() needs to be covered by the mutex.  In 
> the above example (ignoring the kernel_scheduler stuff), in thread A's 
> execution the broadcast (9) follows the locked item addition (6-8).  
> So, either thread B goes first, gets added to the CV's queue and then 
> gets awoken, or thread A goes first and there the item is added to the 
> queue and B won't sleep.  How does moving broadcast() inside the mutex 
> change anything?
>
> What I was responding to way this:
> On 4/11/2005 16:26, Horst Birthelmer wrote:
> > On Apr 11, 2005, at 9:21 PM, Tom Keiser wrote:
> >> It is a race condition.  pthread_cond_wait() is not an atomic
> >> operation; it relies on the mutex to synchronize a set of operations
> >> that enqueue the calling thread onto the list of threads blocked on
> >> the cond var, drop the associated lock, and become blocked by the
> >> kernel scheduler.  If you call signal/broadcast without holding the
> >> lock, you can race against these bookkeeping operations in
> >> pthread_cond_wait().  Thus, a thread in the process of blocking will
> >> never find out about that signal/broadcast, because cond vars are 
> not
> >> stateful like semaphores.
> >
> > The mutex passed to a condition variable isn't for guarding the
> > bookkeeping, but that's another story, I guess.
>
> It seemed that both Horst and Tom agreed that broadcast() should be 
> protected, though neither was clear on the reason.  The quotes from 
> POSIX didn't clarify it for me.  So what I tried to do was break out 
> the sub-operations that must compose cond_wait() to try to see how 
> protecting broadcast() with the mutex would help, but I couldn't 
> construct a failing case.
>
> Maybe one of you could outline a scenario where broadcast() must be 
> protected to avoid a race condition?
>

The problem isn't whether cond_wait is atomic. It's what happens to the 
algorithm if it's not.
Imagine the scenario where it's not atomic (and this was the part where 
I agreed with Tom) and you have the mutex locked in the cond_wait call, 
but the thread isn't  in the queue yet.
Now this thread gets interrupted by whatever event and you perform a 
...cond_broadcast(). All the threads are woken up except that one not 
in there yet. You have a thread waiting a cond_var you weren't aware 
of... actually you have that thread waiting on that condition variable 
while you already performed a broadcast. That's pretty weird for the 
algorithm.

Now here's the part where I didn't agree with Tom.
That's no raise condition from my point of view and it will not fix 
that volserver issue entirely but the code is more safe if you acquire 
the mutex before performing the broadcast. You just can't get that 
condition in that way.

Remember for that to happen you have to be in the critical section in 
the code performing the wait and you can't enter with your broadcast 
thread until it's safe.

Horst