[OpenAFS-devel] volserver hangs possible fix

Jeffrey Hutzelman jhutz@cmu.edu
Tue, 12 Apr 2005 12:04:23 -0400


On Tuesday, April 12, 2005 11:11:29 AM -0400 Ted Anderson 
<TedAnderson@mindspring.com> wrote:

> Just to make sure I understand this problem, let me try providing a
> specific scenario:
>      thread A        thread B
>      --------        --------
>   1:                 lock M;
>   2:                 check queue;
>   3:                 cond_wait(C,M);     // queue is empty
>   4:                  -> kernel_scheduler {
>   5:                         unlock M;
>   6: lock M;                 ...
>   7: add item;               ...
>   8: unlock M;               ...
>   9: broadcast on C;         ...         // noop, nobody blocked on C
> 10:                         put B on C;
> 11:                         desched B;
> 12:          DEADLOCK
>
> Is this a problem that can occur with a corrently operating
> implementation of POSIX threads, i.e. an example of correct but
> non-deterministic scheduling?  I've expanded the operation of cond_wait
> as I understand it.  I observe that M correctly protects the queue
> itself.  Treating "broadcast C" and "put B on C" as black boxes, however,
> it doesn't seem to help to move broadcast(C) inside M, i.e. by swapping 8
> & 9.  On the other hand, tt would seem a simple matter to reverse the
> sequence of 5 & 10, but perhaps that is inconvenient for some
> multi-processor schedulers.
>
> As I understand your comment, you are saying that the broadcast must be
> protected by the mutex or race conditions are possible.  What am I
> missing here?

What you're missing here is that cond_wait() is atomic.  The lines you list 
as 5 and 10 happen at the same time, so it's not possible for thread A to 
acquire the lock before thread B is on the wait-list for the CV.  This is 
why it is necessary to hold the lock through the broadcast in order to 
prevent a race -- otherwise, the broadcast could indeed occur between lines 
2 and 10, and thread B would miss the wakeup.

-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
   Sr. Research Systems Programmer
   School of Computer Science - Research Computing Facility
   Carnegie Mellon University - Pittsburgh, PA