[OpenAFS-devel] volserver hangs possible fix
Jeffrey Hutzelman
jhutz@cmu.edu
Mon, 18 Apr 2005 17:30:45 -0400
On Monday, April 18, 2005 10:04:45 PM +0200 Horst Birthelmer
<horst@riback.net> wrote:
> That's one passage I didn't post in my last postings, which actually
> started the fire... ;-)
> I still don't see the confusion. It's sort of what I said in the first
> place.
> You still can hold the mutex but miss the broadcast and wait forever
> there ...
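Well, one bit of confusion is that people keep talking about how it doesn't
work if pthread_cond_wait is not atomic. That's not a problem, because
pthread_cond_wait is NEVER not atomic. It is ALWAYS atomic: releasing the
mutex and blocking on the condition variable happen as a single step, so a
broadcast from a thread that acquires the mutex after the waiter released
it behaves as if the waiter were already blocked.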
> That's one point the other is, you can be in the critical section with
> one thread and broadcasting the others,
> which as I pointed out for I have no idea how many times now, is _not_ a
> race condition.
Sure you can, but never in a situation where it matters.
Suppose again that thread A is the broadcasting thread, and thread B is the
waiter thread that we are interested in.
Now, in the example under discussion, thread A looks like this:
{
    ...
    acquire mutex
    update queue
    release mutex
    cond_broadcast
    ...
}
And thread B looks like this:
acquire mutex
while (1) {
    while (queue is not empty) {
        pop work from queue
        release mutex
        do work
        acquire mutex
    }
    cond_wait
}
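For concreteness, here is a minimal sketch of the two loops above using
POSIX threads. It is not the volserver code: the struct workq type, the
enqueue/worker/run_demo names, the use of a simple counter in place of a
real queue, and the shutdown flag (added only so the demo can terminate)
are all assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical work queue: a counter stands in for the real queue. */
struct workq {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int pending;    /* items waiting to be processed */
    int shutdown;   /* assumption: lets the demo terminate */
    int processed;  /* touched only by the worker thread */
};

/* Thread A: update the queue under the mutex, broadcast after release. */
static void enqueue(struct workq *q)
{
    pthread_mutex_lock(&q->lock);
    q->pending++;
    pthread_mutex_unlock(&q->lock);
    pthread_cond_broadcast(&q->cv);
}

/* Thread B: waits only after seeing an empty queue with the mutex held. */
static void *worker(void *arg)
{
    struct workq *q = arg;

    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->pending > 0) {
            q->pending--;                     /* pop work from queue */
            pthread_mutex_unlock(&q->lock);
            q->processed++;                   /* do work, mutex not held */
            pthread_mutex_lock(&q->lock);
        }
        if (q->shutdown)
            break;
        /* Releases the mutex and sleeps atomically; reacquires on wakeup. */
        pthread_cond_wait(&q->cv, &q->lock);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Enqueue n items from the calling thread while one worker drains them.
 * Returns the number of items processed, or -1 if thread creation fails. */
static int run_demo(int n)
{
    struct workq q;
    pthread_t tid;
    int i;

    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.cv, NULL);
    q.pending = q.shutdown = q.processed = 0;

    if (pthread_create(&tid, NULL, worker, &q) != 0)
        return -1;
    for (i = 0; i < n; i++)
        enqueue(&q);

    pthread_mutex_lock(&q.lock);
    q.shutdown = 1;
    pthread_mutex_unlock(&q.lock);
    pthread_cond_broadcast(&q.cv);

    pthread_join(tid, NULL);
    assert(q.pending == 0);   /* nothing was left behind in the queue */

    pthread_cond_destroy(&q.cv);
    pthread_mutex_destroy(&q.lock);
    return q.processed;
}
```

However the two threads happen to interleave, run_demo(n) should return n,
since the worker only reaches pthread_cond_wait after finding the queue
empty while holding the mutex.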
Note that thread B must release the mutex to do work, but calls cond_wait
only if it has observed the queue to be empty since the mutex was last
acquired. So, I see about three possible cases:
Case I - Everything happens in the expected order:
  Thread A              Thread B
                        acquire mutex
                        queue is empty
                        cond_wait -> SLEEP (with release)
  acquire mutex
  add item N
  release mutex
  cond_broadcast        WAKEUP (with acquire)
                        queue is not empty
                        pop item N from queue
                        release mutex
                        process item N
                        acquire mutex
                        queue is empty
                        cond_wait -> SLEEP (with release)
Case II - Not really a deadlock
  Thread A              Thread B
  acquire mutex
  add item N
  release mutex
                        acquire mutex
                        queue is not empty
                        pop item N from queue
                        release mutex
                        process item N
                        acquire mutex
                        queue is empty
  cond_broadcast        NO EFFECT
                        cond_wait -> SLEEP (with release)
Case III - Also OK
  Thread A              Thread B
  acquire mutex
  add item N
  release mutex
                        acquire mutex
                        queue is not empty
                        pop item N from queue
                        release mutex
                        process item N
                        acquire mutex
                        queue is empty
                        cond_wait -> SLEEP (with release)
  cond_broadcast        WAKEUP (with acquire)
                        queue is empty
                        cond_wait -> SLEEP (with release)
Note that it is possible (as in case II) for the cond_broadcast to have no
effect on thread B, because it is not in cond_wait yet. But it is not
possible for this to result in item N not being processed, because thread B
will never call cond_wait unless it has observed an empty queue since last
acquiring the mutex.
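The "broadcast with no effect" situation can be forced deterministically,
which may make the point easier to check. The sketch below is only an
illustration under assumed names (struct workq, worker,
early_broadcast_demo, and the shutdown flag and waits counter are all
hypothetical): every item is queued and broadcast before the worker thread
even exists, so no broadcast can wake anyone, yet the worker still finds
the work because it checks the queue before it would ever call
pthread_cond_wait.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical work queue: a counter stands in for the real queue. */
struct workq {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int pending;    /* items waiting to be processed */
    int shutdown;   /* assumption: lets the demo terminate */
    int processed;  /* items the worker has handled */
    int waits;      /* how many times the worker called cond_wait */
};

struct result {
    int processed;
    int waits;
};

/* Same loop shape as thread B: drain the queue, then wait if it is empty. */
static void *worker(void *arg)
{
    struct workq *q = arg;

    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->pending > 0) {
            q->pending--;                     /* pop work from queue */
            pthread_mutex_unlock(&q->lock);
            q->processed++;                   /* do work, mutex not held */
            pthread_mutex_lock(&q->lock);
        }
        if (q->shutdown)
            break;
        q->waits++;
        pthread_cond_wait(&q->cv, &q->lock);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Queue n items and broadcast before any thread is waiting, then start
 * the worker and report what it did. */
static struct result early_broadcast_demo(int n)
{
    struct workq q;
    struct result r = { -1, -1 };
    pthread_t tid;
    int i;

    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.cv, NULL);
    q.pending = q.shutdown = q.processed = q.waits = 0;

    for (i = 0; i < n; i++) {
        pthread_mutex_lock(&q.lock);
        q.pending++;
        pthread_mutex_unlock(&q.lock);
        pthread_cond_broadcast(&q.cv);        /* nobody waiting: no effect */
    }
    q.shutdown = 1;   /* set before the worker exists, so no lock needed */

    if (pthread_create(&tid, NULL, worker, &q) == 0) {
        pthread_join(tid, NULL);
        assert(q.pending == 0);               /* the early items were not lost */
        r.processed = q.processed;
        r.waits = q.waits;
    }

    pthread_cond_destroy(&q.cv);
    pthread_mutex_destroy(&q.lock);
    return r;
}
```

With this setup, early_broadcast_demo(3) should report three processed
items and zero waits: all three broadcasts were "missed", and it did not
matter.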
> There still is this theoretical possibility where the thread will be
> waiting forever on the cv, but let's put that aside.
Not while there's work in the queue; see above.
-- Jeff