[OpenAFS-devel] volserver hangs possible fix
Jeffrey Hutzelman
jhutz@cmu.edu
Wed, 20 Apr 2005 10:54:17 -0400
On Monday, April 18, 2005 11:55:01 PM +0200 Horst Birthelmer
<horst@riback.net> wrote:
> So you agree with my initial posting that we didn't really have a problem
> here and that this is not the cause for those volserver hangs.
Yes, and no.
You originally said that not holding the mutex when calling cond_broadcast
does not introduce a race. I agree with that statement.
However, there actually is a race, because the FSYNC lock is not held
continuously from when BreakLaterCallBacks() decides there is no more work
to do until FsyncCheckLWP() calls pthread_cond_timedwait(). So it is
possible for more callbacks to be added and the broadcast to be sent during
this window. In that case the wakeup is lost, and no work is done until
FsyncCheckLWP() wakes up on the 5-minute timeout.
In practice, I think that race should be uncommon, but I haven't worked out
how likely the various possible scheduling variants are.
-- Jeff