[OpenAFS-devel] afs_osi_Sleep and afs_osi_Wakeup on Linux
Srikanth Vishwanathan
vsrikanth@in.ibm.com
Mon, 3 Jun 2002 17:00:33 -0400
Hi -
There seems to be another race condition in these functions that
results in lost wakeups. I found this while debugging a client
hang. Traces revealed that one of the threads was stuck in
afs_osi_Sleep() even though afs_osi_Wakeup() had been called.
The problem seems to be with this part of the code in function
afs_osi_Sleep():
    while (seq == evp->seq) {
        AFS_ASSERT_GLOCK();
        AFS_GUNLOCK();
        interruptible_sleep_on(&evp->cond);
        AFS_GLOCK();
Apparently, the operation of dropping the GLOCK and going to sleep
is not atomic, so another thread can grab the GLOCK and call
wake_up() before we really go to sleep. This happens only on SMP
machines.
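To make the window concrete, here is a toy single-threaded model of the interleaving I think is happening. It is not kernel code: struct event_model, model_wakeup() and the two sleeper_stuck_*() functions are names I made up for illustration, and the "queue" is just a flag. The only assumption it encodes is that wake_up() does nothing for a task that is not yet on the wait queue.

```c
#include <stdbool.h>

/* Toy model only: one sleeper, one waker, one wait queue. */
struct event_model {
    int seq;      /* bumped by the waker, like evp->seq */
    bool queued;  /* sleeper is on the wait queue */
    bool asleep;  /* sleeper is blocked and still needs a wake-up */
};

/* Models afs_osi_Wakeup(): wake_up() only affects a queued sleeper. */
static void model_wakeup(struct event_model *ev)
{
    ev->seq++;
    if (ev->queued) {
        ev->queued = false;
        ev->asleep = false;
    }
}

/* Current order: drop the lock first, get on the queue afterwards. */
static bool sleeper_stuck_unlock_then_sleep(void)
{
    struct event_model ev = { 0, false, false };
    int seq = ev.seq;          /* sleeper samples seq under the GLOCK */

    if (seq != ev.seq)
        return false;          /* loop condition already false */
    /* sleeper: AFS_GUNLOCK(), the window opens here */
    model_wakeup(&ev);         /* waker runs entirely inside the window */
    /* sleeper: interruptible_sleep_on() enqueues and blocks */
    ev.queued = true;
    ev.asleep = true;
    return ev.asleep;          /* nobody is left to wake it */
}

/* Alternative order: get on the queue first, then drop the lock. */
static bool sleeper_stuck_enqueue_then_unlock(void)
{
    struct event_model ev = { 0, false, false };
    int seq = ev.seq;

    if (seq != ev.seq)
        return false;
    ev.queued = true;          /* enqueue while still holding the lock */
    /* sleeper: drop the lock */
    model_wakeup(&ev);         /* waker finds the sleeper on the queue */
    ev.asleep = ev.queued;     /* blocks only if it was not woken yet */
    return ev.asleep;
}
```

In this model the first function reports a stuck sleeper and the second does not, which matches what the traces suggest: the wakeup is lost only when it lands between the unlock and the enqueue.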
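When I changed this to the following, the problem went away. The
scheduler (called by sleep_on) automatically drops the kernel
lock if it is held, so I guess the waker cannot reach wake_up()
until the sleeper is already on the wait queue, which makes the
sequence effectively atomic.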
afs_osi_Sleep():
    while (seq == evp->seq) {
        AFS_ASSERT_GLOCK();
+       lock_kernel();
        AFS_GUNLOCK();
        interruptible_sleep_on(&evp->cond);
+       unlock_kernel();
        AFS_GLOCK();

afs_osi_Wakeup():
    if (evp->refcount > 1) {
        evp->seq++;
+       lock_kernel();
        wake_up(&evp->cond);
+       unlock_kernel();
    }
Is there a better way to do this? Maybe a Linux API? Something
like Solaris's cv_wait(kcondvar_t *, kmutex_t *) would be nice.
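For comparison, this is the kind of semantics I mean, sketched in userspace with POSIX threads rather than in the kernel. pthread_cond_wait() releases the mutex and blocks as one atomic step, which is the property cv_wait() has. The struct event type and the event_*() and run_demo() functions are hypothetical names for this sketch only; the mutex stands in for the GLOCK, and unlike afs_osi_Wakeup() the waker here takes the lock itself.

```c
#include <pthread.h>

/* Hypothetical userspace stand-in for the AFS event structure. */
struct event {
    pthread_mutex_t lock;  /* plays the role of the GLOCK */
    pthread_cond_t cond;   /* plays the role of evp->cond */
    int seq;               /* bumped on every wake-up, like evp->seq */
};

static void event_init(struct event *ev)
{
    pthread_mutex_init(&ev->lock, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->seq = 0;
}

/* Caller holds ev->lock; returns with it held once seq has changed. */
static void event_sleep(struct event *ev)
{
    int seq = ev->seq;

    while (seq == ev->seq) {
        /* Unlock and block atomically; the lock is retaken on return. */
        pthread_cond_wait(&ev->cond, &ev->lock);
    }
}

/* Bump seq and wake all sleepers while holding the lock. */
static void event_wakeup(struct event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->seq++;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

static void *waker_thread(void *arg)
{
    event_wakeup(arg);
    return NULL;
}

/* One sleeper, one waker; returns the seq value seen after waking. */
static int run_demo(void)
{
    struct event ev;
    pthread_t waker;
    int seen;

    event_init(&ev);
    pthread_mutex_lock(&ev.lock);
    /* The waker cannot bump seq until the sleeper is really waiting. */
    pthread_create(&waker, NULL, waker_thread, &ev);
    event_sleep(&ev);
    seen = ev.seq;
    pthread_mutex_unlock(&ev.lock);
    pthread_join(waker, NULL);
    pthread_cond_destroy(&ev.cond);
    pthread_mutex_destroy(&ev.lock);
    return seen;
}
```

Because the sleeper holds the lock from the seq check until it is actually waiting, the waker cannot slip into a gap between the two, so there is no window for a lost wakeup in this version.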
Thanks,
Srikanth.