[OpenAFS-devel] afs_osi_Sleep and afs_osi_Wakeup on Linux

Srikanth Vishwanathan vsrikanth@in.ibm.com
Mon, 3 Jun 2002 17:00:33 -0400


Hi -

There seems to be another race condition in these functions that
results in lost wakeups. I found this while debugging a client
hang. Traces revealed that one of the threads was stuck in
afs_osi_Sleep() even though afs_osi_Wakeup() had been called.

The problem seems to be with this part of the code in function
afs_osi_Sleep():

while (seq == evp->seq) {
        AFS_ASSERT_GLOCK();
        AFS_GUNLOCK();
        interruptible_sleep_on(&evp->cond);
        AFS_GLOCK();

Apparently, dropping the GLOCK and going to sleep is not atomic,
so another thread can grab the GLOCK and call wake_up() before we
have really gone to sleep. That wake_up() finds nobody on the wait
queue, and the sleeper then blocks with nothing left to wake it.
This happens only on SMP machines.

When I changed this to the following, the problem went away. The
scheduler (called by sleep_on) automatically drops the kernel
lock if it is held; I guess this makes it atomic.

afs_osi_Sleep():

while (seq == evp->seq) {
        AFS_ASSERT_GLOCK();
+       lock_kernel();
        AFS_GUNLOCK();
        interruptible_sleep_on(&evp->cond);
+       unlock_kernel();
        AFS_GLOCK();


afs_osi_Wakeup():

    if (evp->refcount > 1) {
        evp->seq++;
+       lock_kernel();
        wake_up(&evp->cond);
+       unlock_kernel();
    }

Is there a better way to do this? Maybe a Linux API? Something
like Solaris's cv_wait(kcondvar_t *, kmutex_t *) would be nice.
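
To make the semantics being asked for concrete, here is a rough
user-space sketch using POSIX threads. It is an analogy only, not
kernel code and not a proposed patch: pthread_cond_wait() releases
the mutex and blocks as one atomic step, which is the property the
GUNLOCK + sleep_on sequence lacks. All demo_* names, the waiters
counter (a stand-in for the refcount check), and the woken flag are
made up for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical user-space stand-in for the AFS event structure. */
struct demo_event {
    pthread_mutex_t glock;  /* plays the role of the AFS GLOCK */
    pthread_cond_t cond;    /* plays the role of evp->cond */
    int seq;                /* plays the role of evp->seq */
    int waiters;            /* loosely plays the role of the refcount check */
    int woken;              /* set by the sleeper once it has returned */
};

/* Caller must hold ev->glock, as afs_osi_Sleep() expects GLOCK held. */
static void demo_sleep(struct demo_event *ev)
{
    int seq = ev->seq;

    ev->waiters++;
    while (seq == ev->seq) {
        /* Unlock and block happen atomically; the lock is re-taken
         * before pthread_cond_wait() returns. */
        pthread_cond_wait(&ev->cond, &ev->glock);
    }
    ev->waiters--;
}

/* Caller must hold ev->glock. */
static void demo_wakeup(struct demo_event *ev)
{
    if (ev->waiters > 0) {
        ev->seq++;
        pthread_cond_broadcast(&ev->cond);
    }
}

static void *demo_sleeper(void *arg)
{
    struct demo_event *ev = arg;

    pthread_mutex_lock(&ev->glock);
    demo_sleep(ev);
    ev->woken = 1;
    pthread_mutex_unlock(&ev->glock);
    return NULL;
}

/* Starts one sleeper, wakes it once, and returns 1 if it woke up
 * (or -1 if setup failed). */
static int demo_run(void)
{
    struct demo_event ev;
    pthread_t tid;
    int sent = 0;
    int woken;

    ev.seq = 0;
    ev.waiters = 0;
    ev.woken = 0;
    if (pthread_mutex_init(&ev.glock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&ev.cond, NULL) != 0)
        return -1;
    if (pthread_create(&tid, NULL, demo_sleeper, &ev) != 0)
        return -1;

    /* Wait until the sleeper is registered. Because it registers and
     * blocks without ever exposing the lock in between, the wakeup
     * below cannot slip into a gap before the sleep. */
    while (!sent) {
        pthread_mutex_lock(&ev.glock);
        if (ev.waiters > 0) {
            demo_wakeup(&ev);
            sent = 1;
        }
        pthread_mutex_unlock(&ev.glock);
        if (!sent)
            sched_yield();
    }

    pthread_join(tid, NULL);
    assert(ev.seq == 1);    /* exactly one wakeup was issued */
    woken = ev.woken;
    pthread_cond_destroy(&ev.cond);
    pthread_mutex_destroy(&ev.glock);
    return woken;
}
```

Calling demo_run() from a small main() (built with -pthread) should
return 1. The point of the sketch is only that the waker can never
observe the lock free while the sleeper is between "unlocked" and
"asleep"; whether Linux offers an equivalent primitive for this case
is the open question above.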

Thanks,

Srikanth.