[OpenAFS-devel] Same problem in a different place.
Matt Peterson
matt@caldera.com
Tue, 29 Jan 2002 17:14:25 -0700
Nickoali,
I was testing what I think is the latest code (osi_sleep.c rev 1.11) that
appears to have your change we've discussed on the list.
What I found is that there is still one more place we are seeing cpu hog. I
spent some time and tracked it all the way through a ton of code to rx_Read()
and eventually to CV_WAIT() in rx/LINUX/rx_kmutex.h.
In rx/rx_rdwr.c:225 there is a while() loop that "waits" for a in-sequence
datagram. It looks like the following:
while (call->flags & RX_CALL_READER_WAIT) {
#ifdef RX_ENABLE_LOCKS
CV_WAIT(&call->cv_rq, &call->lock);
#else
osi_rxSleep(&call->rq);
#endif
}
This loop produces the same behavior as the one you just fixed in
LINUX/osi_sleep.c in that any task that receives a signal when waiting for
rcpt of a packet will loop very tightly.
As you can imagine, this behavior is extremely easy to reproduce and often
manifests itself when users expect the CTRL-C keystroke (or SIGINT) to
terminate a cp, or ls command that is taking too long.
As a short term solution I'd recommend that you put the same fix you put in
LINUX/osi_sleep.c into the CV_WAIT() function in LINUX/rx_kmutex.h.
Still, a long term solution for both "signal during wait" problems needs
to be put into the code. I should have a few cycles in the next few days and
would be happy to help.
If you make the changes to add the additional parameter to the "sleep/wait"
functions as you proposed earlier
int afs_osi_Sleep(char *event, int aintrok);
and now...?
int CV_WAIT(afs_kcondvar_t *cv, afs_kmutex_t *l, int aintrok)
I can help scour the code for places that the parameters should be used so
that signals are handled in the appropriate places.
Actually, since afs_osi_Sleep() and CV_WAIT() are used so frequently, it
might be easier to make just make two new calls:
int afs_osi_SleepHonorSig(char *event, int aintrok);
int CV_WAIT_HONORSIG(afs_kcondvar_t *cv, afs_kmutex_t *l, int aintrok)
This way you could #define them to the existing (non-signal calls) for
non-linux platforms. Slowly as time and review permits you can replace the
old calls with the "HonorSig" calls in places where it is possible to return
the appropriate error code to userland.
--
Matt Peterson
Sr. Software Engineer
Caldera, Inc
matt@caldera.com