[OpenAFS-devel] Very simple patch for libafs CPU hog on signal

Nickolai Zeldovich kolya@MIT.EDU
Tue, 29 Jan 2002 13:55:32 -0500


> Question: Is there any reason why you could not save off current->blocked and
> restore it outside of the while() loop?

The reason for putting it inside the loop was to avoid any potential
deadlock with GLOCK & sigmask_lock, by doing all signal operations
without GLOCK.  I suspect it doesn't really matter in practice, since
it seems unlikely that evp->cond will be woken up without evp->seq
being incremented.

> > Unfortunately, this has the side effect of artificially increasing the 
> > system load to ~4 when AFS is idle, because the background daemons 
> > block waiting for requests.
> 
> Is Nickolai is right on this one?  Tasks with a state of TASK_INTERRUPTIBLE 
> do not appear to increase the system load.  If they did, you'd already see an
> increase of 4 as the daemon syscalls currently spend most of their time in 
> interruptable_sleep_on() (even without Nickolai's patch).  Basically, I don't
> see how Nickolai's patch would change system load.

I was actually referring to the TASK_UNINTERRUPTIBLE state -- these tasks
do contribute to the system load.  If not for this artificial problem,
using sleep_on() instead of interruptible_sleep_on() in afs_osi_Sleep
would have been a perfectly good solution too.

> Ultimately, the fact remains that there is absolutely NO WAY to get out of 
> afs_osi_Sleep() via a signal -- even with Nickolai's patch and regardless of 
> whether or not the caller of afs_osi_Sleep() has the capability of letting 
> the signal be handled by a userland caller.

Correct; I think in order to address this issue, afs_osi_Sleep() will
need to be changed to something like

  int afs_osi_Sleep(char *event, int aintrok);

which would return EINTR if it was interrupted and aintrok was 1.

> The interactions of the various daemon tasks and the "normal" VFS syscalls 
> are fairly complex and I have not traced through the code long enough to 
> understand them all.  If I make a call to write() which eventually (via the 
> VFS calls) lands in a afs_osi_Sleep() is there any way that I can get the 
> behavior that I want -- where the VFS syscall can be immediataly interrupted 
> via a SIGINT?  Or do I have to wait until some afsd task wakes my sleeping 
> syscall?

It depends on what it's sleeping on.  For instance, if it's sleeping
in Rx code waiting for the server to acknowledge the data sent to it,
it'll only wake up when the RxListener afsd process receives the Rx
acknowledgement and wakes up your process (or until the Rx timeout
expires and the server is declared to be dead -- this wakeup will
come from the RxEvent afsd process).

> From what I've seen of the code (again I've only looked at it for a few 
> hours), I get the feeling that in order to make the code signal safe, we will
> have to remove the while(seq == evp->seq) loop from afs_osi_Sleep().  
> 
> Each function that calls afs_osi_Sleep() (and there are many) needs to call 
> signal_pending() to see if the sleep was interrupted by a signal -- then 
> either ignore the signal or return from the syscall.

Doing it exactly like this has the obvious problem of not being
portable.  Also, I suspect there are many places where it's hard
to gracefully handle a signal; often afs_osi_Sleep() is invoked
from within another while() loop which is waiting for something
else.  I mentioned above a possible change to the afs_osi_Sleep()
function which would allow certain callers to indicate that they
are ready to handle an interruption, while most of the existing
code will remain uninterruptible.

> Nickolai reports that code was checked into CVS (which I have not yet seen) 
> that blocks signals in daemon syscalls.

FTR, this is in revision 1.10 of src/afs/afs_osi.c.  I've also
checked in the change to block signals in conventional syscalls,
in revision 1.11 of src/afs/LINUX/osi_sleep.c.

-- kolya