[OpenAFS-devel] Same problem in a different place.

Wed, 30 Jan 2002 10:14:40 -0500 (EST)

On Tue, 29 Jan 2002 17:14:25 -0700 Matt Peterson <matt@caldera.com> wrote:
> What I found is that there is still one more place we are seeing cpu
> hog.  I spent some time and tracked it all the way through a ton of
> code to rx_Read() and eventually to CV_WAIT() in rx/LINUX/rx_kmutex.h.
...
> As a short term solution I'd recommend that you put the same fix you put in 
> LINUX/osi_sleep.c into the CV_WAIT() function in LINUX/rx_kmutex.h.

Good catch.  It seems clear that Kolya's patch needs to be (logically)
applied in both places.  Also CV_TIMEDWAIT would need the patch too, but
no one uses it.

> Actually, since afs_osi_Sleep() and CV_WAIT() are used so frequently, it 
> might be easier to make just make two new calls:
> 
>    int afs_osi_SleepHonorSig(char *event, int aintrok);
>    int CV_WAIT_HONORSIG(afs_kcondvar_t *cv, afs_kmutex_t *l, int aintrok)
> 
> This way you could #define them to the existing (non-signal calls) for 
> non-linux platforms.  Slowly as time and review permits you can replace the 
> old calls with the "HonorSig" calls in places where it is possible to return 
> the appropriate error code to userland.

This seems like a good suggestion.

On Tue, 29 Jan 2002 10:53:02 -0700 Matt Peterson <matt@caldera.com> wrote:
> Nickolai Zeldovich <kolya@MIT.EDU> wrote:
> > Unfortunately, this has the side effect of artificially increasing the 
> > system load to ~4 when AFS is idle, because the background daemons 
> > block waiting for requests.
> 
> Is Nickolai is right on this one?  Tasks with a state of
> TASK_INTERRUPTIBLE do not appear to increase the system load.  If they
> did, you'd already see an increase of 4 as the daemon syscalls
> currently spend most of their time in interruptable_sleep_on() (even
> without Nickolai's patch).  Basically, I don't see how Nickolai's
> patch would change system load.

Basically, the reason is that some AFS daemons spend all their idle time
waiting for work on a queue or similar structure.  When a task is
delivered, the daemon is notified by an osi_Wakeup().  If they were
sleeping uninterruptibly the system would count those threads against
the load average.  So to prevent that, the Linux port waits
interuuptibly, even though there isn' any really intent to allow
interrupts (this was changed by a delta applied to AFS3.5 in 3/1999).
As you've noticed there are some cases where cleanly breaking out of a
sleep loop would be easy enough, but others where a lot of
reorganization would be required.

> Ultimately, the fact remains that there is absolutely NO WAY to get
> out of afs_osi_Sleep() via a signal -- even with Nickolai's patch and
> regardless of whether or not the caller of afs_osi_Sleep() has the
> capability of letting the signal be handled by a userland caller.  The
> reason for this is that afs_osi_Sleep() (because of the (while (seq ==
> evp->seq) ) will loop endlessly until _some_other_task_ calls
> afs_osi_Wake().

Right.  The signal mechanism is really only for user space processes,
and, in a few specific cases, to tell the kernel that the user is
getting impatient.  Internally, AFS proper uses sleep and wakeup calls
applied to a channel (which is just an arbitrary number, typically a
pointer cast to an int) while Rx uses explicit condition variables.  But
in neither case are signals involved.

Ideally, signals should allow long-running AFS system calls, such as
read and write, to abort early.  But there is a lot of untangling
necessary to make this work safely.  In the case of daemons, signals
should just be ignored all the time.

> If my application which is write()ing a large block of data ever gets stuck 
> in afs_osi_Sleep() then there is _absolutely_no_way_ that a signal will ever 
> get me out.  I have to wait until something else interrupts the sleep.  How 
> long do I have to wait?

Perhaps it is a little strange, but AFS has always been a real bear when
it gets stuck on a down server.  Hitting ^C does no good at all; you
just have to wait for the client to decide the server is down.

> From what I've seen of the code (again I've only looked at it for a
> few hours), I get the feeling that in order to make the code signal
> safe, we will have to remove the while(seq == evp->seq) loop from
> afs_osi_Sleep().

All the calls to afs_osi_Sleep are wrapped with a while loop which
checks that the condition the caller is waiting for has really occurred.
So, it should be harmless to have sleep return early as long as it
doesn't do it too often.  In any case, it might be simpler, as you
suggested later, to just have a different sleep function that allows
interrupts.

Hope this helps,
Ted Anderson