[OpenAFS-devel] dealing with rxevent queue stalls

Simon Wilkinson simonxwilkinson@gmail.com
Tue, 24 Sep 2013 16:52:05 +0100


On 23 Sep 2013, at 20:52, Mark Vitale <mvitale@sinenomine.net> wrote:

> Recently I've been working on several problems with very different
> externals but a similar root cause:
[ ... ]
> So in two very different situations, the rxevent queue was unable to
> process scheduled events in a timely manner, leading to very strange and
> difficult-to-diagnose symptoms.

Hi Mark,

So, it's worth noting that lots of the code involved here is very
different between 1.6 and master. I assume that you are seeing these
issues on 1.6.

I rewrote the rxevent queue completely for master as part of YFS's RX
performance work. A side-effect of the new implementation is that it
should be less vulnerable to timer stalls - as soon as it is triggered,
all of the expired events will be run, rather than just a subset. Master
is also moving towards pthreaded ubik servers, so the need to work
around thread starvation (whether through problems with IOMGR, or the
use of non-IOMGR I/O) will be reduced.
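
To illustrate the "run everything that has expired" behaviour, here is a
minimal sketch. It is not the code on master: the demo_* names and the
sorted linked list are assumptions made for this example only. The point
is simply that one wakeup drains every event whose expiry has passed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event record for illustration; not the real struct rxevent. */
struct demo_event {
    long expires;                 /* absolute expiry time, arbitrary units */
    void (*handler)(void *arg);   /* callback to run once expired */
    void *arg;
    struct demo_event *next;      /* next event with a later or equal expiry */
};

/* Insert ev into the list headed by *head, keeping it sorted by expiry. */
static void demo_event_post(struct demo_event **head, struct demo_event *ev)
{
    while (*head != NULL && (*head)->expires <= ev->expires)
        head = &(*head)->next;
    ev->next = *head;
    *head = ev;
}

/* Run every event whose expiry is <= now; return how many were run. */
static int demo_event_run_expired(struct demo_event **head, long now)
{
    int ran = 0;

    while (*head != NULL && (*head)->expires <= now) {
        struct demo_event *ev = *head;

        *head = ev->next;   /* unlink first, the handler may post new events */
        ev->next = NULL;
        ev->handler(ev->arg);
        ran++;
    }
    return ran;
}

/* Example handler: bump the counter passed in as arg. */
static void demo_count(void *arg)
{
    (*(int *)arg)++;
}
```

If the thread running this is woken late, a single call still catches up
on the whole backlog, rather than processing a subset and waiting for the
next trigger.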

The challenge with rxevent is that, along with the listener thread, it
is performance critical for the OpenAFS RX stack. If we do add
additional code to handle edge cases, we need to be sure that the impact
of that additional code on the common case is negligible. The more
uncommon the situation we're trying to handle, the smaller the impact on
the common case needs to be. In particular, anything that adds additional
locking to the rxevent critical path needs to be very carefully handled.

My feeling is that it will be hard to justify adding the code you
suggest to master, where it seems that the only potential trigger is
hardware failure. There's a better case to make for 1.6, although as a
"stable" release, I think we'd probably be looking for something of
minimal impact - logging, rather than aborting, seems like the safest bet
there.
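
As a rough idea of what "log, don't abort" with minimal impact could look
like, here is a sketch under stated assumptions: the function name, the
2 second threshold and the message text are all invented for this
example, and none of it is taken from the 1.6 tree. In the common case
it costs one subtraction and one comparison, and it takes no locks.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical threshold: how late an event may run before it is reported. */
#define DEMO_STALL_THRESHOLD_MS 2000L

/*
 * Return 1 and log a warning if an event ran more than the threshold
 * after its scheduled time, otherwise return 0. Never aborts.
 */
static int demo_report_stall(long scheduled_ms, long now_ms)
{
    long late_ms = now_ms - scheduled_ms;

    if (late_ms <= DEMO_STALL_THRESHOLD_MS)
        return 0;   /* common case: nothing to report */

    fprintf(stderr, "rxevent stall: event ran %ld ms late\n", late_ms);
    return 1;
}
```

A real version would presumably want to rate-limit the message so that a
long stall does not flood the log.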

Finally, it's also worth considering that some platforms have very vague
ideas of what "timely" means when it comes to rxevent - some kernels
only schedule the event thread every half second - on a link with a low
RTT, you can end up with many timeout events going past before the event
thread actually notices!
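
To put rough numbers on that (the 50 ms timeout below is a made-up
figure for a low-RTT link, not a measured one, and the helper name is
hypothetical), the arithmetic is just the scheduling interval divided by
the timeout:

```c
#include <assert.h>

/*
 * How many timeouts of length timeout_ms can expire within one
 * scheduling interval of tick_ms. Both inputs are illustrative.
 */
static long demo_timeouts_per_tick(long tick_ms, long timeout_ms)
{
    if (timeout_ms <= 0)
        return 0;   /* guard against division by zero */
    return tick_ms / timeout_ms;
}
```

With a 500 ms scheduling interval and an assumed 50 ms timeout,
demo_timeouts_per_tick(500, 50) gives 10, i.e. ten timeouts could have
expired before the event thread gets to look at the first one.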

Cheers,

Simon