[OpenAFS-devel] dealing with rxevent queue stalls

Mark Vitale mvitale@sinenomine.net
Thu, 26 Sep 2013 18:31:09 +0000


On Sep 24, 2013, at 11:52 AM, Simon Wilkinson <simonxwilkinson@gmail.com> wrote:

> So, it's worth noting that lots of the code involved here is very
> different between 1.6 and master. I assume that you are seeing these
> issues on 1.6.
Yes, "essentially" 1.6.

> I rewrote the rxevent queue completely for master as part of YFS's RX
> performance work. A side-effect of the new implementation is that it
> should be less vulnerable to timer stalls - as soon as it is triggered,
> all of the expired events will be run, rather than just a subset. Master
> is also moving towards pthreaded ubik servers, so the need to work
> around thread starvation (whether through problems with IOMGR, or the
> use of non-IOMGR I/O) will be reduced.
>
> The challenge with rxevent is that, along with the listener thread, it
> is performance critical for the OpenAFS RX stack. If we do add
> additional code to handle edge cases, we need to be sure that the impact
> of that additional code on the common case is negligible. The more
> uncommon the situation we're trying to handle, the smaller the impact on
> the common case needs to be. In particular anything that adds additional
> locking to the rxevent critical path needs to be very carefully handled.
In my prototype, logging is done with dpf() from rxevent_Post(), and
only when the sick/well state changes (as a one-shot).
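
To illustrate the one-shot idea, here is a minimal sketch, not the
prototype itself. The names stall_monitor and stall_note are
hypothetical, and it writes to a FILE * instead of calling dpf() so that
it stands alone. The common case costs one comparison and no I/O; a
message is emitted only when the sick/well state actually changes.

```c
#include <stdio.h>

/* Hypothetical names for illustration; this is not the prototype's code. */
enum queue_health { QUEUE_WELL = 0, QUEUE_SICK = 1 };

struct stall_monitor {
    enum queue_health state;    /* last state that was reported */
    unsigned long transitions;  /* number of messages emitted so far */
};

/*
 * Record the latest observation of the event queue.
 * Returns 1 if a message was logged (the state changed),
 * 0 if the observation matches the state already reported.
 */
static int
stall_note(struct stall_monitor *m, enum queue_health observed, FILE *log)
{
    if (observed == m->state)
        return 0;               /* common case: no logging, no I/O */

    m->state = observed;
    m->transitions++;
    fprintf(log, "rxevent queue is now %s\n",
            observed == QUEUE_SICK ? "sick (events running late)" : "well");
    return 1;
}
```

Repeated "sick" observations after the first produce no further output
until the queue is seen to be well again.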

> My feeling is that it will be hard to justify adding the code you
> suggest to master, where it seems that the only potential trigger is
> hardware failure. There's a better case to make for 1.6, although as a
> "stable" release, I think we'd probably be looking for something of
> minimal impact - logging, rather than aborting seems like the safest bet
> there.
Okay.

> Finally, it's also worth considering that some platforms have very
> vague ideas of what "timely" means when it comes to rxevent - some
> kernels only schedule the event thread every half second - on a link
> with a low RTT, you can end up with many timeout events going past
> before the event thread actually notices!
In the prototype, "untimely" is when (the top) scheduled events miss
their run time by more than 5 seconds.
I realize that "untimely" from an Rx protocol point of view is several
orders of magnitude more stringent than that, but I'm only trying to
flag gross problems with this fix.
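
As a rough sketch of that threshold test (again, not the prototype's
code): the type ev_time, the function event_is_untimely, and the macro
STALL_THRESHOLD_SEC are hypothetical names, and the seconds/microseconds
timestamp layout is an assumption made so the example is self-contained.

```c
#define STALL_THRESHOLD_SEC 5   /* the 5-second figure mentioned above */

/* Hypothetical timestamp type: seconds plus microseconds (0..999999). */
struct ev_time {
    long sec;
    long usec;
};

/*
 * Returns 1 if an event that was due at 'due' is late by more than
 * STALL_THRESHOLD_SEC seconds at time 'now', otherwise 0.
 * An event that is not yet due is never untimely.
 */
static int
event_is_untimely(const struct ev_time *due, const struct ev_time *now)
{
    long late_sec = now->sec - due->sec;
    long late_usec = now->usec - due->usec;

    if (late_usec < 0) {        /* borrow one second */
        late_sec--;
        late_usec += 1000000L;
    }
    if (late_sec > STALL_THRESHOLD_SEC)
        return 1;
    return late_sec == STALL_THRESHOLD_SEC && late_usec > 0;
}
```

Only the event at the head of the queue would need this check, since
every later event is due no earlier than the head.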

Thanks for your comments (and to everyone else that responded).  They
were all very helpful.

The consensus so far seems to be that I should only attempt to address
this in 1.6 (not master), and then only if I can find a low-impact way
to log the warning messages.  Have I gauged that correctly?

Regards,
--
Mark Vitale
mvitale@sinenomine.net


