[OpenAFS-devel] dealing with rxevent queue stalls

Mark Vitale mvitale@sinenomine.net
Mon, 23 Sep 2013 19:52:37 +0000


Recently I've been working on several problems with very different
externals but a similar root cause:

1) While accessing a particular fileserver, AFS clients experience
performance delays; some also see multiple "server down/back up"
problems.
  - root cause was a hardware bug on the fileserver that prevented
    timers from firing reliably; this unpredictably delayed any task in
    the rxevent queue, while leaving the rest of the fileserver function
    relatively unaffected.  (btw, this was a pthreaded fileserver.)

2) Volume releases suffer from poor performance and occasionally fail
with timeouts.
  - root cause was heavier-than-normal vlserver load (perhaps caused by
    disk performance slowdowns); this starved LWP IOMGR, which in turn
    prevented LWP rx_Listener from being dispatched (priority
    inversion), leading to a grossly delayed rxevent queue.

So in two very different situations, the rxevent queue was unable to
process scheduled events in a timely manner, leading to very strange and
difficult-to-diagnose symptoms.

I'm writing this note to begin a discussion on possible ways to address
this in OpenAFS.

One possible approach is to implement some watchdog/sentinel code to
detect when the rxevent queue is not working correctly; that is, when
it's unable to run scheduled events in a timely manner.  Certainly
rxevent can't watch itself; but rather than adding another thread as a
watchdog, I chose to insert a sanity check into rxevent_Post().  This
check essentially compares the current time (if supplied on the "now"
parameter) with the scheduled time for the top rxevent on the queue.  If
the current time is later than that scheduled time by more than a
certain threshold, then we know that the rxevent queue has fallen behind
(is "sick") for some unknown reason.  At this point, I set a state flag
which causes any new connections to abort (with timeout or busy, for
example).  Both the threshold and reply could be configurable, similar
to the current implementation of the -busyat thread-busy threshold and
response.  After the rxevent queue is able to catch up with its
scheduling work, the "sick" state is reset.  And lastly, warning
messages could be written to the log to indicate that the rxevent queue
is having difficulties and later has returned to normal.  I have some
prototype code working in my test environment; it needs some work before
it will be suitable for review.
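
To make the check described above concrete, here is a minimal sketch in
C.  It is not the prototype and not the real rxevent code: every name
in it (struct ev, struct ev_queue, evq_init, evq_check, evq_post,
evq_pop_due, evq_accepting) is hypothetical, times are plain seconds in
a long, and the log message goes to stderr purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical types and names for illustration; not the OpenAFS rxevent API. */
struct ev {
    long when;          /* scheduled time, in seconds */
    struct ev *next;    /* next event in ascending 'when' order */
};

struct ev_queue {
    struct ev *head;    /* earliest pending event, or NULL if empty */
    long threshold;     /* tolerated lateness in seconds (assumed tunable) */
    bool sick;          /* true while the queue has fallen behind */
};

static void evq_init(struct ev_queue *q, long threshold)
{
    assert(threshold > 0);
    q->head = NULL;
    q->threshold = threshold;
    q->sick = false;
}

/* Compare 'now' with the earliest scheduled event; set or clear 'sick'. */
static void evq_check(struct ev_queue *q, long now)
{
    bool behind = q->head != NULL && now - q->head->when > q->threshold;

    if (behind != q->sick) {
        q->sick = behind;
        fprintf(stderr, "event queue %s\n",
                behind ? "has fallen behind" : "has caught up");
    }
}

/* Post an event; the check runs only when the caller supplies 'now'. */
static void evq_post(struct ev_queue *q, struct ev *e, const long *now)
{
    struct ev **pp = &q->head;

    if (now != NULL)
        evq_check(q, *now);
    while (*pp != NULL && (*pp)->when <= e->when)
        pp = &(*pp)->next;
    e->next = *pp;
    *pp = e;
}

/* Dispatcher side: remove and return the head if it is due, else NULL. */
static struct ev *evq_pop_due(struct ev_queue *q, long now)
{
    struct ev *e = q->head;

    if (e == NULL || e->when > now)
        return NULL;
    q->head = e->next;
    e->next = NULL;
    return e;
}

/* New connections would be refused (abort code of choice) while sick. */
static bool evq_accepting(const struct ev_queue *q)
{
    return !q->sick;
}
```

In this sketch the "sick" flag is only re-evaluated when something is
posted with a "now" value, so it clears on the first such post after
the dispatcher has drained the overdue events.  Whether the check
should also run from other code paths is one of the open design
questions.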

Another possible approach is, instead of sending abort codes when we
are "sick", merely suspend RPC operations completely; that is, don't
send any packets or process any calls until we aren't "sick" again.
That would mean that the server process appears to "freeze" entirely
whenever the event thread gets stuck.  Certainly this would be an
immediate alert that something is wrong, rather than the hit-or-miss
mystery behavior when merely the rxevent queue is not being dispatched.
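
The difference between the two responses can be sketched as a small
decision function.  This is only an illustration under assumed names
(enum sick_policy, enum pkt_action, sick_gate are all hypothetical and
do not exist in OpenAFS); it shows where the two approaches diverge,
not how packets would actually be held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policy and action names; nothing here is OpenAFS API. */
enum sick_policy { SICK_POLICY_ABORT, SICK_POLICY_SUSPEND };
enum pkt_action  { PKT_PROCESS, PKT_ABORT, PKT_HOLD };

/*
 * Decide what to do with incoming work while the event queue is "sick".
 *   ABORT policy:   only new connections are refused with an abort code;
 *                   existing traffic continues.
 *   SUSPEND policy: nothing is sent or processed until the queue recovers.
 */
static enum pkt_action
sick_gate(bool sick, enum sick_policy policy, bool new_connection)
{
    assert(policy == SICK_POLICY_ABORT || policy == SICK_POLICY_SUSPEND);

    if (!sick)
        return PKT_PROCESS;
    if (policy == SICK_POLICY_SUSPEND)
        return PKT_HOLD;
    return new_connection ? PKT_ABORT : PKT_PROCESS;
}
```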

But all that is moot if the upstream development community finds these
approaches misguided.  One could argue, in the case of the first
failure, that it's unreasonable to expect OpenAFS to work predictably on
buggy hardware.  One could also discount the second failure on the
grounds that it's just an LWP bug, and that priority inversion is not
possible in pthreaded-ubik, which will be here real soon now.  However,
in my opinion these are just two instances of an underlying weak spot in
OpenAFS; there may be other ways that the rxevent queue could become
"sick".  Therefore, I believe the rxevent queue could use some
bulletproofing.

I look forward to your comments.

regards,
--
Mark Vitale
mvitale@sinenomine.net
