[OpenAFS-devel] dealing with rxevent queue stalls

Jeffrey Altman jaltman@your-file-system.com
Tue, 24 Sep 2013 08:54:14 -0400


This is a cryptographically signed message in MIME format.

--------------ms070609000406090209010009
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On 9/23/2013 3:52 PM, Mark Vitale wrote:
> Recently I've been working on several problems with very different exte=
rnals but a similar root cause:
>=20
> 1) While accessing a particular fileserver, AFS clients experience perf=
ormance delays; some also see multiple "server down/back up" problems.
>   - root cause was a hardware bug on the fileserver that prevented time=
rs from firing reliably; this unpredictably delayed any task in the rxeve=
nt queue, while leaving the rest of the fileserver function relatively un=
affected.  (btw, this was a pthreaded fileserver).

I sympathize.  Tracking this down must have been quite frustrating.  One
of the basic assumptions of the rx event implementation is that timers
are reliable and that the OS is trusted to managed them.  If this
assumption is not true on a given system then rx_event will appear to
misbehave.

> 2) Volume releases suffer from poor performance and occasionally fail w=
ith timeouts.
>   - root cause was heavier-than-normal vlserver load (perhaps caused by=
 disk performance slowdowns); this starved LWP IOMGR, which in turn preve=
nted LWP rx_Listener from being dispatched (priority inversion), leading =
to a grossly delayed rxevent queue.

Did you intend to write "rx_Listener" in the above sentence or did you
mean the rx_Event thread?  If the listener thread never runs then
incoming packets are not processed in a timely manner and although it
will appear as if events are taking a long time to process it is more
likely the case that they were not posted in a timely fashion.

If the listener thread is running and the event thread is not then the
length of the event queues become very long and the cost of processing
an event post increases on 1.6 to the point where it slows down the
listener thread.

> So in two very different situations, the rxevent queue was unable to pr=
ocess scheduled events in a timely manner, leading to very strange and di=
fficult-to-diagnose symptoms.
>=20
> I'm writing this note to begin a discussion on possible ways to address=
 this in OpenAFS.
>=20
> One possible approach is to implement some watchdog/sentinel code to de=
tect when the rxevent queue is not working correctly; that is, when it's =
unable to run scheduled events in a timely manner.   Certainly rxevent ca=
n't watch itself; but rather than adding another thread as a watchdog, I =
chose to insert a sanity check into rxevent_Post().  This check essential=
ly compares the current time (if supplied on the "now" parameter) with th=
e scheduled time for the top rxevent on the queue.  If it's later than a =
certain threshold, then we know that the rxevent queue has fallen behind =
(is "sick") for some unknown reason.  At this point, I set a state flag w=
hich causes any new connections to abort (with timeout or busy, for examp=
le).  Both the threshold and reply could be configurable, similar to the =
current implementation of the -busyat thread-busy threshold and response.=
  After the rxevent queue is able to catch up with its scheduling work, t=
he "sick" state is reset.  And lastly, warning m
essages could be written to the log to indicate that the rxevent queue is=
 having difficulties and later has returned to normal.  I have some proto=
type code working in my test environment; it needs some work before it wi=
ll be suitable for review.  =20
>=20
> Another possible approach is, instead of sending a abort codes when we =
are "sick", merely suspend RPC operations completely; that is, don't send=
 any packets or process any calls until we aren't "sick" again.   That wo=
uld mean that the server process appears to "freeze" entirely whenever th=
e event thread gets stuck.  Certainly this would be an immediate alert th=
at something is wrong, rather than the hit-or-miss mystery behavior when =
merely the rxevent queue is not being dispatched.  =20
>=20
> But all that is moot if the upstream development community finds these =
approaches misguided.  One could argue in the case of the first failure, =
that it's unreasonable to expect OpenAFS to work predictably on buggy har=
dware.  One could also discount the second failure on the grounds that it=
's just an LWP bug, and that priority inversion is not possible in pthrea=
ded-ubik, which will be here real soon now.   However, in my opinion thes=
e are both just two instances of an underlying weak spot in OpenAFS; ther=
e may be other ways that the rxevent queue could become "sick".   Therefo=
re, I believe the rxevent queue could use some bulletproofing.


In rx, the listener and event processing are time critical.  If you
compare 1.6 to master you will see that Simon has redesigned event
management in order to cut down on the number of comparisons required in
that path.  Adding new checks or writing to a log file in the event post
path should be avoided.  Increasing the complexity of the code to work
around an OS or hardware failure that violates basic assumptions is a
bad idea.  If this was expected behavior of the OS or hardware, then we
might need to consider whether that OS or hardware should even be support=
ed.

For the LWP issue.  It is unclear from your description what the root
cause of the problem is.  You have not described the resulting packet
flows on the wire and how those flows are being interpreted or
misinterpreted by the peer.

As a more general comment, I am concerned about repercussions of the
proposed behavior changes on the overall ecosystem.  When you say
"merely suspect RPC operations completely" I assume that you have a file
server or volume location server in mind but they aren't the only rx
servers.  I don't think that replacing one broken behavior with a new
broken behavior (aborts or refusal to respond) is a good idea.  Doing so
increases complexity and has the potential for unintended side effects.

Feel free to respond with more specific details of the problem.

Jeffrey Altman



--------------ms070609000406090209010009
Content-Type: application/pkcs7-signature; name="smime.p7s"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="smime.p7s"
Content-Description: S/MIME Cryptographic Signature

MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIINITCC
BkIwggUqoAMCAQICEDirAC//rpa3Vv85Wvtd5xswDQYJKoZIhvcNAQEFBQAwgcoxCzAJBgNV
BAYTAlVTMRcwFQYDVQQKEw5WZXJpU2lnbiwgSW5jLjEfMB0GA1UECxMWVmVyaVNpZ24gVHJ1
c3QgTmV0d29yazE6MDgGA1UECxMxKGMpIDE5OTkgVmVyaVNpZ24sIEluYy4gLSBGb3IgYXV0
aG9yaXplZCB1c2Ugb25seTFFMEMGA1UEAxM8VmVyaVNpZ24gQ2xhc3MgMSBQdWJsaWMgUHJp
bWFyeSBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0eSAtIEczMB4XDTExMDkwMTAwMDAwMFoXDTIx
MDgzMTIzNTk1OVowgaYxCzAJBgNVBAYTAlVTMR0wGwYDVQQKExRTeW1hbnRlYyBDb3Jwb3Jh
dGlvbjEfMB0GA1UECxMWU3ltYW50ZWMgVHJ1c3QgTmV0d29yazEeMBwGA1UECxMVUGVyc29u
YSBOb3QgVmFsaWRhdGVkMTcwNQYDVQQDEy5TeW1hbnRlYyBDbGFzcyAxIEluZGl2aWR1YWwg
U3Vic2NyaWJlciBDQSAtIEc0MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAxuwn
/R1j9DsdisHTHMjIgoa2uEqGkqqBXHLKMA0vnkEiVzAhJZCao/SsKsaIF4ZhchN2LuwDyyeb
jyCAN+DkitpVplAP/LlcI2mJQqG6H6/vDvmkyQrx+DeyxtmSSq5937hEH5u6P4wG/tgjT0hR
I2pghKjuJy9g35byGiqMPI8AzE/L+iCOvDX24fCatgXz/B0/xhR7DtryBeTTgwKmxWlwtKnk
VunbHVz0pjbia7UeKi3cvrvuOgSwMAitX2hsxr0GloiE5+apZC28ODC7iCbDZ2ZmtLR3+cCh
xw5y72bi5bnK4POFdzWY3tQcsP5mceI4y258T0BV65fZqBge7QIDAQABo4ICRDCCAkAwOAYI
KwYBBQUHAQEELDAqMCgGCCsGAQUFBzABhhxodHRwOi8vcGtpLW9jc3AudmVyaXNpZ24uY29t
MBIGA1UdEwEB/wQIMAYBAf8CAQAwbAYDVR0gBGUwYzBhBgtghkgBhvhFAQcXATBSMCYGCCsG
AQUFBwIBFhpodHRwOi8vd3d3LnN5bWF1dGguY29tL2NwczAoBggrBgEFBQcCAjAcGhpodHRw
Oi8vd3d3LnN5bWF1dGguY29tL3JwYTA0BgNVHR8ELTArMCmgJ6AlhiNodHRwOi8vY3JsLnZl
cmlzaWduLmNvbS9wY2ExLWczLmNybDAOBgNVHQ8BAf8EBAMCAQYwKQYDVR0RBCIwIKQeMBwx
GjAYBgNVBAMTEVZlcmlTaWduTVBLSS0yLTk3MB0GA1UdDgQWBBSt+cOTci21uShh5KTXYNXE
Cl4aATCB8QYDVR0jBIHpMIHmoYHQpIHNMIHKMQswCQYDVQQGEwJVUzEXMBUGA1UEChMOVmVy
aVNpZ24sIEluYy4xHzAdBgNVBAsTFlZlcmlTaWduIFRydXN0IE5ldHdvcmsxOjA4BgNVBAsT
MShjKSAxOTk5IFZlcmlTaWduLCBJbmMuIC0gRm9yIGF1dGhvcml6ZWQgdXNlIG9ubHkxRTBD
BgNVBAMTPFZlcmlTaWduIENsYXNzIDEgUHVibGljIFByaW1hcnkgQ2VydGlmaWNhdGlvbiBB
dXRob3JpdHkgLSBHM4IRAItbdVaEVIULAM+vOEjOsaQwDQYJKoZIhvcNAQEFBQADggEBANaP
wdqbiPKzbE0fWC+6AVFddMFG6MO4e5/WQPHv/zK6iWvADjRDn6SZ5qTwXUgzYoWFYf4jiCKM
YJsrnGVJlMSiOCRIpVylUEto6WIip5PomSJuPVu7EEIOH0x1RzRWCY/4vYw881y70pZwVHBi
Te/REL6dSCxe7IZrB4LwPeElJygs4BZ2HrP95WKW0oo9Xyuu+1zCE7dlY8s0dkOf1oeZq26t
lcEAP0Yngf813iMOQ9wUXzL5yinvwlIw9ZnduYH4OiUgjYJo8rkhhXRmBOGGORYy8i3WKqjJ
3tkAAk/jGCDFpYFWtpXe04Kt+HslvmR8LqC6cCz4+XXidE0HbYQwggbXMIIFv6ADAgECAhAl
sq27FC4B63Z8q1Jzxx+CMA0GCSqGSIb3DQEBBQUAMIGmMQswCQYDVQQGEwJVUzEdMBsGA1UE
ChMUU3ltYW50ZWMgQ29ycG9yYXRpb24xHzAdBgNVBAsTFlN5bWFudGVjIFRydXN0IE5ldHdv
cmsxHjAcBgNVBAsTFVBlcnNvbmEgTm90IFZhbGlkYXRlZDE3MDUGA1UEAxMuU3ltYW50ZWMg
Q2xhc3MgMSBJbmRpdmlkdWFsIFN1YnNjcmliZXIgQ0EgLSBHNDAeFw0xMzAxMTUwMDAwMDBa
Fw0xNDAxMTYyMzU5NTlaMIHOMS4wLAYDVQQDDCVQZXJzb25hIE5vdCBWYWxpZGF0ZWQgLSAx
MzU4Mjc2MTA4NjMxMSswKQYJKoZIhvcNAQkBFhxqYWx0bWFuQHlvdXItZmlsZS1zeXN0ZW0u
Y29tMQ8wDQYDVQQLDAZTL01JTUUxHjAcBgNVBAsMFVBlcnNvbmEgTm90IFZhbGlkYXRlZDEf
MB0GA1UECwwWU3ltYW50ZWMgVHJ1c3QgTmV0d29yazEdMBsGA1UECgwUU3ltYW50ZWMgQ29y
cG9yYXRpb24wggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQChjpVdVk6XeKFff4To
xGglVcP3FVY95LfwfBUKbQKppRYgfKBFW1RIEG+KXpc3IGOOTUX0Lg1xaukRwuoqv/pqRMNK
JabXcr1RFuCFg9xfcmP5Z+65qQ+IRxYCKdcNfu3GdS7rMOY57+VLU7aZnnMgnt0oE42awze8
gYQSJsdjZKKZS9DBnzxJYfzqhIw7txoMdV7rwPAZm5yujsqI/eWuZPZ+qZ+8GTsnHJzZNvc6
KrDGU6aVfknM+qf92hXA2VXvmtf1B17BBbGX6Hgat71Ufw5oXFly6/Vt+e5mKIozXytw8qPX
NllsG89ITVzn0OovcpcSRFBrXanzVn7JVj33AgMBAAGjggLVMIIC0TAMBgNVHRMBAf8EAjAA
MA4GA1UdDwEB/wQEAwIFoDAgBgNVHSUBAf8EFjAUBggrBgEFBQcDBAYIKwYBBQUHAwIwHQYD
VR0OBBYEFMC1SMuRefXyNAwkZM+Lgu7iJ6nFMCcGA1UdEQQgMB6BHGphbHRtYW5AeW91ci1m
aWxlLXN5c3RlbS5jb20wHwYDVR0jBBgwFoAUrfnDk3IttbkoYeSk12DVxApeGgEwggErBggr
BgEFBQcBAQSCAR0wggEZMIIBFQYIKwYBBQUHMAKGggEHbGRhcDovL2RpcmVjdG9yeS52ZXJp
c2lnbi5jb20vQ04lMjAlM0QlMjBTeW1hbnRlYyUyMENsYXNzJTIwMSUyMEluZGl2aWR1YWwl
MjBTdWJzY3JpYmVyJTIwQ0ElMjAtJTIwRzQlMkMlMjBPVSUyMCUzRCUyMFBlcnNvbmElMjBO
b3QlMjBWYWxpZGF0ZWQlMkMlMjBPVSUyMCUzRCUyMFN5bWFudGVjJTIwVHJ1c3QlMjBOZXR3
b3JrJTJDJTIwTyUyMCUzRCUyMFN5bWFudGVjJTIwQ29ycG9yYXRpb24lMkMlMjBDJTIwJTNE
JTIwVVM/Y0FDZXJ0aWZpY2F0ZTtiaW5hcnkwXQYDVR0fBFYwVDBSoFCgToZMaHR0cDovL3Br
aS1jcmwuc3ltYXV0aC5jb20vY2FfNTYxYzEwMzY5MGM5N2E2OTI0N2EwZWYwNzFhYzgxYWYv
TGF0ZXN0Q1JMLmNybDBsBgNVHSAEZTBjMGEGC2CGSAGG+EUBBxcBMFIwJgYIKwYBBQUHAgEW
Gmh0dHA6Ly93d3cuc3ltYXV0aC5jb20vY3BzMCgGCCsGAQUFBwICMBwaGmh0dHA6Ly93d3cu
c3ltYXV0aC5jb20vcnBhMCoGCmCGSAGG+EUBEAMEHDAaBhFghkgBhvhFARABAgIEAYazFxYF
MTA5MjIwDQYJKoZIhvcNAQEFBQADggEBAHVBsY+l21fY0twxzNnO0QbadbRT4n7k3rYfOMbP
noBDkYQx5lcrYn19f7e+ADMPdW/MY2ZjFM76aiRgiTo2IjMB7z4vyX6hfGJKTIHX9loDW0H2
z2o0jYqXDQtXx9A/gfMtVh9J+8O5IuCPsVtXgI4p6kRQHN+Er2Rzu1I4BILxE9HsDb/ruX6p
NXZSUQ2AcMP87ZVfz+reumMpJgWWAoiQWJCgp+qZ2c2AG+yV+FstiMpxIj/qB9+BrFRuam8d
IGbH0tIS2tyc7pgit8Pid2Zv7HGT2NFu69INsKIyqGImhMVuOUaCq/kNi3Z+6R5Hm3ljift8
ZF52Kiq4whe8KlcxggRSMIIETgIBATCBuzCBpjELMAkGA1UEBhMCVVMxHTAbBgNVBAoTFFN5
bWFudGVjIENvcnBvcmF0aW9uMR8wHQYDVQQLExZTeW1hbnRlYyBUcnVzdCBOZXR3b3JrMR4w
HAYDVQQLExVQZXJzb25hIE5vdCBWYWxpZGF0ZWQxNzA1BgNVBAMTLlN5bWFudGVjIENsYXNz
IDEgSW5kaXZpZHVhbCBTdWJzY3JpYmVyIENBIC0gRzQCECWyrbsULgHrdnyrUnPHH4IwCQYF
Kw4DAhoFAKCCAmswGAYJKoZIhvcNAQkDMQsGCSqGSIb3DQEHATAcBgkqhkiG9w0BCQUxDxcN
MTMwOTI0MTI1NDE0WjAjBgkqhkiG9w0BCQQxFgQUF3OTR+BoPn2pJklP+iFwNxzxJXswbAYJ
KoZIhvcNAQkPMV8wXTALBglghkgBZQMEASowCwYJYIZIAWUDBAECMAoGCCqGSIb3DQMHMA4G
CCqGSIb3DQMCAgIAgDANBggqhkiG9w0DAgIBQDAHBgUrDgMCBzANBggqhkiG9w0DAgIBKDCB
zAYJKwYBBAGCNxAEMYG+MIG7MIGmMQswCQYDVQQGEwJVUzEdMBsGA1UEChMUU3ltYW50ZWMg
Q29ycG9yYXRpb24xHzAdBgNVBAsTFlN5bWFudGVjIFRydXN0IE5ldHdvcmsxHjAcBgNVBAsT
FVBlcnNvbmEgTm90IFZhbGlkYXRlZDE3MDUGA1UEAxMuU3ltYW50ZWMgQ2xhc3MgMSBJbmRp
dmlkdWFsIFN1YnNjcmliZXIgQ0EgLSBHNAIQJbKtuxQuAet2fKtSc8cfgjCBzgYLKoZIhvcN
AQkQAgsxgb6ggbswgaYxCzAJBgNVBAYTAlVTMR0wGwYDVQQKExRTeW1hbnRlYyBDb3Jwb3Jh
dGlvbjEfMB0GA1UECxMWU3ltYW50ZWMgVHJ1c3QgTmV0d29yazEeMBwGA1UECxMVUGVyc29u
YSBOb3QgVmFsaWRhdGVkMTcwNQYDVQQDEy5TeW1hbnRlYyBDbGFzcyAxIEluZGl2aWR1YWwg
U3Vic2NyaWJlciBDQSAtIEc0AhAlsq27FC4B63Z8q1Jzxx+CMA0GCSqGSIb3DQEBAQUABIIB
AEfKNNXIxd7u+6umncl4ifiSajBdlGPwl6GahR6iXPJ2shR6oWQk4YNgtdQFGNxySNd2Z3oG
2zVr7CvFnoDjoo9aX80IL3rT1dQN43AtkA1+Zwj+mAAwU3i3xq9c0+wUCaxmCKQbX1MDYi5V
qfvS7wVpujTYc4i2BuzIoK/90pHrBEurD8qEkuiQgmLjpPd9tvlt0SyUOsPLTjNWN2F4hkT3
um0pAkz26+a9lNkGGuqs/c+apwJ5qMPJ/37F+L70nXGoKDGrZ3kl4eOutivH4SbGfANxY26H
nvkiKG24MU8oaCdBkHG20of1uqM7YNCT0dcHCSm9p4LvA+g7JbuteqsAAAAAAAA=
--------------ms070609000406090209010009--