[OpenAFS] Re: Investigating 'calls waiting' from rxdebug

Jeffrey Altman jaltman@your-file-system.com
Fri, 16 Aug 2013 23:06:52 -0400


This is a cryptographically signed message in MIME format.

--------------ms010004040904070607010801
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Garance,

The important thing to note is that the reason that there are RPCs
waiting for threads is that every server thread that can process an RPC
is already tied up doing something.  Those threads could be:

 1 waiting for a reply from the PTDB

 2 waiting for a reply from a TellMeAboutYourself sent to
   a client

 3 waiting for a reply from a ProbeUUID sent to a client

 4 waiting for a reply from a InitCallBackState3 sent to
   a client

 5 waiting for a reply from a Callback sent to a client

 6 waiting for a lock that is held by another thread

 7 waiting for disk I/O to complete

For 1 through 5 there will be a rx connection from the server to the
client for each blocking RPC.  For an RPC issued to a client in response
to an in-bound RPC you should find a pair of entries in the rxdebug
output with the same address and port number but one being a client
connection and the other a server connection.  For example:

Connection from host X.Y.Z.102, port 34935, Cuid 96817158/47fe3724
  serial 3,  natMTU 1444, security index 0, client conn
    call 0: # 1, state dally, mode: receiving
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Connection from host X.Y.Z.102, port 34935, Cuid b35be5a7/48276030
  serial 3,  natMTU 1444, security index 0, server conn
    call 0: # 1, state active, mode: receiving
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized

In the case of a client that is switching NAT ports, you may see
connections have the same address but a different port number in each
direction.  In such a case the outbound connection to the old port
number for  the ProbeUUID RPC is going to block for the timeout period.
However, such a thread would not be holding a lock required by other
threads, it would just be keeping that thread busy until a timeout
generated or a response is received.

The worse case is where the thread is holding a lock on a resource such
as a vnode while callbacks are being issued to one or more clients that
are not responding.  This is typically a data modifying operation but
could also be a lock state change.   In such a case the thread holding
the vnode lock cannot safely return until after all outstanding
callbacks have been issued or scheduled for a delayed break.  While a
vnode lock is held no other threads that require that vnode can proceed.
 If there are a large number of clients trying to modify the same
directory, the contention ties up a large number of threads because only
one can make progress at a time and each RPC requires a callback to be
issued.  If there are clients that are unresponsive, there are delays.

Unfortunately, the logging in the file server is not all that helpful to
identify which vnode are involved because none of them include the FID
of the object a callback might be issued for.  A patch to log the FID
when a callback fails on master would be appreciated.

The behaviors of clients and servers have changed over the years.  It
really does matter which version the file server is and what the command
line configuration is.   Your file servers are behind a firewall and I
can't query them but your DB servers are 1.4.4 which is really out of
date.   There have been numerous fixes to the host package that tracks
the clients between 1.4.4 and 1.4.14 let alone between 1.4 and 1.6.
Not to mention many improvements in RX processing and security
vulnerabilities.   You might want to consider an upgrade.

Jeffrey Altman




--------------ms010004040904070607010801
Content-Type: application/pkcs7-signature; name="smime.p7s"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="smime.p7s"
Content-Description: S/MIME Cryptographic Signature

MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIINITCC
BkIwggUqoAMCAQICEDirAC//rpa3Vv85Wvtd5xswDQYJKoZIhvcNAQEFBQAwgcoxCzAJBgNV
BAYTAlVTMRcwFQYDVQQKEw5WZXJpU2lnbiwgSW5jLjEfMB0GA1UECxMWVmVyaVNpZ24gVHJ1
c3QgTmV0d29yazE6MDgGA1UECxMxKGMpIDE5OTkgVmVyaVNpZ24sIEluYy4gLSBGb3IgYXV0
aG9yaXplZCB1c2Ugb25seTFFMEMGA1UEAxM8VmVyaVNpZ24gQ2xhc3MgMSBQdWJsaWMgUHJp
bWFyeSBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0eSAtIEczMB4XDTExMDkwMTAwMDAwMFoXDTIx
MDgzMTIzNTk1OVowgaYxCzAJBgNVBAYTAlVTMR0wGwYDVQQKExRTeW1hbnRlYyBDb3Jwb3Jh
dGlvbjEfMB0GA1UECxMWU3ltYW50ZWMgVHJ1c3QgTmV0d29yazEeMBwGA1UECxMVUGVyc29u
YSBOb3QgVmFsaWRhdGVkMTcwNQYDVQQDEy5TeW1hbnRlYyBDbGFzcyAxIEluZGl2aWR1YWwg
U3Vic2NyaWJlciBDQSAtIEc0MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAxuwn
/R1j9DsdisHTHMjIgoa2uEqGkqqBXHLKMA0vnkEiVzAhJZCao/SsKsaIF4ZhchN2LuwDyyeb
jyCAN+DkitpVplAP/LlcI2mJQqG6H6/vDvmkyQrx+DeyxtmSSq5937hEH5u6P4wG/tgjT0hR
I2pghKjuJy9g35byGiqMPI8AzE/L+iCOvDX24fCatgXz/B0/xhR7DtryBeTTgwKmxWlwtKnk
VunbHVz0pjbia7UeKi3cvrvuOgSwMAitX2hsxr0GloiE5+apZC28ODC7iCbDZ2ZmtLR3+cCh
xw5y72bi5bnK4POFdzWY3tQcsP5mceI4y258T0BV65fZqBge7QIDAQABo4ICRDCCAkAwOAYI
KwYBBQUHAQEELDAqMCgGCCsGAQUFBzABhhxodHRwOi8vcGtpLW9jc3AudmVyaXNpZ24uY29t
MBIGA1UdEwEB/wQIMAYBAf8CAQAwbAYDVR0gBGUwYzBhBgtghkgBhvhFAQcXATBSMCYGCCsG
AQUFBwIBFhpodHRwOi8vd3d3LnN5bWF1dGguY29tL2NwczAoBggrBgEFBQcCAjAcGhpodHRw
Oi8vd3d3LnN5bWF1dGguY29tL3JwYTA0BgNVHR8ELTArMCmgJ6AlhiNodHRwOi8vY3JsLnZl
cmlzaWduLmNvbS9wY2ExLWczLmNybDAOBgNVHQ8BAf8EBAMCAQYwKQYDVR0RBCIwIKQeMBwx
GjAYBgNVBAMTEVZlcmlTaWduTVBLSS0yLTk3MB0GA1UdDgQWBBSt+cOTci21uShh5KTXYNXE
Cl4aATCB8QYDVR0jBIHpMIHmoYHQpIHNMIHKMQswCQYDVQQGEwJVUzEXMBUGA1UEChMOVmVy
aVNpZ24sIEluYy4xHzAdBgNVBAsTFlZlcmlTaWduIFRydXN0IE5ldHdvcmsxOjA4BgNVBAsT
MShjKSAxOTk5IFZlcmlTaWduLCBJbmMuIC0gRm9yIGF1dGhvcml6ZWQgdXNlIG9ubHkxRTBD
BgNVBAMTPFZlcmlTaWduIENsYXNzIDEgUHVibGljIFByaW1hcnkgQ2VydGlmaWNhdGlvbiBB
dXRob3JpdHkgLSBHM4IRAItbdVaEVIULAM+vOEjOsaQwDQYJKoZIhvcNAQEFBQADggEBANaP
wdqbiPKzbE0fWC+6AVFddMFG6MO4e5/WQPHv/zK6iWvADjRDn6SZ5qTwXUgzYoWFYf4jiCKM
YJsrnGVJlMSiOCRIpVylUEto6WIip5PomSJuPVu7EEIOH0x1RzRWCY/4vYw881y70pZwVHBi
Te/REL6dSCxe7IZrB4LwPeElJygs4BZ2HrP95WKW0oo9Xyuu+1zCE7dlY8s0dkOf1oeZq26t
lcEAP0Yngf813iMOQ9wUXzL5yinvwlIw9ZnduYH4OiUgjYJo8rkhhXRmBOGGORYy8i3WKqjJ
3tkAAk/jGCDFpYFWtpXe04Kt+HslvmR8LqC6cCz4+XXidE0HbYQwggbXMIIFv6ADAgECAhAl
sq27FC4B63Z8q1Jzxx+CMA0GCSqGSIb3DQEBBQUAMIGmMQswCQYDVQQGEwJVUzEdMBsGA1UE
ChMUU3ltYW50ZWMgQ29ycG9yYXRpb24xHzAdBgNVBAsTFlN5bWFudGVjIFRydXN0IE5ldHdv
cmsxHjAcBgNVBAsTFVBlcnNvbmEgTm90IFZhbGlkYXRlZDE3MDUGA1UEAxMuU3ltYW50ZWMg
Q2xhc3MgMSBJbmRpdmlkdWFsIFN1YnNjcmliZXIgQ0EgLSBHNDAeFw0xMzAxMTUwMDAwMDBa
Fw0xNDAxMTYyMzU5NTlaMIHOMS4wLAYDVQQDDCVQZXJzb25hIE5vdCBWYWxpZGF0ZWQgLSAx
MzU4Mjc2MTA4NjMxMSswKQYJKoZIhvcNAQkBFhxqYWx0bWFuQHlvdXItZmlsZS1zeXN0ZW0u
Y29tMQ8wDQYDVQQLDAZTL01JTUUxHjAcBgNVBAsMFVBlcnNvbmEgTm90IFZhbGlkYXRlZDEf
MB0GA1UECwwWU3ltYW50ZWMgVHJ1c3QgTmV0d29yazEdMBsGA1UECgwUU3ltYW50ZWMgQ29y
cG9yYXRpb24wggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQChjpVdVk6XeKFff4To
xGglVcP3FVY95LfwfBUKbQKppRYgfKBFW1RIEG+KXpc3IGOOTUX0Lg1xaukRwuoqv/pqRMNK
JabXcr1RFuCFg9xfcmP5Z+65qQ+IRxYCKdcNfu3GdS7rMOY57+VLU7aZnnMgnt0oE42awze8
gYQSJsdjZKKZS9DBnzxJYfzqhIw7txoMdV7rwPAZm5yujsqI/eWuZPZ+qZ+8GTsnHJzZNvc6
KrDGU6aVfknM+qf92hXA2VXvmtf1B17BBbGX6Hgat71Ufw5oXFly6/Vt+e5mKIozXytw8qPX
NllsG89ITVzn0OovcpcSRFBrXanzVn7JVj33AgMBAAGjggLVMIIC0TAMBgNVHRMBAf8EAjAA
MA4GA1UdDwEB/wQEAwIFoDAgBgNVHSUBAf8EFjAUBggrBgEFBQcDBAYIKwYBBQUHAwIwHQYD
VR0OBBYEFMC1SMuRefXyNAwkZM+Lgu7iJ6nFMCcGA1UdEQQgMB6BHGphbHRtYW5AeW91ci1m
aWxlLXN5c3RlbS5jb20wHwYDVR0jBBgwFoAUrfnDk3IttbkoYeSk12DVxApeGgEwggErBggr
BgEFBQcBAQSCAR0wggEZMIIBFQYIKwYBBQUHMAKGggEHbGRhcDovL2RpcmVjdG9yeS52ZXJp
c2lnbi5jb20vQ04lMjAlM0QlMjBTeW1hbnRlYyUyMENsYXNzJTIwMSUyMEluZGl2aWR1YWwl
MjBTdWJzY3JpYmVyJTIwQ0ElMjAtJTIwRzQlMkMlMjBPVSUyMCUzRCUyMFBlcnNvbmElMjBO
b3QlMjBWYWxpZGF0ZWQlMkMlMjBPVSUyMCUzRCUyMFN5bWFudGVjJTIwVHJ1c3QlMjBOZXR3
b3JrJTJDJTIwTyUyMCUzRCUyMFN5bWFudGVjJTIwQ29ycG9yYXRpb24lMkMlMjBDJTIwJTNE
JTIwVVM/Y0FDZXJ0aWZpY2F0ZTtiaW5hcnkwXQYDVR0fBFYwVDBSoFCgToZMaHR0cDovL3Br
aS1jcmwuc3ltYXV0aC5jb20vY2FfNTYxYzEwMzY5MGM5N2E2OTI0N2EwZWYwNzFhYzgxYWYv
TGF0ZXN0Q1JMLmNybDBsBgNVHSAEZTBjMGEGC2CGSAGG+EUBBxcBMFIwJgYIKwYBBQUHAgEW
Gmh0dHA6Ly93d3cuc3ltYXV0aC5jb20vY3BzMCgGCCsGAQUFBwICMBwaGmh0dHA6Ly93d3cu
c3ltYXV0aC5jb20vcnBhMCoGCmCGSAGG+EUBEAMEHDAaBhFghkgBhvhFARABAgIEAYazFxYF
MTA5MjIwDQYJKoZIhvcNAQEFBQADggEBAHVBsY+l21fY0twxzNnO0QbadbRT4n7k3rYfOMbP
noBDkYQx5lcrYn19f7e+ADMPdW/MY2ZjFM76aiRgiTo2IjMB7z4vyX6hfGJKTIHX9loDW0H2
z2o0jYqXDQtXx9A/gfMtVh9J+8O5IuCPsVtXgI4p6kRQHN+Er2Rzu1I4BILxE9HsDb/ruX6p
NXZSUQ2AcMP87ZVfz+reumMpJgWWAoiQWJCgp+qZ2c2AG+yV+FstiMpxIj/qB9+BrFRuam8d
IGbH0tIS2tyc7pgit8Pid2Zv7HGT2NFu69INsKIyqGImhMVuOUaCq/kNi3Z+6R5Hm3ljift8
ZF52Kiq4whe8KlcxggRSMIIETgIBATCBuzCBpjELMAkGA1UEBhMCVVMxHTAbBgNVBAoTFFN5
bWFudGVjIENvcnBvcmF0aW9uMR8wHQYDVQQLExZTeW1hbnRlYyBUcnVzdCBOZXR3b3JrMR4w
HAYDVQQLExVQZXJzb25hIE5vdCBWYWxpZGF0ZWQxNzA1BgNVBAMTLlN5bWFudGVjIENsYXNz
IDEgSW5kaXZpZHVhbCBTdWJzY3JpYmVyIENBIC0gRzQCECWyrbsULgHrdnyrUnPHH4IwCQYF
Kw4DAhoFAKCCAmswGAYJKoZIhvcNAQkDMQsGCSqGSIb3DQEHATAcBgkqhkiG9w0BCQUxDxcN
MTMwODE3MDMwNjUyWjAjBgkqhkiG9w0BCQQxFgQUxEj8RFrRrRbKapgCMnFdwU4W0S0wbAYJ
KoZIhvcNAQkPMV8wXTALBglghkgBZQMEASowCwYJYIZIAWUDBAECMAoGCCqGSIb3DQMHMA4G
CCqGSIb3DQMCAgIAgDANBggqhkiG9w0DAgIBQDAHBgUrDgMCBzANBggqhkiG9w0DAgIBKDCB
zAYJKwYBBAGCNxAEMYG+MIG7MIGmMQswCQYDVQQGEwJVUzEdMBsGA1UEChMUU3ltYW50ZWMg
Q29ycG9yYXRpb24xHzAdBgNVBAsTFlN5bWFudGVjIFRydXN0IE5ldHdvcmsxHjAcBgNVBAsT
FVBlcnNvbmEgTm90IFZhbGlkYXRlZDE3MDUGA1UEAxMuU3ltYW50ZWMgQ2xhc3MgMSBJbmRp
dmlkdWFsIFN1YnNjcmliZXIgQ0EgLSBHNAIQJbKtuxQuAet2fKtSc8cfgjCBzgYLKoZIhvcN
AQkQAgsxgb6ggbswgaYxCzAJBgNVBAYTAlVTMR0wGwYDVQQKExRTeW1hbnRlYyBDb3Jwb3Jh
dGlvbjEfMB0GA1UECxMWU3ltYW50ZWMgVHJ1c3QgTmV0d29yazEeMBwGA1UECxMVUGVyc29u
YSBOb3QgVmFsaWRhdGVkMTcwNQYDVQQDEy5TeW1hbnRlYyBDbGFzcyAxIEluZGl2aWR1YWwg
U3Vic2NyaWJlciBDQSAtIEc0AhAlsq27FC4B63Z8q1Jzxx+CMA0GCSqGSIb3DQEBAQUABIIB
AD+FuPxvFTrHWwzGxCMNQb45+nwrEu8VrUYx4GcwBllB5HI4a8M+zS57d6kqurvmYsWO+eUx
Mqge/wivnYRpQgftMd1X1uJFlEaNeWS17FEWNaITW+BjTBK0tuJE51wTjKsmYhYGaEyxi42M
FAaV1FxeVVk+D7FR2cFi8CFdgvUWf+zfHfKdxvcZHxeDFTclySWGD6YjAHzqVf3MUp+qeNlJ
4wehElrYqcEtrgsFH4et4lIIp2RQYGEnkUF8ApjRN4ClqXXN9vGG4FlqGh/l/2ZeMVARHtB1
Ai/rh9kbIEgbiotd6JKeGb6MGFbRghfp4TVWENnA77Yg4MdfqRBrZaMAAAAAAAA=
--------------ms010004040904070607010801--