[OpenAFS] BreakDelayedCallbacks FAILED still an issue
Jeffrey Altman
jaltman@secure-endpoints.com
Tue, 18 Apr 2006 13:53:40 -0400
The combination of problems that you have experienced should
be solved in 1.4.1. One of the issues that you were seeing is
that the client was contacting the server on a port other than
7001 and the server was attempting to break callbacks on port
7001. Since the NAT doesn't have a port mapping from 7001 to
the client, the callbacks could not be broken. Every time the
client would contact the server, the server would believe that
it had callbacks for the client that must be broken and would
block the incoming RPC until the callbacks could be broken.
The 1.3.77 client also has a serious bug that would cause it
to generate rapid fire requests using a new RX Connection for
each RPC. If you still have 1.3.77 clients deployed, try your
best to upgrade them.
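To illustrate the port mismatch described above, here is a toy model
of a NAT, not OpenAFS code. The private address 192.168.1.10 and the
public port 32768 are made-up values for illustration; only port 7001
comes from the discussion above.

```python
# Toy NAT model: an outbound packet creates a mapping for one public port,
# and inbound traffic is delivered only if a mapping for that port exists.
class Nat:
    def __init__(self):
        # public port -> (private address, private port)
        self.mappings = {}

    def outbound(self, private_addr, private_port, public_port):
        """Record the mapping the NAT creates when the client sends a packet."""
        self.mappings[public_port] = (private_addr, private_port)

    def inbound(self, public_port):
        """Return the private endpoint for an inbound packet, or None if dropped."""
        return self.mappings.get(public_port)


nat = Nat()
# Hypothetical values: the client sends from port 7001, but the NAT
# rewrites the source port to 32768 on the public side.
nat.outbound("192.168.1.10", 7001, 32768)

# The server tries to break callbacks on port 7001: no mapping, so it is dropped.
callback_on_7001 = nat.inbound(7001)
# A callback sent to the port the server actually saw would get through.
callback_on_seen_port = nat.inbound(32768)

print(callback_on_7001)       # None
print(callback_on_seen_port)  # ('192.168.1.10', 7001)
```

In this model the callback break aimed at 7001 can never succeed, which
matches the repeated "connect back failed" messages in the log below.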
The 1.4.1 file server (to be announced real soon now) goes to
great lengths to track clients by both address and port number
and to deal with clients behind NATs so that each time the NAT
allocates a new port number to the client the relevant host
entry will be updated to track it. This should provide a very
good NAT experience for end users that have AFS clients that
support UUIDs. All of the OpenAFS clients for UNIX/Linux support
UUIDs, as do Windows clients 1.3.80 and later.
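A rough sketch of the idea, assuming a host table keyed by client UUID.
This is not the 1.4.1 file server code; the class and method names
(HostEntry, HostTable, note_contact, callback_target) and the port
numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class HostEntry:
    uuid: str   # client UUID, stable across NAT port changes
    addr: str   # address the client was last seen at
    port: int   # port the client was last seen at


class HostTable:
    """Hypothetical host table: one entry per client UUID."""

    def __init__(self):
        self.by_uuid = {}

    def note_contact(self, uuid, addr, port):
        """Create or update the entry each time the client contacts the server."""
        entry = self.by_uuid.get(uuid)
        if entry is None:
            entry = HostEntry(uuid, addr, port)
            self.by_uuid[uuid] = entry
        else:
            # Same client, new NAT port: update the entry in place.
            entry.addr, entry.port = addr, port
        return entry

    def callback_target(self, uuid):
        """Return the (address, port) pair a callback break should be sent to."""
        entry = self.by_uuid[uuid]
        return (entry.addr, entry.port)


table = HostTable()
table.note_contact("client-uuid-1", "152.2.128.182", 32768)
# The NAT allocates a new port; the same UUID identifies the same client.
table.note_contact("client-uuid-1", "152.2.128.182", 40001)
print(table.callback_target("client-uuid-1"))  # ('152.2.128.182', 40001)
```

The point of keying on the UUID is that a port change updates the
existing entry rather than leaving a stale address and port pair
behind with callbacks that can never be broken.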
Jeffrey Altman
John W. Sopko Jr. wrote:
> We have 3 OpenAFS 1.4.0 file servers running on Red Hat
> Enterprise Linux 3 with the latest patches. This morning when I
> came in the servers were very slow and not responding to
> client requests; they were basically hung. This in turn
> pretty much takes down all our web servers' file services
> for home dirs etc.
>
> I tracked this down to a "bad" AFS Windows client; the client
> was running an old 1.3.77 version of the client or may have
> a misconfigured firewall. I halted the "bad" client
> and this fixed our server problems. I turned up
> debugging on the file server (kill -TSTP) and got the below
> messages I used to track this down. I searched the afs-info
> archives and this problem was discussed in 2002 and was
> supposed to get fixed. Is this
> fixed in a version newer than 1.4.0? That is, not allowing
> clients to bring down the server with bad callbacks. Thanks
> for your input.
>
> Tue Apr 18 10:20:19 2006 CB: RCallBackConnectBack failed for
> 152.2.128.182:7001
> Tue Apr 18 10:22:27 2006 [12] CB: Call back connect back failed (in
> break delayed) for 152.2.128.182:7001
> Tue Apr 18 10:22:27 2006 [12] BreakDelayedCallbacks FAILED for host
> 152.2.128.182 which IS UP. Possible network or routing failure.
> Tue Apr 18 10:22:27 2006 [12] MultiProbe failed to find new address for
> host 152.2.128.182:7001
> Tue Apr 18 10:24:34 2006 [7] CB: WhoAreYou failed for
> 152.2.128.182:7001, error -03
> Tue Apr 18 10:26:42 2006 [7] CB: Call back connect back failed (in break
> delayed) for 152.2.128.182:7001
> Tue Apr 18 10:26:42 2006 [7] BreakDelayedCallbacks FAILED for host
> 152.2.128.182 which IS UP. Possible network or routing failure.
>
> Here is the old post about this:
>
> --------------------------------------------
> From fbo2@gmx.net Tue Aug 27 12:13:13 2002
> Date: Tue, 27 Aug 2002 18:12:59 +0200
> From: FBO <fbo2@gmx.net>
> To: OpenAFS-info@openafs.org
>
> Hello,
>
> We (Solaris 8, Transarc 3.6 2.32 servers, 3.6 2.26 db servers) had an
> issue where a client with a certain firewall (Zone Alarm and/or Black
> Ice) configuration (allowing AFS traffic out but no AFS traffic in, or
> more precisely, it didn't allow any _uninitiated_ inbound AFS traffic
> e.g. a fileserver callback) caused the fileserver (a couple actually) to
> come to a crawl (reads/writes taking 10 minutes or more to complete) and
> become virtually unusable. Had to end up blocking this firewall'ed
> client machine to get fileservers back to normal. During "outage"
> FileLog would repeat following message sequence every minute:
>
> Wed Jul 10 16:22:55 2002 BreakDelayedCallbacks FAILED for host 894f2528
> which IS UP. Possible network or routing failure.
> Wed Jul 10 16:22:55 2002 MultiProbe failed to find new address for
> host 894f2528.7001
> Wed Jul 10 16:23:51 2002 CB: Call back connect back failed (in break
> delayed) for 894f2528.7001
>
> We have not been able to duplicate the problem but we've experienced it
> 2 to 3 times within about 3 months.
>
> Below is the explanation I got from Transarc. They've informed us that a
> fix is en route. Has anybody ever experienced this in openafs (or
> anywhere)?