[OpenAFS] 1.4.8, Rx Performance Improvements, and a Small Business Innovative Research grant

Thu, 02 Oct 2008 19:11:31 -0400

In discussions during 2007 with the HEPiX community, it was made 
clear to the gatekeepers that identifying and correcting trouble spots 
within Rx was one of the most important areas that OpenAFS needed to
improve in order to maintain the existing deployments within that
community of users.  No one had the resources to put towards such a
pursuit, and it was suggested that OpenAFS apply for a United States
Small Business Innovation Research (SBIR) grant to fund the work.

There are two problems with such an approach.  First, OpenAFS does
not legally exist.  Second, even when it does have a legal Foundation,
it would not be eligible for an SBIR grant due to its not-for-profit
status.  Since we did not have any other source of funding to perform
the work, it was suggested that one of the existing commercial support
companies submit a grant application.

In October 2007 I founded Your File System Inc. as a for-profit company 
that would be eligible to receive an SBIR grant and use the funding 
to accomplish two goals.  First, to benefit the OpenAFS community by
documenting the existing architectures and protocols used by OpenAFS
and by engaging in profiling and performance analysis that could be
used as input to the development of next generation implementations.
Second, because the company is receiving SBIR funding, to develop
a sustainable business model that could support the development of a
next generation distributed storage system.

The SBIR grant has provided funding for developer hours as well
as test equipment.  In particular, the SBIR grant has provided 
a 10Gbit/second network testbed which is being used for Rx profiling.

I am pleased to announce the first public benefit to the OpenAFS 
community as a result of the SBIR grant with the expectation 
that there will be much more to come in the future.  

There have been many efforts over the last five years to improve Rx.
Tom Keiser implemented per-thread free packet queues to reduce the
contention for the global lock protecting the free packet queue.
Other work has been performed to reduce the dependency on global
locks.  Rx hot threads have been implemented on a broader range of
platforms.  Various bug fixes have been accepted as they have been
validated.  Even with all of this work, Rx has continued to experience
noticeable performance problems.  In November 2006 there was
discussion regarding a 350ms hiccup that was experienced repeatedly
and was significantly hampering performance.  Several folks have
tried unsuccessfully to pin it down over the years.

Funded by the SBIR grant, there have been efforts over the last couple
of months to analyze Rx performance data from a number of sources.
Several symptoms were identified.  It was unclear whether they were
related to the hiccup, but they were worth investigating.  First, there
was a periodic out of memory error experienced in Windows test clients.
Second, there was a consistent lack of free packets.  Third, there was
a much larger number of retries than could be explained by packet loss
on the network.

What the investigations uncovered was a related set of problems,
some of which affect all implementations of Rx derived from the Transarc
implementation.  The problems fall into several categories:

   1. Resetting a Call object emptied packet queues without adding the
      packets to the free packet queue.  rxi_ResetCall() would call
      queue_Init() on queues with active rx_packets on them.  Once the
      queues were cleared, the packets were leaked and any acknowledgment
      of receipt or transmission of other outgoing data would be lost.
      Instead of initializing the queues, the contents of the queues should
      simply be freed, either by a call to rxi_FreePackets() or by setting
      the force flag on rxi_ClearTransmitQueue() and rxi_ClearReceiveQueue().
   2. Packets queued for transmission would not be sent.
      In rx.c there were two instances of RX_GLOBAL_RXLOCK_KERNEL which
      should have been AFS_GLOBAL_RXLOCK_KERNEL.  This oversight
      caused rx_calls that were actively transmitting packets to be
      reset prematurely, leaking the outgoing packets.
   3. Packets would be leaked while read operations were progressing.
      rxi_ReadProc()/rxi_ReadProc32() failed to remove the currentPacket
      and put it on the call's iov queue when all of its data was read. 
      This resulted in the packet being lost either when the next read
      packet was fetched, when the next packet was transmitted, or when
      the call was reset. 
   4. The algorithm in OpenAFS which is used to allocate additional
      packets when there are no free packets was overly aggressive.  It
      was based on the overall number of packets that had been
      previously allocated.  Each allocation would add a larger
      number of packets than the previous one.
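
The leak described in item 1 can be illustrated with a small model.
This is a hypothetical Python sketch, not the Rx code: PacketPool, Call,
reset_leaky and reset_fixed are names invented for illustration, and the
real implementation is C operating on rx_packet queues.  It only shows
why re-initializing a queue that still holds packets loses them, while
handing them back to the free queue does not.

```python
from collections import deque


class PacketPool:
    """Toy free-packet pool that tracks how many packets were ever allocated."""

    def __init__(self, initial):
        self.allocated = initial
        self.free = deque(range(initial))

    def get(self):
        # Allocate a new packet only when the free queue is empty.
        if not self.free:
            self.free.append(self.allocated)
            self.allocated += 1
        return self.free.popleft()

    def put(self, packets):
        # Return packets to the free queue so they can be recycled.
        self.free.extend(packets)


class Call:
    """Toy call object holding packets on a transmit queue."""

    def __init__(self, pool):
        self.pool = pool
        self.tq = deque()

    def send(self, count):
        for _ in range(count):
            self.tq.append(self.pool.get())

    def reset_leaky(self):
        # Analogous to re-initializing a queue head that still has entries:
        # the packets become unreachable and never return to the pool.
        self.tq = deque()

    def reset_fixed(self):
        # Hand the queued packets back to the pool, then empty the queue.
        self.pool.put(self.tq)
        self.tq = deque()


def run(reset_name, rounds=5, burst=4):
    """Send a burst and reset the call, `rounds` times; report pool counters."""
    pool = PacketPool(initial=8)
    call = Call(pool)
    for _ in range(rounds):
        call.send(burst)
        getattr(call, reset_name)()
    return pool.allocated, len(pool.free)


leaky = run("reset_leaky")
fixed = run("reset_fixed")
print("leaky reset: allocated=%d free=%d" % leaky)
print("fixed reset: allocated=%d free=%d" % fixed)
```

In this model the leaky reset forces new allocations once the initial
packets are gone, while the fixed reset keeps recycling the same ones.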

The side effects of these issues have been present in AFS for a very
long time and have been seen in both clients and servers.  Corrections
for these errors have been integrated into 1.5.53 and 1.4.8-pre1.

As a result of these problems, Rx was periodically not sending the
anticipated acknowledgment packet, which in turn resulted in a timeout
and retransmission.  The Rx stack was also frequently finding itself
out of free packets and was forced to block on a global lock while
additional packet structures were allocated from the process's
memory pool.  With these problems corrected, the end result is a
performance improvement of greater than 9.5% when comparing the Rx
performance of 1.4.8 with that of 1.4.7.
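
To make the over-aggressive growth from item 4 concrete, here is a
rough Python sketch.  The exact formula is not given above, so the
25% fraction, the fixed step of 32 and the starting size of 128 are all
assumptions chosen for illustration, and grow_proportional, grow_fixed
and simulate are hypothetical names.  It is not a description of what
the corrected code does; it only contrasts a refill sized from the
total allocated so far with a refill of constant size.

```python
def grow_proportional(allocated, fraction=0.25):
    """Assumed policy: each refill adds a fraction of everything allocated so far."""
    return max(1, int(allocated * fraction))


def grow_fixed(allocated, step=32):
    """Assumed alternative: each refill adds the same number of packets."""
    return step


def simulate(policy, start=128, refills=10):
    """Return the size of each refill and the final total after `refills` refills."""
    allocated = start
    sizes = []
    for _ in range(refills):
        extra = policy(allocated)
        sizes.append(extra)
        allocated += extra
    return sizes, allocated


prop_sizes, prop_total = simulate(grow_proportional)
fixed_sizes, fixed_total = simulate(grow_fixed)
print("proportional refills:", prop_sizes, "total:", prop_total)
print("fixed refills:       ", fixed_sizes, "total:", fixed_total)
```

When packets are also being leaked, every refill under the proportional
policy is larger than the last, so memory use climbs faster and faster.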

Rough tests show that the 1.4.8 Rx stack is capable of 124MBytes/second
over a 10Gbit link.  There is still a long way to go to fill a 10Gbit
pipe but it is a start.  Now we are only off by one order of magnitude.
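
The "one order of magnitude" remark can be checked with simple
arithmetic.  The sketch below assumes decimal units (1 Gbit = 1000 Mbit)
and ignores all protocol and framing overhead, so it is only a rough
comparison of the measured figure against the nominal link rate.

```python
LINK_GBIT_PER_S = 10        # nominal link rate from the text
MEASURED_MBYTE_PER_S = 124  # rough measurement from the text

# Assumes decimal units (1 Gbit = 1000 Mbit) and 8 bits per byte.
link_mbyte_per_s = LINK_GBIT_PER_S * 1000 / 8
utilization = MEASURED_MBYTE_PER_S / link_mbyte_per_s
shortfall = link_mbyte_per_s / MEASURED_MBYTE_PER_S

print("link capacity: %.0f MBytes/second" % link_mbyte_per_s)
print("utilization:   %.1f%%" % (utilization * 100))
print("shortfall:     %.1fx" % shortfall)
```

Under those assumptions the link could carry about 1250 MBytes/second,
so 124 MBytes/second is roughly a tenth of the nominal capacity.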

Some might ask, "how is it that these bugs remained present in the OpenAFS
source tree for all of these years?"   The answer is quite simple.  "No
one ever thought to look for packet leaks."   Many organizations still
perform weekly server restarts and never noticed the memory leaks.  The
rxdebug -rxstats output lists the free packet count, but no one ever
thought that it was important to report the number of allocated packets.
As a result, no one noticed that the reason there were free packets available
was that packets were constantly being allocated instead of recycled.
Over the years many individuals have noticed the extra resends.  It's just
that no one was able to identify why they were being sent.  The resends
did not prevent the system from functioning.  It was just slower than 
it should be.
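
A small sketch of why the free packet count alone was misleading.  The
sample numbers below are invented for illustration and are not
measurements from any server; the point is only that a stable free
count can coexist with an allocated count that never stops growing.

```python
# Hypothetical periodic samples of (free, allocated) packet counts.
samples = [(40, 200), (38, 260), (41, 330), (39, 410)]

free_counts = [free for free, _ in samples]
allocated_counts = [allocated for _, allocated in samples]

# The free count alone looks stable and healthy.
free_spread = max(free_counts) - min(free_counts)

# The allocated count keeps climbing, which points at packets being
# replaced by new allocations instead of being recycled.
allocated_growth = allocated_counts[-1] - allocated_counts[0]
steadily_growing = all(
    later > earlier
    for earlier, later in zip(allocated_counts, allocated_counts[1:])
)

print("free count spread:", free_spread)
print("allocated growth: ", allocated_growth)
print("steadily growing: ", steadily_growing)
```

Watching both counters side by side is what makes this kind of leak
visible between restarts.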

As these changes become available for both clients and servers, I am
expecting users to see a much improved throughput rate, and several
previously unexplained server and client crashes will now be a thing
of the past.

In order to get OpenAFS 1.4.8 released we need the assistance of the
community to test the pre-releases.  1.4.8-pre1 was announced yesterday.
The best way to move the release process along is for organizations 
that deploy OpenAFS to test the pre-releases and send e-mails to this
mailing list confirming what works.  Silence cannot be interpreted by
the gatekeepers as all is well.  

I look forward to your reports of success and to reporting further
grant-funded contributions in the future.

Jeffrey Altman