[OpenAFS] Puzzler: lack of access to AFS files

Jeffrey Altman jaltman@secure-endpoints.com
Wed, 12 Dec 2007 23:06:43 -0500


Rodney M. Dyer wrote:
> I understand this, however you need to realize where I'm coming from. 
> We support professors who have research projects that run into the
> millions of dollars.  Many times these people don't know anything about
> where their data files are being saved when they choose "File->Save"
> from an application.  They expect it to work.  We need to be in a
> position to provide the "works" part.  If they save a valuable data file
> from an application one day, then return the next day and the
> application won't load it because some random network change updated
> a few bytes here or there when the file was saved, what do we tell
> them?  "Oh btw,
> maybe you should keep a local copy on your USB keychain unless the AFS
> network fails?"  Most professors don't spend the extra time to run
> checksums on their files after the save.  This kind of thing doesn't cut
> it.  I'm the type of "professional" sysadmin who's willing to give up 10
> percent of my speed for guaranteed delivery.  I'm not some young post
> high school geek who's got a job running a smallish home network and
> constantly boasts product x is faster than product y, and that's just
> uber cool because product y sux'ors!

The data corruption error that was discovered in January and reported
by David Bolt to OpenAFS RT was fixed in the 15 February 2007 release.
At the time of the announcement I stressed the importance of upgrading
because of the seriousness of the error.

For those who are unfamiliar: if, during a background write operation
to the file server, the network dropped out for any reason, the daemon
thread would drop all of the in-progress dirty buffers on the floor
and mark them as clean.  The end result would be a hole in the file on
the file server, containing either the previous data or a page full of
zeros.  This error was present in the original OpenAFS 1.0 release.
IBM fixed the problem in the 3.6.2.59 release of IBM AFS for Windows.
OpenAFS fixed it in 1.5.15.
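
For those who want the intuition in code, here is a minimal sketch of
the corrected buffer handling.  The names (buf_t, store_range,
background_store) are hypothetical, not the actual OpenAFS cache
manager code; the point is only that a failed store must leave the
buffer dirty for a retry instead of marking it clean and discarding
the data.

    #include <stddef.h>

    typedef struct buf {
        struct buf *next;
        int         dirty;      /* 1 = holds data not yet on the server */
        long        offset;     /* file offset covered by this buffer */
    } buf_t;

    /* hypothetical wrapper around the store RPC; returns 0 on success */
    extern int store_range(buf_t *bp);

    void background_store(buf_t *chain)
    {
        for (buf_t *bp = chain; bp != NULL; bp = bp->next) {
            if (!bp->dirty)
                continue;
            if (store_range(bp) == 0) {
                bp->dirty = 0;  /* safe: the data is on the file server */
            } else {
                /* The network dropped out.  Leave the buffer dirty so
                 * a later daemon pass retries the store; marking it
                 * clean here is exactly the bug that left holes in
                 * the file on the server. */
            }
        }
    }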

As for the performance improvements, I'm not on a performance kick for
the hell of it.  I'm on a performance kick because large OpenAFS users
have repeatedly mentioned the performance of the Windows client as one
reason why they are moving away from AFS to CIFS.  In addition, the
file servers are experiencing serious scalability issues, and a large
part of the problem is that the Windows clients have not been as smart
as they could be: they have re-requested data from the file servers
that should have been served from the cache.

Stupid things like re-using recently accessed objects because the
queues did not track objects in order of most recent use; being forced
to re-read data or directory entries from the file server that the
client itself had just written, because data buffer version numbers
weren't incremented when merging the updated status data returned by
the write, or because directory entries weren't updated locally when
possible; and re-issuing FetchStatus calls on .readonly volumes
prematurely because volume callback expirations were not tracked by
each object in the volume.  Some of the changes improved the client's
measured throughput.  Others reduced the CPU time required by the
client.  Most of all, the improvements have reduced network traffic
and load on the file servers.
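
To make the first of those concrete, here is a minimal sketch of a
most-recently-used queue discipline, using hypothetical names (lru_t,
obj_t, lru_touch) rather than the actual OpenAFS identifiers: every
access moves the object to the head of the queue, so re-use for
eviction always takes the least recently used entry from the tail.

    #include <stddef.h>

    typedef struct obj {
        struct obj *prev, *next;
    } obj_t;

    typedef struct {
        obj_t *head;   /* most recently used */
        obj_t *tail;   /* least recently used: the eviction victim */
    } lru_t;

    /* Call on every access so the queue tracks recency of use. */
    void lru_touch(lru_t *q, obj_t *o)
    {
        if (q->head == o)
            return;                          /* already most recent */
        /* unlink from the current position */
        if (o->prev) o->prev->next = o->next;
        if (o->next) o->next->prev = o->prev;
        if (q->tail == o) q->tail = o->prev;
        /* relink at the head */
        o->prev = NULL;
        o->next = q->head;
        if (q->head) q->head->prev = o;
        q->head = o;
        if (q->tail == NULL) q->tail = o;
    }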

Some of the changes have unfortunately triggered bugs in the file
servers that in turn have had to be fixed.  That is the case with the
GiveUpAllCallBacks RPC bug that exists in all file servers from 1.3.50
to 1.4.5.  The attempt to be a good citizen by giving up callbacks
when we know that the server will be unable to contact us because we
are suspended or shut down resulted in corruption of the file server's
state data and the possibility of eventual file server crashes.
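
For readers unfamiliar with the RPC, the client-side path looks
roughly like the sketch below.  The GiveUpAllCallBacks name comes from
the discussion above; the stub signature and the surrounding
structures are my assumptions for illustration, not OpenAFS source.

    struct rx_connection;                   /* Rx connection handle */

    /* assumed rxgen-style stub for the AFS3 RPC named above */
    extern int RXAFS_GiveUpAllCallBacks(struct rx_connection *conn);

    struct server {
        struct server        *next;
        struct rx_connection *conn;
    };

    /* On suspend or shutdown, tell every file server to discard the
     * callback promises it holds for us, since it could not deliver
     * them anyway.  Per the text above, servers from 1.3.50 through
     * 1.4.5 mishandle this RPC and corrupt their callback state. */
    void give_up_callbacks_on_suspend(struct server *servers)
    {
        for (struct server *s = servers; s != NULL; s = s->next)
            (void) RXAFS_GiveUpAllCallBacks(s->conn);
    }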

I am very thankful for the efforts you put into helping track down the
thread safety issues in 1.5.26 as well as the issues with the infinite
loop detection code that was added to 1.5.21 which resulted in client
crashes.  As you are well aware, the thread safety issues were
particularly challenging to reproduce and identify.  It is both
fortunate and unfortunate that your use case was the perfect one to
trigger the race condition.  Thanks to your efforts, the race
condition was finally fixed in 1.5.27.

1.5.28 in turn fixes additional crash conditions reported through the
Windows Error Reporting service.  Nothing significant.  The crash
conditions are so rare that I doubt anyone who experienced them could
reproduce them.

As I said over the summer, I was truly embarrassed by the quality
issues in the releases from 1.5.21 to 1.5.25.  I do my best to test
things given the tools at my disposal.  Unfortunately, I do not have a
test environment that can replicate all of the possible multi-client
interactions.

> I am happy with the speed improvements, and I hope we can continue to
> use AFS.  However I need to be able to look at people with a straight
> face when they ask about how well AFS works.
> 
>      Speed?  Check
>      Scale?  Check
>      Functionality?  Check
>      Reliability?  hrm...

You see, I would actually give us less credit than that:

Speed?  Not so much, but you can get decent performance for specific
classes of use cases.

Scale?  Well, we have global access, but what Transarc advertised in
the mid-90s as infinite scalability has not lived up to the claims.
The file servers are capable of handling approximately 100
simultaneous requests, and when those requests require network traffic
to query the client's identity, obtain protection data, or communicate
with the volume database server, the threads sit idle, blocked on I/O.
The actual throughput of a given file server is far below what it
needs to be if we are truly going to serve petabytes of data to tens
of thousands of clients from each file server.
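
A back-of-the-envelope calculation shows why blocked threads cap
throughput.  The 100-thread figure is from above; the 50 ms
per-request latency is purely an assumption for illustration.  With a
fixed pool whose threads sit blocked for the duration of each request,
Little's law bounds throughput at threads / latency:

    #include <stdio.h>

    int main(void)
    {
        const double threads   = 100.0;  /* simultaneous requests (see above) */
        const double latency_s = 0.050;  /* assumed 50 ms blocked per request */

        /* Little's law: concurrency = throughput * latency, so the
         * pool can sustain at most threads / latency requests/sec. */
        double max_rps = threads / latency_s;

        printf("max throughput: %.0f requests/sec\n", max_rps);  /* 2000 */
        return 0;
    }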

Functionality?  Hmm.  Much of the complexity that was added this
summer for directory searching was necessary because of the lack of
functionality in the AFS3 protocols.  The locking issues that everyone
runs into are also due to a lack of functionality.  Shall we discuss
Unicode object names in profile directories and the data corruption
they produce?  What about the inability to maintain data connectivity
due to CIFS client timeouts?  Do you like having Office apps crash on
you?  I sure don't.

Reliability? Given everything else I actually mark reliability on the
high side.  At least when there is an issue, we get a fix out ASAP.

The funny thing is that even with all of the negatives I have mentioned
I actually think that OpenAFS is the best it has ever been.  I am
finally at the point where I am willing to say to people that I think
you should consider OpenAFS for new deployments.  Do we still have
issues?  Absolutely.  But we also have plans and we have a growing
number of skilled developers who are actively contributing to make
OpenAFS better across a broad range of platforms.

If you need a file system that is going to provide good WAN performance
with federated authentication and high availability, you really can't
find anything else out there.

Jeffrey Altman

