[OpenAFS-devel] AFS vs UNICODE

Jeffrey Altman jaltman@secure-endpoints.com
Tue, 06 May 2008 10:04:17 -0400


This is a cryptographically signed message in MIME format.

--------------ms020303030303000008060702
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: quoted-printable

While implementing the UNICODE support for the Windows client it has=20
become clear that the Unicode Normalization issues have already bitten=20
us. =20

Erik Dal=E9n pointed out to me earlier today that there are=20
interoperability issues between MacOS X and Linux clients.  MacOS X=20
clients produce file names using Unicode Normalization Form D whereas=20
Linux clients are either using Unicode Normalization Form C or nothing=20
at all.   The end result is that when a MacOS X client attempts to open=20
a file created by a Linux client, MacOS X will create the resource fork=20
file using NFD and then fail to open the file because the NFD encoded=20
name does not exist in the AFS directory hash table.

This is going to be fairly easy for the Windows client to adjust to=20
because it already does not rely upon the AFS directory hash table for=20
file name lookups.  The Windows client cannot rely on the hash table=20
because it is case sensitive and Windows lookups are case insensitive=20
with a preference for exact matches.=20

On Windows, the client will perform a NFC conversion on all names in the =

directory that are valid UTF-8 as part of its B+ tree construction=20
process.  It will also perform a NFC conversion on all symlink=20
targets.   Finally, all strings provided by the operating system will be =

NFC converted before performing the directory lookup or creating a symlin=
k.

The question is what to do about the clients on other platforms that do=20
rely on the AFS directory hash table and on the file servers.

The reason that we want to use NFC is because the resulting strings are=20
shorter and therefore the effective length of file names can be longer.

Whatever we do we are going to have an interop problem on MacOS X but=20
since upgrading MacOS X clients is so much easier to do than other=20
platforms I will suggest that we bite the bullet there.

Proposal:

   1. MacOS X and Linux clients begin to apply NFC to all UTF-8 strings
      obtained from the operating system whether for directory lookup,
      object creation, or symlink target creation.
   2. Implement NFC conversion in the Salvager.  This will apply to all
      names in directories and will require that directory hash tables
      be fixed when a name is changed to NFC.  It will also have to
      apply to symlink targets.
   3. In the File Server, apply NFC conversion to the names provided in
      CreateFile, Link and Symlink RPCs as well as the targets in the
      Link and Symlink RPCs.

The real problem with this problem is that once the new file server is=20
deployed and the salvager is run against the volumes the existing MacOS=20
X clients will fail to be able to read any files in AFS.   If anyone has =

an idea of how to address the Unicode normalization problem going=20
forward that doesn't result in an interop failure for existing clients,=20
please say something.

Jeffrey Altman






--------------ms020303030303000008060702
Content-Type: application/x-pkcs7-signature; name="smime.p7s"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="smime.p7s"
Content-Description: S/MIME Cryptographic Signature

MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJeTCC
AxcwggKAoAMCAQICEALr5BE3U6n+HWCoLbyhohMwDQYJKoZIhvcNAQEFBQAwYjELMAkGA1UE
BhMCWkExJTAjBgNVBAoTHFRoYXd0ZSBDb25zdWx0aW5nIChQdHkpIEx0ZC4xLDAqBgNVBAMT
I1RoYXd0ZSBQZXJzb25hbCBGcmVlbWFpbCBJc3N1aW5nIENBMB4XDTA3MDUzMTA2MTM1N1oX
DTA4MDUzMDA2MTM1N1owczEPMA0GA1UEBBMGQWx0bWFuMRUwEwYDVQQqEwxKZWZmcmV5IEVy
aWMxHDAaBgNVBAMTE0plZmZyZXkgRXJpYyBBbHRtYW4xKzApBgkqhkiG9w0BCQEWHGphbHRt
YW5Ac2VjdXJlLWVuZHBvaW50cy5jb20wggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIB
AQCsoz/0+s4Cn65n/3bU3shXw4y5u1uEMEsBOiqNU0PfIKGYQe95b1FKNbNAkctSdQT6GF5c
bhSnJPmb2OOb1frx64dlDgskaG561xa8XPA1aP8Cc+33dgsSLIxGEh97lyUYHEfWBC03KMCF
PKhZfcrGAXoVCrFBadnLAokQbUTFahVg/qQx2IT3wSj1sCIfV5UDuXcEKHCvRtEZIsSzu184
9Cj6I4nY5bt+r94kyDHM94MHYBJi+6tWLFRy2gkIB3HEPmxAiQrKljNpH9bOffiBLIAgmJ6d
1ZXepBXyexQbwOYvftpVlMEFHHQmdiwH3tj69hE78XvM5X9J+SbjbuNpAgMBAAGjOTA3MCcG
A1UdEQQgMB6BHGphbHRtYW5Ac2VjdXJlLWVuZHBvaW50cy5jb20wDAYDVR0TAQH/BAIwADAN
BgkqhkiG9w0BAQUFAAOBgQB8FShDN2Ig034Y5eyadiFDEtOvsIJ3Z2xV9aTL4u8xMlz1gZR1
AZAvCv+ZMMRRKWCsrG5tItV8DFPSfWAGMpInmMarA4f76JRLQEUhkRUg8GpkJM5ryk5EDakk
0oiBQcQD8A+UHwrcmaj3UWxQ9zCjDgU+1mY9nEQxZZyp4eeUfzCCAxcwggKAoAMCAQICEALr
5BE3U6n+HWCoLbyhohMwDQYJKoZIhvcNAQEFBQAwYjELMAkGA1UEBhMCWkExJTAjBgNVBAoT
HFRoYXd0ZSBDb25zdWx0aW5nIChQdHkpIEx0ZC4xLDAqBgNVBAMTI1RoYXd0ZSBQZXJzb25h
bCBGcmVlbWFpbCBJc3N1aW5nIENBMB4XDTA3MDUzMTA2MTM1N1oXDTA4MDUzMDA2MTM1N1ow
czEPMA0GA1UEBBMGQWx0bWFuMRUwEwYDVQQqEwxKZWZmcmV5IEVyaWMxHDAaBgNVBAMTE0pl
ZmZyZXkgRXJpYyBBbHRtYW4xKzApBgkqhkiG9w0BCQEWHGphbHRtYW5Ac2VjdXJlLWVuZHBv
aW50cy5jb20wggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQCsoz/0+s4Cn65n/3bU
3shXw4y5u1uEMEsBOiqNU0PfIKGYQe95b1FKNbNAkctSdQT6GF5cbhSnJPmb2OOb1frx64dl
DgskaG561xa8XPA1aP8Cc+33dgsSLIxGEh97lyUYHEfWBC03KMCFPKhZfcrGAXoVCrFBadnL
AokQbUTFahVg/qQx2IT3wSj1sCIfV5UDuXcEKHCvRtEZIsSzu1849Cj6I4nY5bt+r94kyDHM
94MHYBJi+6tWLFRy2gkIB3HEPmxAiQrKljNpH9bOffiBLIAgmJ6d1ZXepBXyexQbwOYvftpV
lMEFHHQmdiwH3tj69hE78XvM5X9J+SbjbuNpAgMBAAGjOTA3MCcGA1UdEQQgMB6BHGphbHRt
YW5Ac2VjdXJlLWVuZHBvaW50cy5jb20wDAYDVR0TAQH/BAIwADANBgkqhkiG9w0BAQUFAAOB
gQB8FShDN2Ig034Y5eyadiFDEtOvsIJ3Z2xV9aTL4u8xMlz1gZR1AZAvCv+ZMMRRKWCsrG5t
ItV8DFPSfWAGMpInmMarA4f76JRLQEUhkRUg8GpkJM5ryk5EDakk0oiBQcQD8A+UHwrcmaj3
UWxQ9zCjDgU+1mY9nEQxZZyp4eeUfzCCAz8wggKooAMCAQICAQ0wDQYJKoZIhvcNAQEFBQAw
gdExCzAJBgNVBAYTAlpBMRUwEwYDVQQIEwxXZXN0ZXJuIENhcGUxEjAQBgNVBAcTCUNhcGUg
VG93bjEaMBgGA1UEChMRVGhhd3RlIENvbnN1bHRpbmcxKDAmBgNVBAsTH0NlcnRpZmljYXRp
b24gU2VydmljZXMgRGl2aXNpb24xJDAiBgNVBAMTG1RoYXd0ZSBQZXJzb25hbCBGcmVlbWFp
bCBDQTErMCkGCSqGSIb3DQEJARYccGVyc29uYWwtZnJlZW1haWxAdGhhd3RlLmNvbTAeFw0w
MzA3MTcwMDAwMDBaFw0xMzA3MTYyMzU5NTlaMGIxCzAJBgNVBAYTAlpBMSUwIwYDVQQKExxU
aGF3dGUgQ29uc3VsdGluZyAoUHR5KSBMdGQuMSwwKgYDVQQDEyNUaGF3dGUgUGVyc29uYWwg
RnJlZW1haWwgSXNzdWluZyBDQTCBnzANBgkqhkiG9w0BAQEFAAOBjQAwgYkCgYEAxKY8VXNV
+065yplaHmjAdQRwnd/p/6Me7L3N9VvyGna9fww6YfK/Uc4B1OVQCjDXAmNaLIkVcI7dyfAr
hVqqP3FWy688Cwfn8R+RNiQqE88r1fOCdz0Dviv+uxg+B79AgAJk16emu59l0cUqVIUPSAR/
p7bRPGEEQB5kGXJgt/sCAwEAAaOBlDCBkTASBgNVHRMBAf8ECDAGAQH/AgEAMEMGA1UdHwQ8
MDowOKA2oDSGMmh0dHA6Ly9jcmwudGhhd3RlLmNvbS9UaGF3dGVQZXJzb25hbEZyZWVtYWls
Q0EuY3JsMAsGA1UdDwQEAwIBBjApBgNVHREEIjAgpB4wHDEaMBgGA1UEAxMRUHJpdmF0ZUxh
YmVsMi0xMzgwDQYJKoZIhvcNAQEFBQADgYEASIzRUIPqCy7MDaNmrGcPf6+svsIXoUOWlJ1/
TCG4+DYfqi2fNi/A9BxQIJNwPP2t4WFiw9k6GX6EsZkbAMUaC4J0niVQlGLH2ydxVyWN3amc
OY6MIE9lX5Xa9/eH1sYITq726jTlEBpbNU1341YheILcIRk13iSx0x1G/11fZU8xggNkMIID
YAIBATB2MGIxCzAJBgNVBAYTAlpBMSUwIwYDVQQKExxUaGF3dGUgQ29uc3VsdGluZyAoUHR5
KSBMdGQuMSwwKgYDVQQDEyNUaGF3dGUgUGVyc29uYWwgRnJlZW1haWwgSXNzdWluZyBDQQIQ
AuvkETdTqf4dYKgtvKGiEzAJBgUrDgMCGgUAoIIBwzAYBgkqhkiG9w0BCQMxCwYJKoZIhvcN
AQcBMBwGCSqGSIb3DQEJBTEPFw0wODA1MDYxNDA0MTdaMCMGCSqGSIb3DQEJBDEWBBRA/yZL
mU3g84Zmq+msM6ia2fRaFzBSBgkqhkiG9w0BCQ8xRTBDMAoGCCqGSIb3DQMHMA4GCCqGSIb3
DQMCAgIAgDANBggqhkiG9w0DAgIBQDAHBgUrDgMCBzANBggqhkiG9w0DAgIBKDCBhQYJKwYB
BAGCNxAEMXgwdjBiMQswCQYDVQQGEwJaQTElMCMGA1UEChMcVGhhd3RlIENvbnN1bHRpbmcg
KFB0eSkgTHRkLjEsMCoGA1UEAxMjVGhhd3RlIFBlcnNvbmFsIEZyZWVtYWlsIElzc3Vpbmcg
Q0ECEALr5BE3U6n+HWCoLbyhohMwgYcGCyqGSIb3DQEJEAILMXigdjBiMQswCQYDVQQGEwJa
QTElMCMGA1UEChMcVGhhd3RlIENvbnN1bHRpbmcgKFB0eSkgTHRkLjEsMCoGA1UEAxMjVGhh
d3RlIFBlcnNvbmFsIEZyZWVtYWlsIElzc3VpbmcgQ0ECEALr5BE3U6n+HWCoLbyhohMwDQYJ
KoZIhvcNAQEBBQAEggEAmPb7O0nWiMLGWPjzrUCxnydyJDEXYO8ASW/xwh1gDL4W+1vxSil1
7rGxtlqT1VYEsqPe+ygo9k8CCPajmsaPC5ED+Ml2nDQSlNUrsmn28WhL6l+m1yztVNf7IcEE
CqMz8m033T89TsqtlI3/tpDf9ZTHYbzSrttAHsZ1sqUk86PmeQ4ee283B4GRFQB/oImRAseF
XR1iUst094d2+bwNOLpkfM3Hy/pgvOC46TCgIn8u3BY3bO0BiemkC/UvrJuuuteiMnfzW/5E
tNRtiaKJaPaahPkDu++Mpe56sv7HdxOTn0tzboFdUpEDULZtX5NvQIGMkVdp5qoBusdcmAjU
2gAAAAAAAA==
--------------ms020303030303000008060702--