[OpenAFS-devel] AFS vs UNICODE
Jeffrey Altman
jaltman@secure-endpoints.com
Wed, 16 Jul 2008 15:38:53 -0400
With the release today of OpenAFS 1.5.50, the Microsoft Windows OpenAFS
client, like the Microsoft Windows operating system on which it runs, now
uses the Unicode character set for all file system object names. Given the
discussion that took place on this list in early May of this year, I want
to follow up with a description of how Unicode support has been
implemented without altering any of the AFS server processes.
The interface between the Windows SMB redirector and the AFS client
service is now purely Unicode (UCS-2). Previously, Windows converted all
file system object names received through the SMB interface from Unicode
into the local OEM character set, and any name that could not be
translated resulted in an error.
The AFS client converts each UCS-2 name to UTF-8 without normalization
and hands the resulting string to the file server.
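A minimal Python sketch of that outbound step, assuming the SMB layer
delivers the name as little-endian UCS-2 bytes. The function name
smb_name_to_afs is hypothetical and not part of OpenAFS. Because no
normalization is applied, a name that Windows supplies in decomposed form
reaches the file server in decomposed form.

```python
def smb_name_to_afs(ucs2_name: bytes) -> bytes:
    """Re-encode a little-endian UCS-2 name as UTF-8 for the file server.

    Hypothetical helper. No Unicode normalization is applied: the code
    points received from Windows are passed through unchanged.
    """
    # Python's UTF-16 codec is a superset of UCS-2: it also combines
    # surrogate pairs, and raises UnicodeDecodeError on unpaired ones.
    text = ucs2_name.decode("utf-16-le")
    return text.encode("utf-8")


# Precomposed U+00E9 and decomposed "e" + U+0301 stay distinct on the wire.
assert smb_name_to_afs("caf\u00e9".encode("utf-16-le")) == b"caf\xc3\xa9"
assert smb_name_to_afs("cafe\u0301".encode("utf-16-le")) == b"cafe\xcc\x81"
```

Leaving the name unnormalized means the bytes stored on the server are
exactly what the Windows application asked for.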
When reading object names from the file server, the AFS client first
attempts to interpret each string as UTF-8. If the string is not valid
UTF-8, it is interpreted in the OEM character set. If it is not valid in
the OEM character set either, the name is encoded so that it can still be
delivered to Windows. The original file server string is retained
alongside the translated version, and the client maps between the two as
required.
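The three-step fallback can be sketched in Python as follows. This is an
illustration under stated assumptions, not the OpenAFS code: the names
DirName, escape_bytes, and to_windows_name are hypothetical, cp437 is only
an example OEM code page (the real one depends on the Windows locale), and
the %XX escape is an invented placeholder because the actual encoding
scheme is not described here. The record keeps the server's original
bytes next to the translated name, mirroring the mapping described above.

```python
from dataclasses import dataclass

# Assumption: the OEM code page depends on the Windows locale; cp437 is
# only an example default.
DEFAULT_OEM_CODEPAGE = "cp437"


@dataclass(frozen=True)
class DirName:
    """Hypothetical record pairing the server's bytes with the Windows name."""
    raw: bytes    # original string exactly as stored on the file server
    display: str  # translated name delivered to Windows


def escape_bytes(raw: bytes) -> str:
    """Hypothetical fallback: keep printable ASCII, write other bytes as %XX.

    Illustrative only; it is not the OpenAFS encoding and is not
    collision-free, which is why the raw bytes are kept for mapping back.
    """
    return "".join(
        chr(b) if 0x20 <= b < 0x7F else f"%{b:02X}" for b in raw
    )


def to_windows_name(raw: bytes, oem_codepage: str = DEFAULT_OEM_CODEPAGE) -> DirName:
    # 1. Prefer UTF-8.
    try:
        return DirName(raw, raw.decode("utf-8"))
    except UnicodeDecodeError:
        pass
    # 2. Fall back to the OEM character set.
    try:
        return DirName(raw, raw.decode(oem_codepage))
    except UnicodeDecodeError:
        pass
    # 3. Neither decoding worked: encode the bytes so Windows can show them.
    return DirName(raw, escape_bytes(raw))
```

Note that Python's single-byte cp437 codec defines all 256 byte values, so
with that code page step 3 is never reached; a multibyte code page such as
cp932 can reject input (for example a trailing lead byte 0x81), which is
when the escape path applies.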
All directory searches and file name comparisons normalize both the input
from Windows and the strings from the file servers before comparing them.
However, the strings delivered to the operating system are always the
non-normalized ones.
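A short Python sketch of normalize-then-compare lookup. The normalization
form is not stated above, so NFC is an assumption here, and the function
names are hypothetical. The lookup matches on normalized forms but returns
the server's original, non-normalized name, which is what gets handed back
to Windows.

```python
import unicodedata
from typing import List, Optional

# Assumption: the normalization form is not specified; NFC is used here.
NORMALIZATION_FORM = "NFC"


def names_equal(windows_name: str, server_name: str) -> bool:
    """Compare two names after normalizing both sides."""
    return (unicodedata.normalize(NORMALIZATION_FORM, windows_name)
            == unicodedata.normalize(NORMALIZATION_FORM, server_name))


def find_entry(directory: List[str], windows_name: str) -> Optional[str]:
    """Return the server's original (non-normalized) name, or None."""
    target = unicodedata.normalize(NORMALIZATION_FORM, windows_name)
    for name in directory:
        if unicodedata.normalize(NORMALIZATION_FORM, name) == target:
            return name  # deliver the un-normalized server string
    return None


# A precomposed request matches a decomposed name stored on the server.
assert find_entry(["cafe\u0301", "other"], "caf\u00e9") == "cafe\u0301"
```

Normalizing only at comparison time lets differently composed spellings of
the same name match without rewriting anything stored on the server.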
Jeffrey Altman