[OpenAFS-devel] AFS vs UNICODE

Jeffrey Altman jaltman@secure-endpoints.com
Wed, 16 Jul 2008 15:38:53 -0400


With the release today of OpenAFS 1.5.50, the Microsoft Windows OpenAFS
client, like the Microsoft Windows operating system on which it runs, uses
the Unicode character set for all file system object names.  Given the
discussion that took place on this list in early May of this year, I want
to provide a follow-up describing how Unicode character set support has
been implemented without altering any of the AFS server processes.

The interface between the Windows SMB redirector and the AFS client
service is now purely Unicode (UCS-2).  Previously, Windows converted all
file system object names received through the SMB interface from Unicode
into the local OEM character set, and any name that could not be
translated resulted in an error.

The AFS client converts each UCS-2 name to UTF-8, without
normalization, and hands the resulting string to the file server.
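For illustration only, here is a minimal Python sketch of that outbound path. The helper name smb_name_to_server_name is hypothetical (it is not from the OpenAFS source), and it assumes the SMB layer delivers names as little-endian UCS-2. The point it shows is that no normalization happens, so a precomposed "é" and a decomposed "e" plus combining accent reach the file server as different UTF-8 byte strings. It also contrasts this with the old OEM conversion, using cp437 purely as an example OEM code page.

```python
def smb_name_to_server_name(ucs2_name: bytes) -> bytes:
    """Convert a little-endian UCS-2 name from the SMB layer to UTF-8 bytes.

    Hypothetical helper for illustration, not the OpenAFS implementation.
    The name is deliberately NOT normalized before encoding.
    """
    # UCS-2 is the BMP-only subset of UTF-16, so the UTF-16-LE codec decodes it.
    text = ucs2_name.decode("utf-16-le")
    return text.encode("utf-8")


# Precomposed U+00E9 and decomposed "e" + U+0301 stay distinct on the wire.
precomposed = "caf\u00e9".encode("utf-16-le")
decomposed = "cafe\u0301".encode("utf-16-le")
print(smb_name_to_server_name(precomposed))  # b'caf\xc3\xa9'
print(smb_name_to_server_name(decomposed))   # b'cafe\xcc\x81'

# The older path converted names to the OEM code page instead; cp437 is used
# here only as an example, and it cannot represent this Japanese name.
try:
    "\u65e5\u672c.txt".encode("cp437")
except UnicodeEncodeError:
    print("not representable in cp437")
```

Skipping normalization on the way out means the server stores exactly what the user typed, which is what keeps the server processes unchanged.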

When reading object names from the file server, the AFS client first
attempts to interpret each string as UTF-8.  If the string is not valid
UTF-8, it is interpreted using the OEM character set.  If it is not valid
in the OEM character set either, the name is encoded so that it can be
delivered to Windows.  The original file server string is retained
alongside the translated version, and the client maps between the two as
required.
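The three-step fallback above could look roughly like the following Python sketch. Several details are assumptions not stated in the post: cp932 stands in for the machine's OEM code page, and the %XX escape for bytes that fit neither encoding is a made-up scheme (the post does not say how such names are encoded). The names server_name_to_windows, DirEntryName and translate_entry are hypothetical.

```python
from dataclasses import dataclass

# Assumption: cp932 (Japanese) stands in for the local OEM code page.
OEM_CODEPAGE = "cp932"
# Characters Windows rejects in a name component, plus '%', which the
# hypothetical escape below uses as its marker.
_RESERVED = set('%\\/:*?"<>|')


def _escape_bytes(raw: bytes) -> str:
    """Hypothetical fallback: keep printable ASCII, %XX-escape everything else."""
    return "".join(
        chr(b) if 0x20 <= b < 0x7F and chr(b) not in _RESERVED else f"%{b:02X}"
        for b in raw
    )


def server_name_to_windows(raw: bytes) -> str:
    """Translate a name read from the file server into a Windows name."""
    try:
        return raw.decode("utf-8")        # 1. try UTF-8 first
    except UnicodeDecodeError:
        pass
    try:
        return raw.decode(OEM_CODEPAGE)   # 2. fall back to the OEM code page
    except UnicodeDecodeError:
        return _escape_bytes(raw)         # 3. encode so Windows can accept it


@dataclass
class DirEntryName:
    server_name: bytes   # original bytes from the file server, kept unchanged
    windows_name: str    # translated name handed to Windows


def translate_entry(raw: bytes) -> DirEntryName:
    return DirEntryName(raw, server_name_to_windows(raw))


# UTF-8 "café", Shift-JIS "あ" (invalid UTF-8), and bytes valid in neither.
entries = [translate_entry(n) for n in (b"caf\xc3\xa9", b"\x82\xa0", b"abc\x81")]
names = [e.windows_name for e in entries]   # ['café', 'あ', 'abc%81']
# Map a Windows name back to the original server bytes when needed.
by_windows_name = {e.windows_name: e.server_name for e in entries}
```

Keeping the raw server bytes next to the translated name is what lets a later request from Windows be sent back to the server using the exact original string. This naive escape can collide with a genuine name that already looks like "abc%81", which a real implementation would need to guard against.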

All directory searches and file name comparisons normalize both the
input from Windows and the strings from the file servers before comparing
them.  However, the strings delivered to the operating system are always
the original, non-normalized ones.
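A rough Python sketch of "normalize for comparison, return the original" is below. The post does not say which Unicode normalization form is used; NFC is assumed here only for illustration, and find_entry is a hypothetical name. Case-insensitive matching, which Windows also expects, is left out.

```python
import unicodedata
from typing import Iterable, Optional

# Assumption: NFC; the post does not name the normalization form.
NORMALIZATION_FORM = "NFC"


def _norm(name: str) -> str:
    return unicodedata.normalize(NORMALIZATION_FORM, name)


def find_entry(requested: str, directory: Iterable[str]) -> Optional[str]:
    """Return the directory's own (non-normalized) spelling of `requested`."""
    wanted = _norm(requested)
    for name in directory:
        if _norm(name) == wanted:   # compare normalized forms only
            return name             # hand back the original, unnormalized string
    return None


listing = ["readme.txt", "cafe\u0301.doc"]    # stored decomposed on the server
match = find_entry("caf\u00e9.doc", listing)   # Windows sends precomposed
print(match == "cafe\u0301.doc")               # True
```

Normalizing only at comparison time lets both spellings find the same file, while Windows still sees the name exactly as the file server stores it.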

Jeffrey Altman