[OpenAFS-devel] AFS vs UNICODE

Jeffrey Altman jaltman@secure-endpoints.com
Wed, 07 May 2008 07:24:31 -0400


u+openafsdev-t07O@chalmers.se wrote:
> You still did not answer: what happens if one application accepts user input
> as several non-ASCII characters in Latin-1 and passes it on to open(),
> and another application does the same in a different encoding?
> How can the file system guess what the two users (or even the same user working
> with two different applications and locales) mean, and "fix" what the applications supply?

What happens is exactly what happens today.

UTF-8 is ISO 2022 compatible, and an ISO Latin-1 sequence, once mapped to 
Unicode, is already a normalized (NFC) string.
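To make the problem concrete, here is a minimal Python sketch (not part of the original mail) showing why a byte-oriented file system cannot reconcile the two applications above: the same user-perceived name produces different octet sequences under Latin-1 and UTF-8, so the server would see two distinct names.

```python
# The same user-visible name, encoded by two applications
# using different locales.
name = "café"

latin1_bytes = name.encode("latin-1")  # b'caf\xe9'  (one byte for é)
utf8_bytes = name.encode("utf-8")      # b'caf\xc3\xa9'  (two bytes for é)

# A file system that treats names as opaque octet sequences
# would store these as two unrelated directory entries.
assert latin1_bytes != utf8_bytes
```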

Interoperability between heterogeneous operating systems requires common 
interfaces.  AFS in particular requires commonality if we are ever going
to support internationalized cell names and volume names.

We will do so by adopting UNICODE as the character set and agreeing on a
standard UNICODE encoding and normalization form.
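Agreeing on a normalization form matters because Unicode allows the same visible character to be represented by different code point sequences. A small Python illustration (added for this edit, not from the original mail), using the standard `unicodedata` module:

```python
import unicodedata

# "é" has two valid Unicode representations:
composed = "\u00e9"      # single precomposed code point (NFC form)
decomposed = "e\u0301"   # 'e' followed by a combining acute accent (NFD form)

# They render identically but compare unequal as code point sequences.
assert composed != decomposed

# Normalizing both to NFC makes them compare equal, which is why
# a shared file system must pick one normalization form.
assert unicodedata.normalize("NFC", decomposed) == composed
```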

Operating systems that do not support a standard locale will continue to 
treat file names as octet sequences but will provide a degraded user 
experience.
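The "degraded user experience" is the familiar mojibake: when a client with a non-Unicode locale interprets UTF-8 octets under its own charset, the name is preserved byte-for-byte but displays wrongly. A hypothetical Python illustration:

```python
# A file named "café" is stored by a UTF-8 client as raw octets.
stored_octets = "café".encode("utf-8")  # b'caf\xc3\xa9'

# A client running a Latin-1 locale interprets those same octets
# as Latin-1 text: the bytes survive, but the display is garbled.
displayed = stored_octets.decode("latin-1")
assert displayed == "cafÃ©"
```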

Operating systems that use UNICODE as the platform character set, such 
as Windows and Mac OS X, will have a very good user experience.  The same
will be true for recent versions of Linux and Solaris.  Just about 
everyone is moving towards UTF-8-based locales, and modern file 
systems assume UNICODE for file names.

Assuming that a file name is just a sequence of bytes works well
on a single standalone machine.  It does not provide a reasonable
experience for end users of network-based protocols, whether that is
a file system protocol, FTP, SSH, or anything else.

Jeffrey Altman