[OpenAFS-devel] AFS vs UNICODE

Jeffrey Altman jaltman@secure-endpoints.com
Wed, 07 May 2008 05:22:07 -0400


Roland Kuhn wrote:

> How do you know you're dealing with Unicode in the first place? Imagine 
> a latin1 file name which incidentally does not violate the UTF-8 rules, 
> but happens to be not normalized. Normalizing it will simply destroy it.

This is incorrect.  NFC does not alter ISO-Latin-1 sequences.  Any names
constructed using ISO 2022 character sets will not contain C1 controls
that are used by UTF-8 to indicate multibyte sequences.
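
A minimal sketch of how one might test specific names against this claim,
assuming Python and its standard unicodedata module. The helper name
nfc_effect and the sample names are hypothetical illustrations, not anything
from OpenAFS. It checks only the samples shown and is not a proof of the
general statement.

```python
import unicodedata

def nfc_effect(raw: bytes):
    """Return None if raw is not valid UTF-8; otherwise return True when
    NFC normalization would change the decoded name, False when it would not."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None  # not UTF-8, so a UTF-8-aware normalizer would leave it alone
    return unicodedata.normalize("NFC", text) != text

# Hypothetical sample names (assumptions for illustration only).
latin1_name = "caf\u00e9".encode("latin-1")       # bytes 63 61 66 E9
utf8_composed = "caf\u00e9".encode("utf-8")       # already in NFC
utf8_decomposed = "cafe\u0301".encode("utf-8")    # e + combining acute accent

for sample in (latin1_name, utf8_composed, utf8_decomposed, b"readme.txt"):
    print(sample, nfc_effect(sample))
```

In this sketch the Latin-1 sample fails UTF-8 decoding and is therefore never
normalized, while only the decomposed UTF-8 sample is changed by NFC.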

Where there would be a problem is with Microsoft ANSI code pages, which
put printable characters in the C1 range. On Windows, however, we are
going to have much broader problems due to the conversion from code pages
to UNICODE, which will make all previously written files inaccessible.
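
To illustrate the code page point, here is a small sketch, assuming Python
and its built-in cp1252 and latin-1 codecs. The byte string is a hypothetical
file name, and the final comparison shows only one plausible way a lookup
could fail after conversion. It is not a description of how the OpenAFS
Windows client actually behaves.

```python
import unicodedata

# Hypothetical name bytes as written under Windows code page 1252:
# 0x93 and 0x94 are curly quotes there, and both fall in the C1 range.
raw = bytes([0x93, 0x41, 0x94])

def c1_bytes(data: bytes):
    """Return the bytes of data that lie in the C1 range 0x80-0x9F."""
    return [b for b in data if 0x80 <= b <= 0x9F]

as_cp1252 = raw.decode("cp1252")    # printable: left quote, 'A', right quote
as_latin1 = raw.decode("latin-1")   # same bytes read as C1 control characters

print(c1_bytes(raw))
print([unicodedata.category(ch) for ch in as_cp1252])
print([unicodedata.category(ch) for ch in as_latin1])

# After conversion to UNICODE and storage as UTF-8, the byte sequence differs
# from what was originally written, so a byte-wise match against the old name
# would fail (an assumption about the failure mode, for illustration).
converted = as_cp1252.encode("utf-8")
print(converted != raw)
```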