[OpenAFS-devel] AFS vs UNICODE

u+openafsdev-t07O@chalmers.se u+openafsdev-t07O@chalmers.se
Fri, 9 May 2008 19:18:06 +0200


Hello Garrett,

On Tue, May 06, 2008 at 02:31:03PM -0400, Garrett Wollman wrote:
> <<On Tue, 06 May 2008 14:22:23 -0400, Jeffrey Altman <jaltman@secure-endpoints.com> said:
> 
> > There are algorithms you can use to validate utf-8 sequences.
> 
> The fact that an octet-sequence *can* be interpreted as UTF-8 does not
> prove that it *is* UTF-8, or indeed any other representation of
> Unicode characters.  The Windows (and presumably Mac OS X) environment

Exactly.
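(To illustrate the point with a small sketch - this is my own example, not anything from the AFS code: a byte sequence that passes every UTF-8 validity check may just as well be legacy ISO-8859-1 data. Validation tells you the octets *could* be UTF-8, nothing more.)

```python
# Illustration only: the same two octets are valid UTF-8 *and* valid Latin-1,
# with different meanings. Passing a UTF-8 validator proves nothing about intent.

def is_valid_utf8(octets: bytes) -> bool:
    """Check whether the octets form a well-formed UTF-8 sequence."""
    try:
        octets.decode("utf-8")  # strict decoder rejects malformed sequences
        return True
    except UnicodeDecodeError:
        return False

data = b"\xc3\xa9"                    # 'e acute' if UTF-8, 'A tilde + copyright' if Latin-1
print(is_valid_utf8(data))            # True: it *can* be UTF-8
print(data.decode("utf-8"))           # 'é'
print(data.decode("iso-8859-1"))      # 'Ã©' - same octets, different interpretation
```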

> may guarantee you (some representation of) Unicode characters, but no
> such guarantee obtains in other client operating systems.  It may well
> be reasonable to say "from this point forward, all filenames in AFS
> shall be interpreted as normalized strings of Unicode characters in
> UTF-8 representation", but that would be a subtle semantic change and

Right.

I just cannot agree that the change would be "subtle". It would break
a lot of existing data - for the purpose of following a design
which is itself broken.
File system drivers cannot fully implement the stated change
unless each process has its own encoding-aware view of the file system
(which is not the case even for MacOSX, if I am not totally mistaken).
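(A small sketch of why "normalized" is the hard part - again my own example, not from any AFS source: the same visible filename has two distinct, equally valid UTF-8 encodings, and MacOSX famously stores the decomposed form while most other systems pass through whatever bytes the application supplied.)

```python
import unicodedata

# The visible string "cafe" with an acute accent on the final e exists in
# two canonical forms, both of which are perfectly valid UTF-8.
nfc = unicodedata.normalize("NFC", "cafe\u0301")  # precomposed: 'é' is one code point
nfd = unicodedata.normalize("NFD", "caf\u00e9")   # decomposed: 'e' + combining acute

print(nfc == nfd)               # False: different code-point sequences
print(nfc.encode("utf-8"))      # b'caf\xc3\xa9'
print(nfd.encode("utf-8"))      # b'cafe\xcc\x81' - different octets on the wire
```

So a server-side rule "all filenames are normalized UTF-8" must pick one form and silently rewrite the other's octets - exactly the kind of data-altering change discussed above.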

(Shall we then expect all applications on all platforms to detect
which file system they are accessing, or be rewritten to always ensure
normalized UTF-8 encoding of the strings passed to open()?
If the applications were changed in such a way,
then no change in AFS semantics would be needed at all (!) -
so what is the point to begin with?)

So such a semantic change would presumably make OpenAFS friendlier
to MacOS <-> Windows interoperability and help to work around some internal
MacOS problem -
  but it would be hostile to most other systems.

Best regards,
Rune