[OpenAFS-devel] AFS vs UNICODE

Garrett Wollman wollman@csail.mit.edu
Tue, 6 May 2008 14:31:03 -0400


<<On Tue, 06 May 2008 14:22:23 -0400, Jeffrey Altman <jaltman@secure-endpoints.com> said:

>> [I wrote:]
>> How do they know it's a UTF-8 string?  Traditional Unix semantics
>> provide that a file name is a byte sequence, not a character
>> sequence.

> There are algorithms you can use to validate utf-8 sequences.

The fact that an octet sequence *can* be interpreted as UTF-8 does not
prove that it *is* UTF-8, or indeed any other representation of
Unicode characters.  The Windows (and presumably Mac OS X) environment
may guarantee you (some representation of) Unicode characters, but no
such guarantee obtains in other client operating systems.  It may well
be reasonable to say "from this point forward, all filenames in AFS
shall be interpreted as normalized strings of Unicode characters in
UTF-8 representation", but that would be a subtle semantic change and
had better be well-documented as yet another way AFS differs from
traditional Unix filesystem semantics.
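
To make the distinction concrete, here is a minimal Python sketch. It
is an illustration only, not anything taken from the AFS code; the
helper name is_valid_utf8 and the sample names are hypothetical. It
shows a two-octet name that passes UTF-8 validation yet reads as a
different string under Latin-1, and, assuming the "normalized" wording
above refers to a Unicode normalization form, two octet sequences that
spell the same character.

```python
import unicodedata


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the octets decode as UTF-8 (hypothetical helper)."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A Latin-1 user names a file with two characters: U+00C3 then U+00A9.
raw = "\u00c3\u00a9".encode("latin-1")   # the octets C3 A9

# Validation succeeds, but that says nothing about what the user meant.
valid = is_valid_utf8(raw)
as_utf8 = raw.decode("utf-8")       # one character, U+00E9
as_latin1 = raw.decode("latin-1")   # two characters, U+00C3 U+00A9

# The reverse case: U+00E9 stored as Latin-1 is the single octet E9,
# which is not a valid UTF-8 sequence at all.
latin1_only = is_valid_utf8(b"\xe9")

# Even among genuine UTF-8 names, the same character has more than one
# octet sequence unless a normalization form is mandated.
nfc = unicodedata.normalize("NFC", "\u00e9").encode("utf-8")
nfd = unicodedata.normalize("NFD", "\u00e9").encode("utf-8")
```

A byte-oriented filesystem treats nfc and nfd as two different names;
one that interprets names as normalized Unicode would have to treat
them as the same name.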

-GAWollman