[OpenAFS-devel] AFS vs UNICODE

Mattias Pantzare pantzer@ludd.ltu.se
Fri, 9 May 2008 21:47:17 +0200


2008/5/9  <u+openafsdev-t07O@chalmers.se>:
> Hello Garrett,
>
> On Tue, May 06, 2008 at 02:31:03PM -0400, Garrett Wollman wrote:
>> <<On Tue, 06 May 2008 14:22:23 -0400, Jeffrey Altman <jaltman@secure-endpoints.com> said:
>>
>> > There are algorithms you can use to validate utf-8 sequences.
>>
>> The fact that an octet-sequence *can* be interpreted as UTF-8 does not
>> prove that it *is* UTF-8, or indeed any other representation of
>> Unicode characters.  The Windows (and presumably Mac OS X) environment
>
> Exactly.
>
>> may guarantee you (some representation of) Unicode characters, but no
>> such guarantee obtains in other client operating systems.  It may well
>> be reasonable to say "from this point forward, all filenames in AFS
>> shall be interpreted as normalized strings of Unicode characters in
>> UTF-8 representation", but that would be a subtle semantic change and
>
> Right.
>
> I just can not agree that the change would be "subtle". It would break
> a lot of existing data - for the purpose of following a design
> which is broken itself.
> File system drivers can not fully implement the stated change
> unless each process has its own encoding-aware view of the file system
> (not being the case even for MacOSX, if I am not totally mistaken).

You are forgetting that the AFS client can translate between Unicode
and the native charset on the client. If I run my Solaris client in
UTF-8, I simply configure my AFS client to give my applications UTF-8.
If I have ISO 8859-1, the client will translate between UTF-8 and
8859-1 for my applications.
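
A minimal sketch of that translation, in Python for brevity. The
function names are made up for illustration and are not part of any
OpenAFS API. It assumes names travel as UTF-8 on the wire and that NFC
is the normalization form chosen; the thread only says "normalized".

```python
import unicodedata

def to_wire(local_name: bytes, local_charset: str) -> bytes:
    """Client -> server: local charset bytes to normalized UTF-8 (hypothetical helper)."""
    text = local_name.decode(local_charset)
    # NFC is an assumption; the proposal does not name a normalization form.
    return unicodedata.normalize("NFC", text).encode("utf-8")

def from_wire(wire_name: bytes, local_charset: str) -> bytes:
    """Server -> client: UTF-8 bytes to the charset the applications expect."""
    # Raises UnicodeEncodeError if the name has no representation locally.
    return wire_name.decode("utf-8").encode(local_charset)
```

On an 8859-1 client, to_wire(b"caf\xe9", "iso-8859-1") gives
b"caf\xc3\xa9", and from_wire() reverses it. A name that cannot be
represented in the local charset raises an error here; a real client
would need some policy for that case.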

Yes, the UNIX clients have to implement this.

The problem is old AFS clients that can't translate. The AFS protocol
needs some way for clients to tell the server that they are
UTF-8-ready, and the server should convert to a "default" charset for
old clients.
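
Roughly what the server side could look like, again as a Python
sketch. Everything here is hypothetical: the client_utf8_ready flag
stands in for whatever capability bit the protocol would carry, and
ISO 8859-1 as the default is only an example.

```python
DEFAULT_LEGACY_CHARSET = "iso-8859-1"  # example default, an assumption

def name_for_client(stored_utf8: bytes, client_utf8_ready: bool,
                    legacy_charset: str = DEFAULT_LEGACY_CHARSET) -> bytes:
    """Pick the byte form of a stored UTF-8 name for one client (hypothetical)."""
    if client_utf8_ready:
        # New clients get the stored UTF-8 bytes untouched.
        return stored_utf8
    # Old clients get the default charset; characters that do not fit
    # are replaced with "?" in this sketch.
    return stored_utf8.decode("utf-8").encode(legacy_charset, errors="replace")
```

The "?" replacement is lossy and two different names can collide after
conversion, so a real server would need a better fallback.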

>
> (Shall we then expect all applications on all platforms to detect
> which file system they are accessing, or be rewritten to always ensure
> normalized UTF-8 encoding of the strings passed to open()?
> Assuming that the applications change in such way,
> then no change in AFS semantics will be needed at all (!)
> so what is its point to begin with?)

That is how it is today! Try using some UNIX systems configured for
8859-1 and some using UTF-8 on the same file system (NFS or AFS).
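
To illustrate the mismatch, here is a small Python sketch (the helper
name is made up) of how the same name, created on one kind of client,
looks to the other when the file system just stores raw bytes:

```python
from typing import Optional

def view_as(raw_name: bytes, charset: str) -> Optional[str]:
    """Decode stored bytes the way a client in `charset` would; None if undecodable."""
    try:
        return raw_name.decode(charset)
    except UnicodeDecodeError:
        return None

name = "caf\u00e9"                       # "café"
from_latin1 = name.encode("iso-8859-1")  # bytes written by an 8859-1 client
from_utf8 = name.encode("utf-8")         # bytes written by a UTF-8 client

# Same name, two different byte strings on disk.
assert from_latin1 != from_utf8
```

view_as(from_latin1, "utf-8") returns None because the bytes are not
valid UTF-8, while view_as(from_utf8, "iso-8859-1") decodes without
error but yields "cafÃ©" rather than "café". The second case also
shows that bytes which decode cleanly in a charset need not have been
written in it.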