[OpenAFS-devel] AFS vs UNICODE

u+openafsdev-t07O@chalmers.se
Sat, 10 May 2008 14:24:23 +0200


Hi Erik,

On Sat, May 10, 2008 at 11:40:49AM +0200, Erik Dalén wrote:
> In Mac OS X there is a "character set of the client", all applications
> use UTF-8-NFD. In Windows the "character set of the client" is now
> UTF-8-NFC.

Even having "a character set of the client" does not mean we can apply
it to file naming.

What does MacOS do when I, say, extract a tar archive?
Does it recode the file names to UTF-8-NFD?
Which encoding is it translating _from_ in that case?
What about POSIX-valid file names that do not represent any text?
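
A minimal Python sketch of why these questions matter. It assumes nothing
about what MacOS or AFS actually does; the sample name and the helper
decodes_as_utf8 are only illustrations. It shows that the same text has
different byte strings in NFC and NFD, and that a name written by an
ISO-8859-1 application is a perfectly good byte string that is not valid
UTF-8 at all.

```python
import unicodedata

# Illustrative sample name; U+00E9 is "e with acute accent".
name = "Dal\u00e9n.txt"

# The same text in the two normalization forms, encoded as UTF-8.
nfc = unicodedata.normalize("NFC", name).encode("utf-8")  # b'Dal\xc3\xa9n.txt'
nfd = unicodedata.normalize("NFD", name).encode("utf-8")  # b'Dale\xcc\x81n.txt'


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if the byte string can be read as UTF-8 text."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The name as an ISO-8859-1 application (or an old tar archive) would store it.
latin1_name = name.encode("iso-8859-1")  # b'Dal\xe9n.txt'

print(nfc == nfd)                    # the two forms differ at the byte level
print(decodes_as_utf8(latin1_name))  # the Latin-1 bytes are not valid UTF-8
```

A client that recodes names on the fly has to guess which of these three
byte strings it is looking at, and for the last one there is no source
encoding recorded anywhere.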

> I think it would be useful to at least have the option to for example
> translate between UTF-8-NFC which is used in the cell to ISO-8859-1

Note that as soon as we treat file names as text
we definitely break compatibility with operating systems
which do not impose such a restriction.

As such systems are in fact present, the concept of "file names are text"
is fundamentally destructive for interoperability.

Workarounds such as the one you mention may work and might be useful in
some cases, but I think the price of introducing the concept itself
is way too high.

So when you say "UTF-8-NFC which is used in the cell", this is already
incompatible with interoperability. A(ny) client should be capable
of creating a file with a name represented by an arbitrary string of bytes
besides '/' and '\0' so that any other client and process would be able
to read the same string. Otherwise we definitely break certain data and
certain applications.
All that for no good reason - as the best we get is workarounds for
some cases but no solution.
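
As a rough illustration in Python: the sketch below assumes a hypothetical
translating client that treats cell names as UTF-8-NFC and local names as
ISO-8859-1. The function names are invented for the example and are not
part of OpenAFS. The byte-level rule is simplified (it ignores the special
names "." and ".."). The point is that names which are valid under the
byte-level rule fall outside what the translation can handle, in both
directions.

```python
def is_valid_name(raw: bytes) -> bool:
    """Byte-level rule: any non-empty byte string without '/' or NUL."""
    return len(raw) > 0 and b"/" not in raw and b"\0" not in raw


def cell_to_latin1(raw: bytes) -> bytes:
    """Hypothetical translation: UTF-8 name in the cell -> ISO-8859-1 locally."""
    return raw.decode("utf-8").encode("iso-8859-1")


def translatable(raw: bytes) -> bool:
    """Return True if the hypothetical translation can represent this name."""
    try:
        cell_to_latin1(raw)
        return True
    except UnicodeError:  # covers both decode and encode failures
        return False


binary_name = b"\xff\xfe\x01data"           # valid name, but not text at all
euro_name = "\u20ac.txt".encode("utf-8")    # valid UTF-8, but not in ISO-8859-1
plain_name = "Dal\u00e9n.txt".encode("utf-8")

for raw in (binary_name, euro_name, plain_name):
    print(raw, is_valid_name(raw), translatable(raw))
```

All three names pass the byte-level check, but only the last one survives
the translation. A byte-transparent client has no such failure cases.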

Regards,
Rune