[OpenAFS-devel] AFS vs UNICODE

Erik Dalén dalen@socialisterna.org
Wed, 7 May 2008 03:04:25 +0200


On Wed, May 7, 2008 at 12:30 AM,  <u+openafsdev-t07O@chalmers.se> wrote:
> Hello Jeffrey,
>
>
>  On Tue, May 06, 2008 at 05:00:58PM -0400, Jeffrey Altman wrote:
>
>  > (2) Since the directory lookups are performed using a hash table, a file
>  > with the name being searched for might exist but it cannot be found
>  > because the input to the hash function on client B is different than the
>  > input used to create the entry on client A.
>
>  If the name is a byte sequence, this can not happen, you imply that
>  the file name _is_ a character string.
>  (Of course, applications do read user input as text - to create new files,
>  but most often not for opening existing files.)
>  Compatibility in file naming (saved at one occasion should be readable
>  at another, possibly on another computer and by another program)
>  belongs at the application level. File naming compatibility does not differ
>  essentially from compatibility of file contents.
>
>  Any file name works if you are not typing the name but reading it
>  from the directory as bytes. On the other side, _any_ byte sequences,
>  even "interpreted as text and normalized" will have problems to be properly
>  displayed by programs in some locales. All the files nevertheless can stay
>  accessible as each one can be opened by its unique name read from
>  the directory.
>

On Macs they won't, under certain circumstances, due to a bug in Mac OS.

>
>  > Storing file names as opaque octet sequences is broken in other ways.
>  > Depending on the character set used on the client the file name might or
>  > might not be representable since the octet sequence contains no
>  > indication whether the sequence is CP437, CP850, CP1252, ISO Latin-1,
>  > ISO-Latin-9, UTF-7, UTF-8, etc.
>
>  This is just the result of broken practices - using limited and thus
>  incompatible encodings ultimately leads to breakage and no efforts
>  can eliminate the pain afterwards.
>
>  The most important, I think:
>
>  Applying encodings to file names (treating them as text as opposed to
>  byte sequences) is broken fundamentally - this can _not_ be done properly.
>

Well, the bug is really in Mac OS X; the issue is whether we should have a
workaround for it or not. We could file a bug with Apple and with luck
they'll fix it in 10 years. Or we could normalize the file names in
the Mac clients.
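
To make the trade-off concrete, here is a minimal sketch of what
client-side normalization could look like. It assumes the mismatch is
between composed and decomposed Unicode forms of the same name (the
thread does not spell out the exact bug, so this is an assumption), and
it uses a plain dict as a stand-in for the directory hash table. The
helper name to_wire_name is hypothetical, not anything in OpenAFS.

```python
import unicodedata


def to_wire_name(name: str) -> bytes:
    """Hypothetical helper: normalize to NFC, then encode as UTF-8."""
    return unicodedata.normalize("NFC", name).encode("utf-8")


# The same visible name, typed on two different clients.
composed = "Dal\u00e9n"     # e-acute as one code point
decomposed = "Dale\u0301n"  # "e" followed by a combining acute accent

# Without normalization the two clients send different byte sequences.
raw_composed = composed.encode("utf-8")
raw_decomposed = decomposed.encode("utf-8")

# A dict keyed by bytes stands in for the directory hash lookup (assumption).
directory = {raw_composed: "entry created by client A"}
lookup_raw = directory.get(raw_decomposed)

# With normalization on both sides, the keys agree.
normalized_directory = {to_wire_name(composed): "entry created by client A"}
lookup_normalized = normalized_directory.get(to_wire_name(decomposed))

print(raw_composed == raw_decomposed)  # False
print(lookup_raw)                      # None
print(lookup_normalized)               # entry created by client A
```

Note that this only helps for names that pass through the normalizing
client; entries already stored as arbitrary byte sequences would still
have to be opened by the exact bytes read from the directory.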

-- 
Erik Dalén