[OpenAFS-devel] AFS vs UNICODE
Erik Dalén
dalen@socialisterna.org
Wed, 7 May 2008 03:04:25 +0200
On Wed, May 7, 2008 at 12:30 AM, <u+openafsdev-t07O@chalmers.se> wrote:
> Hello Jeffrey,
>
>
> On Tue, May 06, 2008 at 05:00:58PM -0400, Jeffrey Altman wrote:
>
> > (2) Since the directory lookups are performed using a hash table, a file
> > with the name being searched for might exist but it cannot be found
> > because the input to the hash function on client B is different than the
> > input used to create the entry on client A.
>
> If the name is a byte sequence, this can not happen, you imply that
> the file name _is_ a character string.
> (Of course, applications do read user input as text - to create new files,
> but most often not for opening existing files.)
> Compatibility in file naming (saved at one occasion should be readable
> at another, possibly on another computer and by another program)
> belongs at the application level. File naming compatibility does not differ
> essentially from compatibility of file contents.
>
> Any file name works if you are not typing the name but reading it
> from the directory as bytes. On the other hand, _any_ byte sequence,
> even "interpreted as text and normalized", will have problems being properly
> displayed by programs in some locales. All the files nevertheless can stay
> accessible as each one can be opened by its unique name read from
> the directory.
>
On Macs they won't, under certain circumstances, due to a bug in Mac OS.
>
> > Storing file names as opaque octet sequences is broken in other ways.
> > Depending on the character set used on the client the file name might or
> > might not be representable since the octet sequence contains no
> > indication whether the sequence is CP437, CP850, CP1252, ISO Latin-1,
> > ISO-Latin-9, UTF-7, UTF-8, etc.
>
> This is just the result of broken practices - using limited and thus
> incompatible encodings ultimately leads to breakage and no efforts
> can eliminate the pain afterwards.
>
> The most important, I think:
>
> Applying encodings to file names (treating them as text as opposed to
> byte sequences) is broken fundamentally - this can _not_ be done properly.
>
Well, the bug is really in Mac OS X; the question is whether we should
have a workaround for it or not. We could file a bug with Apple and with
luck they'll fix it in 10 years. Or we could normalize the file names in
the Mac clients.
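
To make the normalization idea concrete, here is a rough sketch in Python.
It assumes the Mac OS X problem in question is that the client hands back
names in decomposed (NFD-style) form while other clients create them
precomposed (NFC); that is my reading, not something established in this
thread. The directory is modelled as a plain dict keyed by raw bytes, and
normalize_name() is a hypothetical helper, not anything in OpenAFS.

```python
import unicodedata


def normalize_name(name: str, form: str = "NFC") -> bytes:
    """Return the UTF-8 bytes of `name` after Unicode normalization.

    Hypothetical helper for illustration only; not an OpenAFS function.
    """
    return unicodedata.normalize(form, name).encode("utf-8")


# Client A creates the file with a precomposed "e acute" (one code point).
name_a = "caf\u00e9.txt"
# Client B (assumed: a Mac client) supplies the decomposed form instead:
# plain "e" followed by a combining acute accent.
name_b = "cafe\u0301.txt"

# A directory modelled as a hash table keyed by the raw byte sequence.
directory = {name_a.encode("utf-8"): "file contents"}

# Looking up the raw bytes from client B misses, because the bytes differ.
raw_hit = name_b.encode("utf-8") in directory

# Normalizing on the client before the lookup finds the entry.
normalized_hit = normalize_name(name_b) in directory

print(raw_hit, normalized_hit)
```

The two names look identical on screen, which is why the failed lookup is
so confusing for users. Normalizing only in the Mac clients would leave
every other client treating names as bytes, as they do today.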
--
Erik Dalén