[OpenAFS-devel] AFS vs UNICODE

Wed, 7 May 2008 13:11:40 +0200

>  > The user types in a name via the user interface and the user interface
>  > determines how to represent that name not the user.  If the user enters
>  > the name on a MacOS X system she will get a UNICODE sequence that is in
>  > decomposed form.  If the user enters the same name on Windows she will
>  > get a UNICODE sequence that is in composed form.
>  >
>  > If the user tries to access her files from both machines she will have
>  > interop problems.
>
>  Not really, given that the file names are treated as byte sequences, she will be
>  able to open the file without any problems, just choose it from the list.

Please stop assuming that I will choose files from a list, or even
that I have a list to choose from. I often type filenames, even when I
am using Windows. I may even hard-code a filename in a program, and that
file may have been created by a user on a different operating system.
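To make the interop problem concrete, here is a minimal Python sketch of what "treated as byte sequences" means for a name that one client sends in composed (NFC) form and another in decomposed (NFD) form, as described in the quoted message. The file name is an arbitrary example; only the standard `unicodedata` module is used.

```python
import unicodedata

# The same name entered on two clients; U+00FC is LATIN SMALL LETTER U WITH DIAERESIS.
name = "M\u00fcller.txt"

# Composed form (NFC): the umlaut letter is a single code point, U+00FC.
composed = unicodedata.normalize("NFC", name)
# Decomposed form (NFD): "u" followed by COMBINING DIAERESIS, U+0308.
decomposed = unicodedata.normalize("NFD", name)

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

print(composed_bytes)    # b'M\xc3\xbcller.txt'
print(decomposed_bytes)  # b'Mu\xcc\x88ller.txt'

# A filesystem that compares names as raw bytes sees two different names,
# so a program that hard-codes the composed spelling will not find the file.
print(composed_bytes == decomposed_bytes)  # False
```

Both strings render identically on screen, which is exactly why picking the file from a list hides the problem while typing or hard-coding the name exposes it.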

You have to remember that AFS is not a local filesystem.

The fileserver should even be able to translate filenames between
different encodings, since we have clients that don't know anything
about UTF-8. It is very hard to update all clients at the same time.

This works in SMB because the client can tell the server which encoding to use.
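A minimal sketch of the kind of translation described above, assuming a hypothetical fileserver that stores every name as NFC-normalized UTF-8 and converts at the boundary using the charset each client has declared. `from_client` and `to_client` are illustrative names only; they are not part of AFS, Samba, or any real API.

```python
import unicodedata


def from_client(raw: bytes, client_charset: str) -> bytes:
    """Hypothetical: convert a name sent by a client into the server's storage form."""
    text = raw.decode(client_charset)
    # Store one canonical spelling so composed and decomposed input match.
    return unicodedata.normalize("NFC", text).encode("utf-8")


def to_client(stored: bytes, client_charset: str) -> bytes:
    """Hypothetical: convert a stored name into the client's charset.

    Raises UnicodeEncodeError if the name has no representation in that charset.
    """
    return stored.decode("utf-8").encode(client_charset)


# An ISO 8859-1 client creates a file.
latin1_name = "M\u00fcller.txt".encode("iso-8859-1")   # b'M\xfcller.txt'
stored = from_client(latin1_name, "iso-8859-1")

print(stored)                           # b'M\xc3\xbcller.txt'
print(to_client(stored, "utf-8"))       # b'M\xc3\xbcller.txt'
print(to_client(stored, "iso-8859-1"))  # b'M\xfcller.txt'
```

Every byte sequence decodes as ISO 8859-1, so the inbound direction never fails for a Latin-1 client. The outbound direction can fail for names (Japanese text, for example) that have no ISO 8859-1 representation, and a real server would need a policy for that case, such as rejecting, escaping, or substituting characters.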

>  I guess the user may have harder times trying to use the contents of a file created by
>  some Windows application on Mac and vice versa :) It is on the application level where
>  compatibility must be addressed, and file naming is easy to address on that level.
>
>  Different applications do not even have to agree on the exact encoding unless they
>  interchange the same data format, in which case they do have to have certain common
>  knowledge. The file system does not and can not have that knowledge.

It is extremely common for different applications to share data; we do
that all the time!
I will not put a lot of filename-conversion code into my Java
applications; that is a job for the OS.

It is the system that has to know how a file will be stored. This
goes for character encoding just as it does for the way files are
laid out on the disk.
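One way the system, rather than each application, could take on this job is to normalize names before comparing them. The sketch below is a hypothetical directory lookup that assumes NFC as the canonical form; `lookup` and the in-memory directory list are illustrative only and say nothing about how AFS or any other filesystem actually implements name lookup.

```python
import unicodedata
from typing import List, Optional


def lookup(directory: List[str], requested: str) -> Optional[str]:
    """Return the stored entry equal to `requested` after NFC normalization, or None."""
    wanted = unicodedata.normalize("NFC", requested)
    for entry in directory:
        if unicodedata.normalize("NFC", entry) == wanted:
            return entry  # hand back the name exactly as it is stored
    return None


# A directory whose entry was created by a client that sent the decomposed form.
directory = [unicodedata.normalize("NFD", "M\u00fcller.txt"), "readme.txt"]

# A program elsewhere hard-codes the composed form.
requested = "M\u00fcller.txt"
print(requested in directory)                    # False: raw comparison fails
print(lookup(directory, requested) is not None)  # True: normalized comparison matches
```

Normalizing at lookup time leaves the stored bytes untouched. The alternative is to normalize once when a name is created, which keeps lookups a plain comparison but means clients may read back a different byte sequence than the one they wrote.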

Please look at Windows. Microsoft handles this problem, and the way
they do it works.

NFS does not handle the problem; it ignores it, just as AFS currently
does. We are currently using NFS (we are testing AFS).
We have to set all UNIX clients to ISO 8859-1, and we can't change to
UTF-8. But we have no problem with Windows clients that use UTF-8, as
Samba translates for us.