[OpenAFS-devel] AFS vs UNICODE

u+openafsdev-t07O@chalmers.se
Fri, 9 May 2008 23:27:20 +0200


Hello Mattias,

On Fri, May 09, 2008 at 09:47:17PM +0200, Mattias Pantzare wrote:
> > File system drivers can not fully implement the stated change
> > unless each process has its own encoding-aware view of the file system
> 
> You are forgetting that the AFS client can translate between Unicode
> and the native charset on the client. If I run my Solaris client in

Note that there is no "character set of the client". The character set
is chosen by each running process (primarily through the locale the
process uses).

> > (Shall we then expect all applications on all platforms to detect
> > which file system they are accessing, or be rewritten to always ensure
> > normalized UTF-8 encoding of the strings passed to open()?
> > Assuming that the applications change in such way,
> > then no change in AFS semantics will be needed at all (!)
> > so what is its point to begin with?)
> 
> That is how it is today! Try using some UNIX systems configured for
> 8859-1 and some using UTF-8 on the same file system. (NFS or AFS)

Hmm, it seems you meant "is NOT today"?
Applications in the *nix world do not assume that file names follow
any particular text encoding.

Regarding your proposed experiment: I am happily using multiple Unix systems
configured by their administrators for different locales and encodings.
Nothing forces a user to live in the "default locale of the computer".
I consistently run all my sessions in a UTF-8-based locale.
This works as expected, which means just fine.
It also makes my globally accessible file names look good
on all POSIX-compatible systems.

Accessing files created by applications in _different_ locales
is indeed a pain. You do not have to leave a single computer
to experience this; neither NFS nor AFS is necessary. Just run e.g.
 LANG=something.else xterm
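
To make the effect concrete, here is a minimal Python sketch of what two
such sessions hand to the kernel for the "same" name. The file name is a
made-up example, and the two encodings merely stand in for whatever the
two locales happen to use; nothing here touches a real file system.

```python
# Hypothetical file name, written with escapes so the sketch itself does
# not depend on the encoding of this message: "raksmorgas" with a-umlaut,
# o-umlaut and a-ring.
name = "r\u00e4ksm\u00f6rg\u00e5s"

# What a process in an ISO-8859-1 locale would pass to open():
latin1_bytes = name.encode("iso-8859-1")
# What a process in a UTF-8 locale would pass to open():
utf8_bytes = name.encode("utf-8")

# The two processes ask for two different byte strings.
same_bytes = latin1_bytes == utf8_bytes

# A UTF-8 session listing the Latin-1 name finds undecodable bytes
# (shown here as U+FFFD replacement characters).
shown_in_utf8 = latin1_bytes.decode("utf-8", errors="replace")
# A Latin-1 session listing the UTF-8 name decodes it without error,
# but to the wrong characters.
shown_in_latin1 = utf8_bytes.decode("iso-8859-1")

print(latin1_bytes, utf8_bytes, same_bytes)
print(shown_in_utf8, shown_in_latin1)
```

Both sessions typed the same characters, yet neither can name the other's
file without knowing which encoding produced it.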

I have to say it once more: this is nothing specific to distributed file
systems. It's a general problem, like storing the contents of text files
without information about their encoding.

As long as different (instances of) applications consistently use the same
encoding - or the same format capable of indicating the encoding/locale -
things work properly; otherwise they do not.

File systems do not know the application instance's encoding of choice,
so alas they cannot do the work of translating to a common one and back.
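
A small Python sketch of that ambiguity, assuming the file system is
handed nothing but the bytes (the helper name and the sample bytes are
made up for illustration):

```python
def candidate_readings(raw: bytes, encodings: list[str]) -> dict[str, str]:
    """Return every decoding of raw that succeeds, keyed by encoding name."""
    readings = {}
    for enc in encodings:
        try:
            readings[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; skip it.
            pass
    return readings

# Bytes as a file system would store them, with no encoding attached.
raw_name = b"\xc3\xa4.txt"

readings = candidate_readings(raw_name, ["utf-8", "iso-8859-1"])
for enc, text in readings.items():
    print(enc, ascii(text))
```

Both decodings succeed and give different names, so the bytes alone do
not say which one the creating process meant.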

(Note that there are also other things, like "file-3.14159" vs "file-3,14159",
which are still the same problem: applications not prepared for data
crossing locale borders.)
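
The decimal separator case can be sketched the same way. The helper below
is hypothetical and only imitates what locale-aware number formatting does
with the decimal point; it does not call the real locale machinery, so it
runs the same everywhere.

```python
def file_name(value: float, decimal_point: str) -> str:
    # Hypothetical helper: format with '.' and then substitute the decimal
    # separator that the process's numeric locale would dictate.
    return "file-" + f"{value:.5f}".replace(".", decimal_point)

# Process A runs in a locale with '.' as decimal separator and creates the file.
written_by_a = file_name(3.14159, ".")
# Process B runs in a locale with ',' and builds the name the same way.
looked_up_by_b = file_name(3.14159, ",")

# B looks for a name that A never created.
found = written_by_a == looked_up_by_b
print(written_by_a, looked_up_by_b, found)
```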

Best regards,
Rune