[OpenAFS-devel] AFS vs UNICODE
Garance A Drosihn
drosih@rpi.edu
Mon, 12 May 2008 16:14:24 -0400
At 10:04 AM -0400 5/6/08, Jeffrey Altman wrote:
>
>Whatever we do we are going to have an interop problem on MacOS X
>but since upgrading MacOS X clients is so much easier to do than
>other platforms I will suggest that we bite the bullet there.
>
>Proposal:
>
> 1. MacOS X and Linux clients begin to apply NFC to all UTF-8 strings
> obtained from the operating system whether for directory lookup,
> object creation, or symlink target creation.
> 2. Implement NFC conversion in the Salvager. This will apply to all
> names in directories and will require that directory hash tables
> be fixed when a name is changed to NFC. It will also have to
> apply to symlink targets.
> 3. In the File Server, apply NFC conversion to the names provided in
> CreateFile, Link and Symlink RPCs as well as the targets in the
> Link and Symlink RPCs.
>
>The real problem with this problem is that once the new file server
>is deployed and the salvager is run against the volumes the existing
>MacOS X clients will fail to be able to read any files in AFS. If
>anyone has an idea of how to address the Unicode normalization
>problem going forward that doesn't result in an interop failure for
>existing clients, please say something.

I was part of a big transition involving character sets many years
ago (on an operating system far, far away), and I appreciate that this
transition is going to be a headache. But it's also obvious that a
global resource (such as AFS) has to have some globally consistent
definition of what a filename is.

I have a vague idea that this *might* be handled by having the AFS
server store an additional byte of info about names. I'm not sure
whether that byte should be per-filename or per-directory (I think
there are problems with the idea either way). That extra byte would
indicate whether the stored filenames were NFC-normalized, "just left
alone", or perhaps in some other format. The byte would either
describe a single filename, or be added to the directory info to
describe the encoding of all filenames within that directory.

*New* AFS clients could explicitly say "I want to work with
NFC-normalized names", or, for that matter, that they want the server
to keep leaving filenames alone and not normalize them. Old clients
would never send a request for NFC-normalized names, so the server
could decide how to treat each client's names based on whether that
request had been made.

I realize this idea might cause more problems than it solves, but I
thought I'd toss it into the ring, and see what more-experienced
minds might think about it.

--
Garance Alistair Drosehn = gad@gilead.netel.rpi.edu
Senior Systems Programmer or gad@freebsd.org
Rensselaer Polytechnic Institute or drosih@rpi.edu