[OpenAFS-devel] AFS vs UNICODE

Garance A Drosihn drosih@rpi.edu
Mon, 12 May 2008 16:14:24 -0400


At 10:04 AM -0400 5/6/08, Jeffrey Altman wrote:
>
>Whatever we do we are going to have an interop problem on MacOS X 
>but since upgrading MacOS X clients is so much easier to do than 
>other platforms I will suggest that we bite the bullet there.
>
>Proposal:
>
>   1. MacOS X and Linux clients begin to apply NFC to all UTF-8 strings
>      obtained from the operating system whether for directory lookup,
>      object creation, or symlink target creation.
>   2. Implement NFC conversion in the Salvager.  This will apply to all
>      names in directories and will require that directory hash tables
>      be fixed when a name is changed to NFC.  It will also have to
>      apply to symlink targets.
>   3. In the File Server, apply NFC conversion to the names provided in
>      CreateFile, Link and Symlink RPCs as well as the targets in the
>      Link and Symlink RPCs.
>
>The real problem with this problem is that once the new file server 
>is deployed and the salvager is run against the volumes the existing 
>MacOS X clients will fail to be able to read any files in AFS.   If 
>anyone has an idea of how to address the Unicode normalization 
>problem going forward that doesn't result in an interop failure for 
>existing clients, please say something.

I was part of a big transition wrt character sets many years ago
(on an operating system far far away), and I appreciate this
transition is going to be a headache.  But it's also obvious that a
global resource (such as AFS) has to have some globally-consistent
definition for what a filename is.
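
To make the interop concern concrete, here is a minimal sketch (in
Python, not AFS code) of why two clients can disagree about "the same"
filename.  The dictionary standing in for a directory and the "fid-1"
value are assumptions made up for illustration; the only real API used
is the standard unicodedata module.

```python
import unicodedata

# The same visible name in two Unicode normalization forms.
name_nfc = unicodedata.normalize("NFC", "caf\u00e9.txt")  # precomposed e-acute
name_nfd = unicodedata.normalize("NFD", "caf\u00e9.txt")  # 'e' + combining acute

nfc_bytes = name_nfc.encode("utf-8")
nfd_bytes = name_nfd.encode("utf-8")

# Hypothetical stand-in for a directory that compares names byte-for-byte.
directory = {nfc_bytes: "fid-1"}

# A client sending the decomposed form does not find the entry...
found_raw = nfd_bytes in directory

# ...unless the name is normalized to NFC before the lookup.
renormalized = unicodedata.normalize("NFC", nfd_bytes.decode("utf-8"))
found_normalized = renormalized.encode("utf-8") in directory
```

The two byte strings differ in length as well as content, which is why
a byte-wise lookup cannot match them without a normalization step on
one side or the other.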

I have a vague idea that this *might* be handled by having the AFS
server store an additional byte of info about names.  I'm not sure
if that byte should be per-filename or per-directory.  (I think
there are problems with the idea either way.)  That extra byte would
indicate whether the stored filenames were NFC-normalized, or "just
left alone", or perhaps in some other format.  The byte would either
apply to a single filename, or be added to the directory info to
describe the encoding of all filenames within that directory.
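
A rough sketch of what that extra byte could look like, in Python
rather than in terms of the real AFS directory format.  Everything
named here (NameForm, DirEntry, tag_entry, directory_form) is
hypothetical and exists only to show the per-filename and
per-directory variants side by side.

```python
import unicodedata
from dataclasses import dataclass
from enum import IntEnum


class NameForm(IntEnum):
    """Hypothetical one-byte tag describing how a name is stored."""
    RAW = 0  # bytes kept exactly as the client sent them
    NFC = 1  # name is valid UTF-8 and already NFC-normalized


@dataclass
class DirEntry:
    name: bytes
    form: NameForm  # per-filename variant of the extra byte


def tag_entry(name: bytes) -> DirEntry:
    """Tag a stored name without changing its bytes."""
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8 at all, so it can only be "just left alone".
        return DirEntry(name, NameForm.RAW)
    is_nfc = unicodedata.normalize("NFC", text) == text
    return DirEntry(name, NameForm.NFC if is_nfc else NameForm.RAW)


def directory_form(entries: list[DirEntry]) -> NameForm:
    """Per-directory variant: NFC only if every entry is NFC."""
    if all(entry.form is NameForm.NFC for entry in entries):
        return NameForm.NFC
    return NameForm.RAW
```

The sketch also shows one of the problems with the per-directory
variant: a single non-NFC name forces the whole directory to be
marked as "left alone".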

*New* AFS clients could explicitly say "I want to work with NFC-
normalized names", or for that matter, say that they want the server
to continue to leave the filenames alone and not normalize them.
Old clients would never send the request for NFC-normalized names,
so the server could tell the two apart and make decisions based on
that.
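
As a sketch of the server-side decision, assuming the client's
preference arrives as a simple flag: name_for_storage and
client_wants_nfc are hypothetical names, not part of any existing AFS
RPC.  An old client never sets the flag, so its names pass through
untouched.

```python
import unicodedata


def name_for_storage(name: bytes, client_wants_nfc: bool = False) -> bytes:
    """Return the bytes the server would store for a client-supplied name.

    client_wants_nfc is a hypothetical opt-in flag; old clients never
    send it, so the default is to leave the name alone.
    """
    if not client_wants_nfc:
        return name
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so there is nothing to normalize.
        return name
    return unicodedata.normalize("NFC", text).encode("utf-8")


decomposed = b"cafe\xcc\x81"  # 'e' followed by a combining acute accent
old_client_result = name_for_storage(decomposed)
new_client_result = name_for_storage(decomposed, client_wants_nfc=True)
```

This only covers names on their way in.  What the server should hand
back to an old client listing a directory that a new client has
filled with NFC names is left open here.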

I realize this idea might cause more problems than it solves, but I
thought I'd toss it into the ring, and see what more-experienced
minds might think about it.

-- 
Garance Alistair Drosehn            =   gad@gilead.netel.rpi.edu
Senior Systems Programmer           or  gad@freebsd.org
Rensselaer Polytechnic Institute    or  drosih@rpi.edu