[OpenAFS-devel] Some observations when running OpenAFS on Mac OS 10.5

Martin Viklund martin.viklund@it.su.se
Tue, 27 Jan 2009 01:42:14 +0100


Hello!

I know this topic has come up on the list before; it concerns filename
encodings on Leopard.

Here is a tar archive of a quick test I made the other day:
http://people.su.se/~viklund/mac/leopard-filename-encoding-test.tar.gz

The test.sh script inside that tar-archive looks like this:

#!/bin/bash

# extract filenames
echo "------------------------------------------------------------------"
echo "These are the files that we will be looking at"
echo "------------------------------------------------------------------"
for type in latin1 NFC NFD
do
    find . -iname \*${type}.html -maxdepth 1 -exec sh -c '
	test -e refs.'$type' ||
	(
		# create/reset refs-file
		>  refs.'$type'
		# fill refs-file
		( sed s/'$type'/latin1/ <<<{}
		  sed s/'$type'/NFC/ <<<{}
		  sed s/'$type'/NFD/ <<<{}
		) >> refs.'$type'
	)
	xxd -s2 <<<{}
	' \;
done
echo "------------------------------------------------------------------"
# perform some test by using different charsets/utf-nf's filenames
for type in latin1 NFC NFD
do
	echo "Testing cmd behaviour by using _${type}_ as filename argument"
	echo "------------------------------------------------------------------"
	while read fn
	do
		xxd <<<"$fn"
		for cmd in cat open
		do
			case $cmd in
				cat) grepstr="No such file" ;;
				open) grepstr="does not exist" ;;
			esac
			( $cmd "$fn" 2>&1 |
			  grep -q "$grepstr" &&
			  echo "Fail!" ||
			  echo "OK"
			) | sed "s/^/$cmd => /"
		done
	done < refs.$type
	echo "------------------------------------------------------------------"
done


It checks whether cat and open can open a particular file when the
filename argument is passed in each of the different encodings.

The encodings included in the test are Latin1/ISO-8859-1 and Unicode/UTF-8
in both NFC and NFD.
NFD -> Normalization Form Decomposed
NFC -> Normalization Form Composed
(See Wikipedia for more information if needed.)
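
To make the three forms concrete, here is a small sketch that prints the
bytes of the single letter 'å' in each encoding. The helper name hex is
made up for this illustration and is not part of test.sh; octal escapes are
used so the script does not depend on the terminal's own encoding.

```shell
#!/bin/bash
# Hypothetical helper: print the bytes of a string as hex, without offsets.
hex() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

latin1=$(printf '\345')      # ISO-8859-1: the single byte e5
nfc=$(printf '\303\245')     # UTF-8 NFC: U+00E5 encoded as c3 a5
nfd=$(printf 'a\314\212')    # UTF-8 NFD: 'a' followed by U+030A (cc 8a)

echo "latin1: $(hex "$latin1")"   # latin1: e5
echo "NFC:    $(hex "$nfc")"      # NFC:    c3a5
echo "NFD:    $(hex "$nfd")"      # NFD:    61cc8a
```

These are the same byte patterns that appear at the start of the filenames
in the xxd dumps in this mail.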

If you unpack the archive in AFS space and run test.sh you will get this
output:

------------------------------------------------------------------
These are the files that we will be looking at
------------------------------------------------------------------
0000002: e5e4 f62d 6c61 7469 6e31 2e68 746d 6c0a  ...-latin1.html.
0000002: c3a5 c3a4 c3b6 2d4e 4643 2e68 746d 6c0a  ......-NFC.html.
0000002: 61cc 8a61 cc88 6fcc 882d 4e46 442e 6874  a..a..o..-NFD.ht
0000012: 6d6c 0a                                  ml.
------------------------------------------------------------------
Testing cmd behaviour by using _latin1_ as filename argument
------------------------------------------------------------------
0000000: 2e2f e5e4 f62d 6c61 7469 6e31 2e68 746d  ./...-latin1.htm
0000010: 6c0a                                     l.
cat => OK
open => Fail!
0000000: 2e2f e5e4 f62d 4e46 432e 6874 6d6c 0a    ./...-NFC.html.
cat => Fail!
open => Fail!
0000000: 2e2f e5e4 f62d 4e46 442e 6874 6d6c 0a    ./...-NFD.html.
cat => Fail!
open => Fail!
------------------------------------------------------------------
Testing cmd behaviour by using _NFC_ as filename argument
------------------------------------------------------------------
0000000: 2e2f c3a5 c3a4 c3b6 2d6c 6174 696e 312e  ./......-latin1.
0000010: 6874 6d6c 0a                             html.
cat => Fail!
open => Fail!
0000000: 2e2f c3a5 c3a4 c3b6 2d4e 4643 2e68 746d  ./......-NFC.htm
0000010: 6c0a                                     l.
cat => OK
open => Fail!
0000000: 2e2f c3a5 c3a4 c3b6 2d4e 4644 2e68 746d  ./......-NFD.htm
0000010: 6c0a                                     l.
cat => Fail!
open => OK
------------------------------------------------------------------
Testing cmd behaviour by using _NFD_ as filename argument
------------------------------------------------------------------
0000000: 2e2f 61cc 8a61 cc88 6fcc 882d 6c61 7469  ./a..a..o..-lati
0000010: 6e31 2e68 746d 6c0a                      n1.html.
cat => Fail!
open => Fail!
0000000: 2e2f 61cc 8a61 cc88 6fcc 882d 4e46 432e  ./a..a..o..-NFC.
0000010: 6874 6d6c 0a                             html.
cat => Fail!
open => Fail!
0000000: 2e2f 61cc 8a61 cc88 6fcc 882d 4e46 442e  ./a..a..o..-NFD.
0000010: 6874 6d6c 0a                             html.
cat => OK
open => OK
------------------------------------------------------------------


If you unpack it locally on an HFS+ partition you will get this output:

(Note that the OS renames the unpacked files, except for the decomposed
one.)
------------------------------------------------------------------
These are the files that we will be looking at
------------------------------------------------------------------
0000002: 2545 3525 4534 2546 362d 6c61 7469 6e31  %E5%E4%F6-latin1
0000012: 2e68 746d 6c0a                           .html.
0000002: 61cc 8a61 cc88 6fcc 882d 4e46 432e 6874  a..a..o..-NFC.ht
0000012: 6d6c 0a                                  ml.
0000002: 61cc 8a61 cc88 6fcc 882d 4e46 442e 6874  a..a..o..-NFD.ht
0000012: 6d6c 0a                                  ml.
------------------------------------------------------------------
Testing cmd behaviour by using _latin1_ as filename argument
------------------------------------------------------------------
0000000: 2e2f e5e4 f62d 6c61 7469 6e31 2e68 746d  ./...-latin1.htm
0000010: 6c0a                                     l.
cat => OK
open => Fail!
0000000: 2e2f e5e4 f62d 4e46 432e 6874 6d6c 0a    ./...-NFC.html.
cat => Fail!
open => Fail!
0000000: 2e2f e5e4 f62d 4e46 442e 6874 6d6c 0a    ./...-NFD.html.
cat => Fail!
open => Fail!
------------------------------------------------------------------
Testing cmd behaviour by using _NFC_ as filename argument
------------------------------------------------------------------
0000000: 2e2f c3a5 c3a4 c3b6 2d6c 6174 696e 312e  ./......-latin1.
0000010: 6874 6d6c 0a                             html.
cat => Fail!
open => Fail!
0000000: 2e2f c3a5 c3a4 c3b6 2d4e 4643 2e68 746d  ./......-NFC.htm
0000010: 6c0a                                     l.
cat => OK
open => OK
0000000: 2e2f c3a5 c3a4 c3b6 2d4e 4644 2e68 746d  ./......-NFD.htm
0000010: 6c0a                                     l.
cat => OK
open => OK
------------------------------------------------------------------
Testing cmd behaviour by using _NFD_ as filename argument
------------------------------------------------------------------
0000000: 2e2f 61cc 8a61 cc88 6fcc 882d 6c61 7469  ./a..a..o..-lati
0000010: 6e31 2e68 746d 6c0a                      n1.html.
cat => Fail!
open => Fail!
0000000: 2e2f 61cc 8a61 cc88 6fcc 882d 4e46 432e  ./a..a..o..-NFC.
0000010: 6874 6d6c 0a                             html.
cat => OK
open => OK
0000000: 2e2f 61cc 8a61 cc88 6fcc 882d 4e46 442e  ./a..a..o..-NFD.
0000010: 6874 6d6c 0a                             html.
cat => OK
open => OK
------------------------------------------------------------------

Notice also that you can both cat and open the file using a UTF-8 NFC
argument (even though the actual filename is UTF-8 NFD) when the file is
on the local filesystem (HFS+).
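
The same check can be reduced to a single probe: create a file under its
NFD name and ask whether the NFC spelling finds it. This is only a sketch;
the function name probe_normalization, the probe filename and the two
output labels are invented for illustration, and the directory to test is
taken from the first argument (falling back to the temp directory).

```shell
#!/bin/bash
# Hypothetical helper: report whether the filesystem holding directory $1
# treats the NFC and NFD spellings of a name as the same file.
probe_normalization() {
    local dir=$1 nfc nfd
    nfc=$(printf '\303\245')     # U+00E5, composed
    nfd=$(printf 'a\314\212')    # 'a' + U+030A, decomposed

    # Create the file under its decomposed name.
    : > "$dir/probe-${nfd}.tmp" || return 1

    # Look it up under the composed name.
    if [ -e "$dir/probe-${nfc}.tmp" ]; then
        echo "normalizing"
    else
        echo "byte-exact"
    fi
    rm -f "$dir/probe-${nfd}.tmp"
}

probe_normalization "${1:-${TMPDIR:-/tmp}}"
```

Going by the cat results above, I would expect "normalizing" on the HFS+
partition and "byte-exact" in AFS space, but the probe only covers plain
POSIX lookups, not what open or Finder do on top of them.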

Support for the Latin1/ISO-8859-1 charset seems nonexistent when the
file is in AFS (some Unix binaries are the exception; Finder can't even
see the file).
When such files are copied to the local HD they get a new name (with, for
example, %E5 in place of the Swedish letter 'å'). Once unpacked or copied
to the local HD, I can see the file in Finder and open it, with the
drawback that the filename contains one or more %XX sequences instead.

We use AFS for home directories and other areas shared by many people,
so this is a big problem for us right now (at least three frequently used
characters get encoded: å, ä and ö).

Any help is greatly appreciated!

Best regards,
Martin Viklund