[OpenAFS-devel] Alternate file systems for disk cache

Marc Dionne marc.c.dionne@gmail.com
Thu, 21 Oct 2010 11:41:59 -0400


On Thu, Oct 21, 2010 at 9:30 AM, Charles M. Hannum <root@ihack.net> wrote:
> Following my bug report yesterday adding a check for JFS, I wanted to
> supply some additional information.
>
> The basic problem here is that the dcache code pulls out inode numbers
> and then looks them up later.  In older versions of Linux, this was done
> with iget().  In recent Linux 2.6 kernels, it's done by faking up a file
> handle with type FILEID_INO32_GEN and using the file system's
> fh_to_dentry() function.  The limitation on file systems is now
> primarily which ones support FILEID_INO32_GEN and the generation==0 hack.
>
> I've done a full audit of the file systems included in the Linux 2.6.35
> source tree, and found:
> 1) uses FILEID_INO32_GEN (should work):
>   efs
>   exofs
>   ext2/3/4
>   jffs2
>   jfs
>   ufs
> 2) uses FILEID_INO32_GEN (no generation==0 hack, but trivial to add):
>   ntfs
>   xfs
> 3) uses custom file handle format:
>   btrfs
>   ceph
>   fat
>   fuse
>   gfs2
>   isofs
>   ocfs2
>   reiserfs
>   udf
>
> It seems to me that making type 3 FSes work would be as “simple” as
> making the AFS module use encode_fh() and store the file handle actually
> generated by the file system.  This would take slightly more memory, as
> we'd have to store the type and length.  Even in the worst case (btrfs
> with connectable==true, which we don't have to use), the maximum file
> handle size is 40 bytes, so figure 44 bytes extra per dcache file.  If we
> decide to use connectable==false (ceph and fat ignore this, but keep
> their file handles within the NFSv2 limit of 20 bytes anyway), then we
> only need 24 extra bytes per dcache file.
>
> More importantly, this will require quite a few changes throughout the
> AFS module code, because it likes to pass around inode numbers.  However,
> other systems could also use the change and not be dependent on a single
> file system type for AFS cache any more, so this has potentially
> widespread benefit.
>
> In any case, I think it would be beneficial to at least do a feature
> test at startup time rather than encode specific file system types in
> afsd as is currently done.  I propose to do this by calling encode_fh(),
> checking that the return type is FILEID_INO32_GEN, setting the generation
> count to 0, and calling fh_to_dentry().  If this does not work, we can
> punt with an error.  This would enable all type 1 FSes to work
> immediately (which includes at least one non-integrated port of ZFS), and
> type 2 FSes to work if/when patches get integrated.
>
> Any thoughts?

I would suggest that you have a look at the code in the master branch,
for instance a recent 1.5 release.  It works pretty much as you
describe, using encode_fh to get a file handle for each cache file and
storing it instead of an inode number.  fh_to_dentry is later used to
open the cache files.  The type and length are stored too, but only once,
globally for the whole cache; they have to be the same for all files in
the cache, and you will get an error if that's not the case.
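
To illustrate the "one type and length for the whole cache" rule, here is
a minimal user-space sketch in C.  It is not the OpenAFS code: the struct
and function names (cache_fh_format, cache_file, cache_store_fh) and the
MAX_FH_LEN constant are made up for the example, and the 40-byte bound is
just the worst case quoted above.  The first handle recorded fixes the
type and length; any later handle that differs is rejected.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bound: the 40-byte worst case mentioned in the thread. */
#define MAX_FH_LEN 40

/* Hypothetical: handle format shared by every file in the cache. */
struct cache_fh_format {
    int set;   /* 0 until the first handle has been recorded */
    int type;  /* handle type reported by the file system */
    int len;   /* handle length in bytes */
};

/* Hypothetical: per-file state holds only the raw handle bytes. */
struct cache_file {
    unsigned char fh[MAX_FH_LEN];
};

/*
 * Record the handle of one cache file.
 * Returns 0 on success, or -1 if the length is out of range or the
 * type/length differ from what the first cache file established.
 */
static int cache_store_fh(struct cache_fh_format *fmt, struct cache_file *cf,
                          int type, const unsigned char *fh, int len)
{
    assert(fmt != NULL && cf != NULL && fh != NULL);

    if (len <= 0 || len > MAX_FH_LEN)
        return -1;

    if (!fmt->set) {
        /* First file: remember the format for the whole cache. */
        fmt->type = type;
        fmt->len = len;
        fmt->set = 1;
    } else if (fmt->type != type || fmt->len != len) {
        /* Mixed handle formats within one cache are an error. */
        return -1;
    }

    memcpy(cf->fh, fh, (size_t)len);
    return 0;
}
```

With this layout the per-file cost is just the handle bytes; the type and
length are paid for once rather than once per dcache file.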

The code in 1.4 was meant to be as non-intrusive as possible.  At this
point with 1.6 around the corner, I'm not sure we'd want to make such
a significant change for 1.4.

Marc