[OpenAFS-devel] Alternate file systems for disk cache

Charles M. Hannum root@ihack.net
Thu, 21 Oct 2010 09:30:10 -0400


Following my bug report yesterday adding a check for JFS, I wanted to supply
some additional information.

The basic problem here is that the dcache code pulls out inode numbers and
then looks them up later.  In older versions of Linux, this was done with
iget().  In recent Linux 2.6 kernels, it's done by faking up a file handle
with type FILEID_INO32_GEN and using the file system's fh_to_dentry()
function.  The limitation on file systems is now primarily which ones
support FILEID_INO32_GEN and the generation==0 hack.
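
To make the mechanism above concrete, here is a minimal userspace model of it in C. It is not kernel code and not the actual AFS module code: every `model_` name is hypothetical, the inode table stands in for a real file system, and the only detail taken from the kernel is that FILEID_INO32_GEN is file handle type 1 carrying a 32-bit inode number followed by a 32-bit generation (two 32-bit words). The "generation==0 hack" is modeled as the decoder accepting generation 0 as a wildcard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only; the value 1 mirrors the kernel's FILEID_INO32_GEN. */
enum { MODEL_FILEID_INO32_GEN = 1 };

/* Hypothetical stand-in for an on-disk inode. */
struct model_inode {
    uint32_t ino;
    uint32_t gen;
};

/* Hypothetical stand-in for a file handle: type, length in 32-bit words,
 * then the two payload words of a FILEID_INO32_GEN handle. */
struct model_fh {
    int type;
    int len_words;
    uint32_t ino;
    uint32_t gen;
};

/* Toy "file system" contents. */
static const struct model_inode model_table[] = { { 12, 5 }, { 34, 9 } };

static struct model_fh model_make_fh(uint32_t ino, uint32_t gen)
{
    struct model_fh fh = { MODEL_FILEID_INO32_GEN, 2, ino, gen };
    return fh;
}

/* What the cache code does: it only remembers the inode number, so it
 * fakes up a handle with generation 0. */
static struct model_fh model_fake_fh(uint32_t ino)
{
    return model_make_fh(ino, 0);
}

/* Model of a file system's fh_to_dentry() that implements the
 * generation==0 hack: generation 0 matches any inode generation. */
static const struct model_inode *model_fh_to_dentry(struct model_fh fh)
{
    size_t i;

    if (fh.type != MODEL_FILEID_INO32_GEN || fh.len_words < 2)
        return NULL;
    for (i = 0; i < sizeof(model_table) / sizeof(model_table[0]); i++) {
        if (model_table[i].ino != fh.ino)
            continue;
        if (fh.gen == 0 || fh.gen == model_table[i].gen)
            return &model_table[i];
        return NULL; /* stale handle: generation mismatch */
    }
    return NULL; /* no such inode */
}
```

In this model, a file system without the hack would drop the `fh.gen == 0` clause, and every faked handle would then be rejected as stale unless the inode's generation happened to be 0.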

I've done a full audit of the file systems included in the Linux 2.6.35
source tree, and found:

1) uses FILEID_INO32_GEN (should work):
  efs
  exofs
  ext2/3/4
  jffs2
  jfs
  ufs

2) uses FILEID_INO32_GEN (no generation==0 hack, but trivial to add):
  ntfs
  xfs

3) uses custom file handle format:
  btrfs
  ceph
  fat
  fuse
  gfs2
  isofs
  ocfs2
  reiserfs
  udf

It seems to me that making type 3 FSes work would be as “simple” as making
the AFS module use encode_fh() and store the file handle actually generated
by the file system.  This would take slightly more memory, as we'd have to
store the type and length.  Even in the worst case (btrfs with
connectable==true, which we don't have to use), the maximum file handle size
is 40 bytes, so figure 44 bytes extra per dcache file.  If we decide to use
connectable==false (ceph and fat ignore this, but keep their file handles
within the NFSv2 limit of 20 bytes anyway), then we only need 24 extra bytes
per dcache file.
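
The arithmetic above can be checked with a small C sketch. The struct layout is an assumption made for illustration: it reads "44" and "24" as the handle bytes plus a 2-byte type and a 2-byte length, which is one layout that matches the figures; the names are hypothetical and nothing here is taken from the AFS source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Figures from the discussion above. */
#define MODEL_FH_MAX_CONNECTABLE     40 /* worst case: btrfs, connectable */
#define MODEL_FH_MAX_NONCONNECTABLE  20 /* NFSv2 file handle limit */

/* Assumed layout: 2-byte type + 2-byte length + opaque handle bytes. */
struct model_stored_fh_large {
    uint16_t type;
    uint16_t len;
    uint8_t  data[MODEL_FH_MAX_CONNECTABLE];
};

struct model_stored_fh_small {
    uint16_t type;
    uint16_t len;
    uint8_t  data[MODEL_FH_MAX_NONCONNECTABLE];
};

/* Compile-time checks that the assumed layout gives 44 and 24 bytes. */
_Static_assert(sizeof(struct model_stored_fh_large) == 44, "40 + 4");
_Static_assert(sizeof(struct model_stored_fh_small) == 24, "20 + 4");

/* Total handle storage for a cache with the given number of dcache files. */
static size_t model_cache_fh_bytes(size_t ndcache_files, size_t per_file)
{
    return ndcache_files * per_file;
}
```

For a hypothetical cache of 10,000 dcache files, that comes to 440,000 bytes with the large layout and 240,000 bytes with the small one.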

More importantly, this will require quite a few changes throughout the AFS
module code, because it likes to pass around inode numbers.  However, other
systems could also use the change and not be dependent on a single file
system type for AFS cache any more, so this has potentially widespread
benefit.

In any case, I think it would be beneficial to at least do a feature test at
startup time rather than encode specific file system types in afsd as is
currently done.  I propose to do this by calling encode_fh(), checking that
the return type is FILEID_INO32_GEN, setting the generation count to 0, and
calling fh_to_dentry().  If this does not work, we can punt with an error.
This would enable all type 1 FSes to work immediately (which includes at
least one non-integrated port of ZFS), and type 2 FSes to work if/when
patches get integrated.
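
The proposed probe could look roughly like the following userspace model in C. This is a sketch under stated assumptions, not a patch: the `model_` types and the three toy file systems are hypothetical, the function-pointer signatures are simplified from the kernel's export operations, and 0x4d is just a made-up custom handle type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MODEL_FILEID_INO32_GEN = 1, MODEL_FILEID_CUSTOM = 0x4d };

struct model_inode { uint32_t ino; uint32_t gen; };

/* Simplified stand-ins for the encode_fh()/fh_to_dentry() operations. */
struct model_fs_ops {
    int (*encode_fh)(const struct model_inode *in, uint32_t *fh, int *len_words);
    const struct model_inode *(*fh_to_dentry)(const uint32_t *fh, int len_words,
                                              int type);
};

/* The one file each toy file system contains. */
static const struct model_inode model_file = { 42, 7 };

static int model_encode_ino32(const struct model_inode *in, uint32_t *fh, int *len)
{
    fh[0] = in->ino;
    fh[1] = in->gen;
    *len = 2;
    return MODEL_FILEID_INO32_GEN;
}

static int model_encode_custom(const struct model_inode *in, uint32_t *fh, int *len)
{
    fh[0] = ~in->ino; /* opaque private format */
    fh[1] = in->gen;
    *len = 2;
    return MODEL_FILEID_CUSTOM;
}

/* Type 1: accepts generation 0 as a wildcard. */
static const struct model_inode *model_decode_lenient(const uint32_t *fh, int len,
                                                      int type)
{
    if (type != MODEL_FILEID_INO32_GEN || len < 2 || fh[0] != model_file.ino)
        return NULL;
    return (fh[1] == 0 || fh[1] == model_file.gen) ? &model_file : NULL;
}

/* Type 2: same handle type, but the generation must match exactly. */
static const struct model_inode *model_decode_strict(const uint32_t *fh, int len,
                                                     int type)
{
    if (type != MODEL_FILEID_INO32_GEN || len < 2 || fh[0] != model_file.ino)
        return NULL;
    return fh[1] == model_file.gen ? &model_file : NULL;
}

static const struct model_fs_ops model_type1_ops = { model_encode_ino32, model_decode_lenient };
static const struct model_fs_ops model_type2_ops = { model_encode_ino32, model_decode_strict };
static const struct model_fs_ops model_type3_ops = { model_encode_custom, model_decode_strict };

/* The probe: encode, check the type, zero the generation, decode.
 * Returns 0 if the file system is usable for the cache, -1 otherwise. */
static int model_probe_cache_fs(const struct model_fs_ops *ops,
                                const struct model_inode *in)
{
    uint32_t fh[2] = { 0, 0 };
    int len = 2;
    int type;

    if (ops->encode_fh == NULL || ops->fh_to_dentry == NULL)
        return -1;
    type = ops->encode_fh(in, fh, &len);
    if (type != MODEL_FILEID_INO32_GEN || len < 2)
        return -1;
    fh[1] = 0; /* the generation==0 hack */
    return ops->fh_to_dentry(fh, len, type) == in ? 0 : -1;
}
```

In this model the probe passes for the type 1 file system and fails for types 2 and 3, which is where afsd would punt with an error instead of consulting a hard-coded list of file system names.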

Any thoughts?
