[OpenAFS-devel] Alternate file systems for disk cache
Derrick Brashear
shadow@gmail.com
Thu, 21 Oct 2010 11:03:56 -0400
Suddenly, reading this, I fear I may have led you down the garden
path. Apologies. Read on.
On Thu, Oct 21, 2010 at 9:30 AM, Charles M. Hannum <root@ihack.net> wrote:
> Following my bug report yesterday adding a check for JFS, I wanted to supply
> some additional information.
> The basic problem here is that the dcache code pulls out inode numbers and
> then looks them up later.  In older versions of Linux, this was done with
> iget().  In recent Linux 2.6 kernels, it's done by faking up a file handle
> with type FILEID_INO32_GEN and using the file system's fh_to_dentry()
> function.  The limitation on file systems is now primarily which ones
> support FILEID_INO32_GEN and the generation==0 hack.
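To make the mechanism concrete, here is a minimal user-space sketch of the lookup described above. It is not OpenAFS or kernel code: `model_fid`, `model_inode`, `model_fh_to_inode()` and `model_fake_handle()` are hypothetical stand-ins. It only illustrates the generation==0 convention, where a handle built from a bare inode number is accepted whatever the inode's real generation is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the (ino, gen) pair carried by a
 * FILEID_INO32_GEN handle. */
struct model_fid {
    uint32_t ino;
    uint32_t gen;
};

/* Hypothetical stand-in for an inode known to the file system. */
struct model_inode {
    uint32_t ino;
    uint32_t generation;
};

/* Model of a file system's handle lookup. With honor_gen0 set, a handle
 * whose generation is 0 matches any generation (the "hack"); without it,
 * the generation must match exactly. Returns NULL when nothing matches. */
const struct model_inode *
model_fh_to_inode(const struct model_inode *tbl, size_t n,
                  const struct model_fid *fh, int honor_gen0)
{
    size_t i;

    assert(tbl != NULL && fh != NULL);
    for (i = 0; i < n; i++) {
        if (tbl[i].ino != fh->ino)
            continue;
        if (tbl[i].generation == fh->gen || (honor_gen0 && fh->gen == 0))
            return &tbl[i];
        return NULL;    /* inode exists, but the handle looks stale */
    }
    return NULL;
}

/* The cache only remembered the inode number, so it fakes a handle
 * with generation 0. */
struct model_fid
model_fake_handle(uint32_t ino)
{
    struct model_fid fh = { ino, 0 };
    return fh;
}
```

In this model, a faked handle for an inode with a nonzero generation resolves only when `honor_gen0` is set, which is the difference between the first two groups in the audit that follows.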
> I've done a full audit of the file systems included in the Linux 2.6.35
> source tree, and found:
> 1) uses FILEID_INO32_GEN (should work):
>   efs
>   exofs
>   ext2/3/4
>   jffs2
>   jfs
>   ufs
> 2) uses FILEID_INO32_GEN (no generation==0 hack, but trivial to add):
>   ntfs
>   xfs
> 3) uses custom file handle format:
>   btrfs
>   ceph
>   fat
>   fuse
>   gfs2
>   isofs
>   ocfs2
>   reiserfs
>   udf
> It seems to me that making type 3 FSes work would be as “simple” as making
> the AFS module use encode_fh() and store the file handle actually generated
> by the file system.  This would take slightly more memory, as we'd have to
> store the type and length.  Even in the worst case (btrfs with
> connectable==true, which we don't have to use), the maximum file handle size
> is 40 bytes, so figure 44 bytes extra per dcache file.  If we decide to use
> connectable==false (ceph and fat ignore this, but keep their file handles
> within the NFSv2 limit of 20 bytes anyway), then we only need 24 extra bytes
> per dcache file.
> More importantly, this will require quite a few changes throughout the AFS
> module code, because it likes to pass around inode numbers.
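The arithmetic above can be checked with a small sketch. The layout is an assumption: the message only says the type and length would have to be stored, and this sketch assumes they fit in two 16-bit fields (4 bytes together). `stored_fh_full`, `stored_fh_small` and `extra_cache_bytes()` are hypothetical names, not OpenAFS structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FH_MAX_BYTES   40  /* worst case cited: btrfs, connectable==true */
#define FH_NFSV2_BYTES 20  /* NFSv2 limit cited for connectable==false */

/* Assumed record for the connectable==true worst case. */
struct stored_fh_full {
    uint16_t type;                  /* value returned by encode_fh() */
    uint16_t len;                   /* handle length */
    uint8_t  data[FH_MAX_BYTES];    /* opaque handle bytes */
};

/* Assumed record when handles stay within the NFSv2 limit. */
struct stored_fh_small {
    uint16_t type;
    uint16_t len;
    uint8_t  data[FH_NFSV2_BYTES];
};

/* 40 + 4 = 44 and 20 + 4 = 24, matching the figures in the message. */
static_assert(sizeof(struct stored_fh_full) == 44, "expected 44 bytes");
static_assert(sizeof(struct stored_fh_small) == 24, "expected 24 bytes");

/* Total bytes needed for a cache with ncache_files dcache files. */
size_t
extra_cache_bytes(size_t ncache_files, int connectable)
{
    size_t per_file = connectable ? sizeof(struct stored_fh_full)
                                  : sizeof(struct stored_fh_small);
    return ncache_files * per_file;
}
```

The static assertions make the compiler verify the 44 and 24 byte figures; a different split of the type and length fields would change them.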
It shouldn't, at least in 1.5.current and head: it passes around a
struct which as it happens contains... what I think you suggest?
#define MAX_FH_LEN 10
typedef union {
#if defined(NEW_EXPORT_OPS)
    struct fid fh;
#endif
    __u32 raw[MAX_FH_LEN];
} afs_ufs_dcache_id_t;
So... looking at the JFS patch in RT, I suddenly got it: you were
looking at 1.4.x.
> However, other
> systems could also use the change and not be dependent on a single file
> system type for AFS cache any more, so this has potentially widespread
> benefit.
> In any case, I think it would be beneficial to at least do a feature test at
> startup time rather than encode specific file system types in afsd as is
> currently done.  I propose to do this by calling encode_fh(), checking that
> the return type is FILEID_INO32_GEN, setting the generation count to 0, and
> calling fh_to_dentry().  If this does not work, we can punt with an error.
> This would enable all type 1 FSes to work immediately (which includes at
> least one non-integrated port of ZFS), and type 2 FSes to work if/when
> patches get integrated.
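The proposed startup test could look roughly like the following user-space sketch. Everything here is hypothetical: `model_export_ops` stands in for a file system's export operations, `probe_cache_fs()` is an invented name, and the three `mock_*` tables only imitate the three groups from the audit so the probe can be exercised outside a kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_FILEID_INO32_GEN 1    /* model constant for the handle type */

struct model_fid { uint32_t ino; uint32_t gen; };

/* Hypothetical ops table; not the kernel's struct export_operations. */
struct model_export_ops {
    int (*encode_fh)(uint32_t ino, struct model_fid *out); /* returns type */
    int (*fh_to_ino)(const struct model_fid *fh, uint32_t *ino); /* 0 = ok */
};

enum probe_result {
    PROBE_OK = 0,           /* usable as a cache file system ("type 1") */
    PROBE_NO_GEN0 = -1,     /* right handle type, no gen==0 ("type 2") */
    PROBE_CUSTOM_FH = -2    /* custom handle format ("type 3") */
};

/* Encode a handle, check its type, zero the generation, look it up. */
int
probe_cache_fs(const struct model_export_ops *ops, uint32_t test_ino)
{
    struct model_fid fh;
    uint32_t found = 0;

    assert(ops != NULL && ops->encode_fh != NULL && ops->fh_to_ino != NULL);
    if (ops->encode_fh(test_ino, &fh) != MODEL_FILEID_INO32_GEN)
        return PROBE_CUSTOM_FH;
    fh.gen = 0;
    if (ops->fh_to_ino(&fh, &found) != 0 || found != test_ino)
        return PROBE_NO_GEN0;
    return PROBE_OK;
}

/* Mock file systems, used only to exercise the probe. Every mock inode
 * has generation 5. */
static int enc_ino32(uint32_t ino, struct model_fid *out)
{ out->ino = ino; out->gen = 5; return MODEL_FILEID_INO32_GEN; }

static int enc_custom(uint32_t ino, struct model_fid *out)
{ out->ino = ino; out->gen = 5; return 0x4d; /* arbitrary custom type */ }

static int dec_gen0_ok(const struct model_fid *fh, uint32_t *ino)
{ if (fh->gen != 0 && fh->gen != 5) return -1; *ino = fh->ino; return 0; }

static int dec_strict(const struct model_fid *fh, uint32_t *ino)
{ if (fh->gen != 5) return -1; *ino = fh->ino; return 0; }

const struct model_export_ops mock_type1 = { enc_ino32, dec_gen0_ok };
const struct model_export_ops mock_type2 = { enc_ino32, dec_strict };
const struct model_export_ops mock_type3 = { enc_custom, dec_gen0_ok };
```

With these mocks, `probe_cache_fs(&mock_type1, 42)` succeeds while the other two return distinct errors, so the caller can punt with a message that says which requirement was not met.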
> Any thoughts?
Some of this may still need to be done for things to work more
properly, since currently we don't properly tell what you call type 2
that they've lost.
What we do:
static inline int
afs_get_fh_from_dentry(struct dentry *dp, afs_ufs_dcache_id_t *ainode,
                       int *max_lenp) {
    if (dp->d_sb->s_export_op->encode_fh)
        return dp->d_sb->s_export_op->encode_fh(dp, &ainode->raw[0],
                                                max_lenp, 0);
#if defined(NEW_EXPORT_OPS)
    /* If fs doesn't provide an encode_fh method, assume the default
       INO32 type */
    *max_lenp = sizeof(struct fid)/4;
    ainode->fh.i32.ino = dp->d_inode->i_ino;
    ainode->fh.i32.gen = dp->d_inode->i_generation;
    return FILEID_INO32_GEN;
#else
    /* or call the default encoding function for the old API */
    return export_op_default.encode_fh(dp, &ainode->raw[0], max_lenp, 0);
#endif
}
-- 
Derrick