[OpenAFS-devel] What is needed to build an AFS fileserver on top of BTRFS?

Garrett Wollman wollman@csail.mit.edu
Tue, 17 Dec 2013 14:37:12 -0500


<<On Tue, 17 Dec 2013 13:37:21 -0500, Jeffrey Hutzelman <jhutz@cmu.edu> said:

> On the other hand, if you're looking to provide the complete on-disk
> representation, then you need to be able to "name" inodes by
> (volume/vnode/uniq) instead of by a filename.  The fileserver needs to
> be able to specify those properties, instead of a name, when creating an
> inode (unless you're doing directory management), and it needs to be
> able to look up inodes by that same tuple, efficiently, even if you
> _are_ doing directory management.  Also bear in mind that the OpenAFS
> fileserver's current on-disk directory representation is also the
> on-the-wire representation, so even if you store directories in some
> other way, it must be possible to produce the AFS protocol version
> efficiently.

[...]

> It turns out to make integrity checking and recovery a lot easier if the
> volume/parent ID, vnode number, uniqifier, and data version are part of
> the filesystem metadata, rather than being stored in a separate file.

You'd really like for the OpenAFS fileserver to be able to talk
directly to something like the ZFS DMU and DSL, rather than having to
clunkily transform the AFS model into something that can squeeze
through the POSIX filesystem interface (ZPL, in the case of ZFS; I
assume btrfs has something similar).  One of the high-performance
cluster filesystems -- I forget which one, maybe Lustre? -- actually
does this.  The problem is that there's no official public interface
for exposing this across the user/kernel boundary, which makes it
nearly impossible for user-mode clients to take advantage of it unless
they include most of the ZFS code directly.  But perhaps the
filesystem engineers could be persuaded to make such an interface, if
potential clients like OpenAFS were to agree on what it would need to
look like.  (At a minimum it can't hurt to ask.)  ZFS at least was
designed with this idea in mind, even if Sun never got around to
implementing it before Larry took over.
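
As a rough illustration of what such an interface would have to offer,
going only by the requirements quoted above (create an object named by
volume/vnode/uniq, and look it up by that same tuple), here is a minimal
in-memory sketch.  Every name in it (Fid, StoredObject, ObjectStore) is
hypothetical; it is not the DMU API or anything in OpenAFS, and bumping
the data version on each write is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Dict, NamedTuple, Optional


class Fid(NamedTuple):
    """Object name as the fileserver sees it: (volume, vnode, uniq)."""
    volume: int
    vnode: int
    uniq: int


@dataclass
class StoredObject:
    """Per-object state kept by the store (hypothetical layout)."""
    data_version: int = 0
    data: bytes = b""


class ObjectStore:
    """Hypothetical stand-in for an object layer keyed by Fid, not by filename."""

    def __init__(self) -> None:
        self._objects: Dict[Fid, StoredObject] = {}

    def create(self, fid: Fid) -> StoredObject:
        # The caller chooses the name tuple; no directory entry is involved.
        if fid in self._objects:
            raise FileExistsError(fid)
        obj = StoredObject()
        self._objects[fid] = obj
        return obj

    def lookup(self, fid: Fid) -> Optional[StoredObject]:
        # Direct lookup by tuple; returns None if no such object exists.
        return self._objects.get(fid)

    def write(self, fid: Fid, data: bytes) -> int:
        # Assumption for this sketch: every write bumps the data version.
        obj = self._objects[fid]
        obj.data = data
        obj.data_version += 1
        return obj.data_version
```

A dict lookup is only a placeholder for whatever index the real
filesystem would use; the point is that the key is the tuple itself, so
no path-name translation sits between the fileserver and the object.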

> We live with it today, because we have little choice with modern
> filesystems that don't give us any place to store metadata, but if a
> filesystem is going to be specifically designed to serve as a more
> efficient/reliable AFS storage backend, it should store this sort of
> thing in the filesystem metadata.

I don't think that any filesystem that doesn't give you a place to
store metadata really qualifies as "modern" these days.
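
One such place on Linux is a user extended attribute.  The sketch below
packs the identifying fields into a fixed-size record and attaches it to
a file.  The attribute name "user.afs.meta" and the record layout are
invented for this example and are not what OpenAFS does; os.setxattr
and os.getxattr are Linux-only, and the underlying filesystem has to
support user xattrs.

```python
import os
import struct

# Hypothetical attribute name and layout: three unsigned 32-bit fields
# (volume, vnode, uniq) followed by a 64-bit data version, big-endian.
XATTR_NAME = "user.afs.meta"
_LAYOUT = struct.Struct(">IIIQ")


def pack_meta(volume: int, vnode: int, uniq: int, data_version: int) -> bytes:
    """Encode the identifying tuple and data version as a 20-byte record."""
    return _LAYOUT.pack(volume, vnode, uniq, data_version)


def unpack_meta(raw: bytes) -> tuple:
    """Decode a record produced by pack_meta()."""
    return _LAYOUT.unpack(raw)


def tag_file(path: str, volume: int, vnode: int, uniq: int,
             data_version: int) -> None:
    """Store the record in the file's own metadata (Linux only)."""
    os.setxattr(path, XATTR_NAME, pack_meta(volume, vnode, uniq, data_version))


def read_tag(path: str) -> tuple:
    """Read the record back from the file's extended attribute (Linux only)."""
    return unpack_meta(os.getxattr(path, XATTR_NAME))
```

With the record travelling with the file itself, a checker can compare
what each file claims to be against where it was found, without
consulting a separate index file.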

-GAWollman