[OpenAFS-devel] What is needed to build an AFS fileserver on top of BTRFS?

David Howells dhowells@redhat.com
Tue, 17 Dec 2013 20:27:39 +0000


Jeffrey Hutzelman <jhutz@cmu.edu> wrote:

> I'm a bit confused about whether you're talking about btrfs as a storage
> backend for, say, OpenAFS, or btrfs as the complete on-disk volume
> representation.  In particular, OpenAFS storage backends needn't provide
> storage for AFS-level vnode metadata, because that is stored in the
> vnode indices.  They really only need to provide storage for the few
> pieces of "key" data (volume/vnode/uniq) that bind a storage-layer inode
> to the corresponding AFS-layer vnode, plus the DV.

I was looking at the idea of storing everything the fileserver needs to
represent a particular AFS volume's contents in BTRFS so that it can be
snapshotted all in one go.

Now, my thought was that a fileserver could store one or more AFS volumes
within a single BTRFS filesystem.  Each AFS volume would be represented by a
BTRFS subvolume.

BTRFS snapshots of those subvolumes could then be taken during maintenance
(e.g. creating AFS volume dumps or cloning AFS volumes) and discarded once
they are finished with.

Undumping an AFS dump would require the creation of an empty subvolume and
then filling in the contents.


As to storing AFS files as BTRFS files, I can think of a couple of ways of
arranging things, depending on what access characteristics you want - assuming
we have to work through the kernel POSIX file access interface.

There do exist system calls for opening by opaque filehandle, but I believe
that fabricating your own FH rather than using one you were given by one of
these syscalls is generally proscribed.


Questions on BTRFS:

 (1) Would BTRFS permit us to assign arbitrary inode numbers to files?  This
     might need a special file-creation syscall - but such might be useful for
     backup-restore tools also.

 (2) Are 'object IDs' the same as inode numbers?

 (3) Does each BTRFS subvolume have its own inode number namespace?  If not,
     we might be able to fabricate a 64-bit inode number from
     <volid>:<vnodeid> since those two values are both 32 bits wide.
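
To make the <volid>:<vnodeid> idea in (3) concrete, here is a minimal sketch of
the packing, assuming the volume ID goes in the high 32 bits and the vnode ID in
the low 32 bits.  The helper names are hypothetical (they are not OpenAFS or
BTRFS functions), and whether BTRFS would accept such a number at all is exactly
what question (1) asks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of OpenAFS or BTRFS. */

/* Pack a 32-bit AFS volume ID (high half) and a 32-bit vnode ID (low half)
 * into a single 64-bit value that could serve as an inode number. */
static inline uint64_t afs_pack_ino(uint32_t volid, uint32_t vnodeid)
{
    return ((uint64_t)volid << 32) | (uint64_t)vnodeid;
}

/* Recover the volume ID from the high 32 bits. */
static inline uint32_t afs_ino_volid(uint64_t ino)
{
    return (uint32_t)(ino >> 32);
}

/* Recover the vnode ID from the low 32 bits. */
static inline uint32_t afs_ino_vnodeid(uint64_t ino)
{
    return (uint32_t)(ino & 0xffffffffu);
}

/* Check that packing followed by unpacking returns the original pair. */
static inline void afs_ino_selfcheck(uint32_t volid, uint32_t vnodeid)
{
    uint64_t ino = afs_pack_ino(volid, vnodeid);

    assert(afs_ino_volid(ino) == volid);
    assert(afs_ino_vnodeid(ino) == vnodeid);
}
```

Note that this uses up all 64 bits, so the uniquifier and the DV would not fit
in the inode number and are left out of the sketch.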

David