[OpenAFS] fine-grained incrementals?

Jeffrey Hutzelman jhutz@cmu.edu
Thu, 24 Feb 2005 12:20:13 -0500


On Wednesday, February 23, 2005 05:13:52 PM -0800 Mike Fedyk 
<mfedyk@matchmail.com> wrote:

> Jeffrey Hutzelman wrote:
>
>> AFS does copy-on-write at the per-vnode layer.  Each vnode has
>> metadata which is kept in the volume's vnode indices; among other
>> things, this includes the identifier of the physical file which
>> contains the vnode's contents (for the inode fileserver, this is an
>> inode number; for namei it's a 64-bit "virtual inode number" which can
>> be used to derive the filename). The underlying inode has a link count
>> (in the filesystem for inode; in the link table for namei) which
>> reflects how many vnodes have references to that inode.  When you
>> write to a vnode whose underlying inode has more than one reference,
>> the fileserver allocates a new one for the vnode you're writing to,
>> and copies the contents.
>
> OK, I get it now.  An inode fileserver uses the link count on the
> underlying filesystem (ext3 for instance), and a namei server uses a
> large file (or possibly block device) with an AFS specific filesystem
> format.  Is that right?

Not quite.  Both inode and namei fileservers store their data in individual 
files on the local filesystem.  Each local file corresponds to the contents 
of one vnode (file, directory, or symlink) in the AFS filesystem, or to 
some particular kind of per-volume metadata (a volume header or vnode 
index).  The difference between the two backends lies largely in how those 
files are located by the fileserver.

In an inode fileserver (the traditional model), the vnode index contains 
the inode numbers of the underlying files for each vnode; the inode numbers 
of the indices themselves are stored in the volume header (the Vxxx.vol 
files at the top level of each vice partition).  These inodes have no 
regular directory entries which point to them; they are manipulated via a 
set of special system calls provided by the AFS kernel module.  In this 
model, the link counts on the underlying inodes reflect the number of 
vnodes referring to that inode; when the link count is decremented to zero, 
the inode is automatically freed by the normal kernel filesystem code.
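
To make the reference counting concrete, here is a toy sketch in Python. It 
is not fileserver code: the class ToyInodeStore and the dictionaries standing 
in for vnode indices are names I made up for illustration, and a real inode 
fileserver does this through the special system calls mentioned above rather 
than through anything resembling this API.

```python
class ToyInodeStore:
    """Stand-in for the local filesystem's inode table (hypothetical)."""

    def __init__(self):
        self.next_ino = 1
        self.data = {}    # inode number -> file contents
        self.nlink = {}   # inode number -> link count

    def create(self, contents):
        """Allocate a new inode with a link count of 1."""
        ino = self.next_ino
        self.next_ino += 1
        self.data[ino] = contents
        self.nlink[ino] = 1
        return ino

    def inc(self, ino):
        """One more vnode now refers to this inode."""
        self.nlink[ino] += 1

    def dec(self, ino):
        """A vnode dropped its reference; free the inode at zero."""
        self.nlink[ino] -= 1
        if self.nlink[ino] == 0:
            del self.nlink[ino]
            del self.data[ino]


def clone_index(store, index):
    """Copy a vnode index (vnode number -> inode number), sharing inodes."""
    for ino in index.values():
        store.inc(ino)
    return dict(index)
```

Cloning an index with clone_index() leaves both indices pointing at the same 
inodes with a link count of 2; calling dec() once per index afterwards brings 
the count to zero and the inode disappears.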

In a namei fileserver, the underlying files are normal files in the 
filesystem.  The vnode indices contain virtual "inode numbers" which are 
used to compute the file's actual filename; we then open the files by name. 
Since these are normal files on an unmodified local filesystem, their link 
counts in the underlying filesystem represent the number of actual links to 
them, which is always 1.  Information about how many vnodes are using that 
file is stored in the "link table", which is an additional per-volume 
metadata file.  This is the only backend currently available on Linux.
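
A rough sketch of the namei arrangement, again in Python and again purely 
illustrative. The class ToyNameiVolume, its method names, and especially the 
hex filename scheme in path_for() are assumptions of mine; the real 
fileserver derives filenames differently and keeps the link table in its own 
on-disk format. The point is only that the reference count lives in a 
separate table while the file's own link count stays at 1, and that a write 
to a shared file copies it first.

```python
import os
import shutil


class ToyNameiVolume:
    """Toy model of one volume on a namei-style backend (hypothetical)."""

    def __init__(self, root):
        self.root = root        # existing directory holding the data files
        self.link_table = {}    # virtual inode number -> vnode references
        self.next_vino = 1

    def path_for(self, vino):
        # Made-up naming scheme, NOT the one OpenAFS uses.
        return os.path.join(self.root, "%016x" % vino)

    def create(self, contents):
        """Create a backing file and record one reference to it."""
        vino = self.next_vino
        self.next_vino += 1
        with open(self.path_for(vino), "wb") as f:
            f.write(contents)
        self.link_table[vino] = 1
        return vino

    def share(self, vino):
        """Record that another vnode (e.g. in a clone) uses this file."""
        self.link_table[vino] += 1
        return vino

    def release(self, vino):
        """Drop one reference; remove the file when none remain."""
        self.link_table[vino] -= 1
        if self.link_table[vino] == 0:
            del self.link_table[vino]
            os.unlink(self.path_for(vino))

    def write(self, index, vnode, offset, data):
        """Write into a vnode, copying the backing file first if shared."""
        vino = index[vnode]
        if self.link_table[vino] > 1:
            new_vino = self.next_vino
            self.next_vino += 1
            shutil.copyfile(self.path_for(vino), self.path_for(new_vino))
            self.link_table[new_vino] = 1
            self.release(vino)
            index[vnode] = vino = new_vino
        with open(self.path_for(vino), "r+b") as f:
            f.seek(offset)
            f.write(data)
```

After share(), the link table entry reads 2 while os.stat() on the backing 
file still reports st_nlink of 1. A write() through one index then gives that 
vnode a fresh file and leaves the other vnode's contents untouched.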

There is no fileserver backend which stores data in a large file or 
directly to a block device, and there never has been.  Such a thing would 
be possible, but it's not clear that it would be superior to the existing 
backends.

-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
   Sr. Research Systems Programmer
   School of Computer Science - Research Computing Facility
   Carnegie Mellon University - Pittsburgh, PA