[OpenAFS] fine-grained incrementals?
Jeffrey Hutzelman
jhutz@cmu.edu
Thu, 24 Feb 2005 12:20:13 -0500
On Wednesday, February 23, 2005 05:13:52 PM -0800 Mike Fedyk
<mfedyk@matchmail.com> wrote:
> Jeffrey Hutzelman wrote:
>
>> AFS does copy-on-write at the per-vnode layer. Each vnode has
>> metadata which is kept in the volume's vnode indices; among other
>> things, this includes the identifier of the physical file which
>> contains the vnode's contents (for the inode fileserver, this is an
>> inode number; for namei it's a 64-bit "virtual inode number" which can
>> be used to derive the filename). The underlying inode has a link count
>> (in the filesystem for inode; in the link table for namei) which
>> reflects how many vnodes have references to that inode. When you
>> write to a vnode whose underlying inode has more than one reference,
>> the fileserver allocates a new one for the vnode you're writing to,
>> and copies the contents.
>
> OK, I get it now. An inode fileserver uses the link count on the
> underlying filesystem (ext3 for instance), and a namei server uses a
> large file (or possibly block device) with an AFS specific filesystem
> format. Is that right?
Not quite. Both inode and namei fileservers store their data in individual
files on the local filesystem. Each local file corresponds to the contents
of one vnode (file, directory, or symlink) in the AFS filesystem, or to
some particular kind of per-volume metadata (a volume header or vnode
index). The difference between the two backends lies largely in how those
files are located by the fileserver.
In an inode fileserver (the traditional model), the vnode index contains
the inode numbers of the underlying files for each vnode; the inode numbers
of the indices themselves are stored in the volume header (the Vxxx.vol
files at the top level of each vice partition). These inodes have no
regular directory entries which point to them; they are manipulated via a
set of special system calls provided by the AFS kernel module. In this
model, the link counts on the underlying inodes reflect the number of
vnodes referring to that inode; when the link count is decremented to zero,
the inode is automatically freed by the normal kernel filesystem code.
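
To make the link-count bookkeeping concrete, here is a rough user-space sketch in Python. It is not OpenAFS code: the real inode backend does this through the kernel and the special system calls mentioned above. ToyInodeStore, clone_index and write_vnode are names invented for this illustration. It models a vnode index as a dict from vnode number to inode number, and shows the copy-on-write step from earlier in the thread plus the freeing of an inode once its count reaches zero.

```python
class ToyInodeStore:
    """In-memory stand-in for local-filesystem inodes (illustrative only)."""

    def __init__(self):
        self._next_ino = 1
        self.data = {}   # inode number -> file contents
        self.nlink = {}  # inode number -> link count (vnodes referring to it)

    def create(self, contents=b""):
        ino = self._next_ino
        self._next_ino += 1
        self.data[ino] = contents
        self.nlink[ino] = 1
        return ino

    def inc(self, ino):
        self.nlink[ino] += 1

    def dec(self, ino):
        self.nlink[ino] -= 1
        if self.nlink[ino] == 0:
            # Mirrors the kernel freeing an inode whose link count hits zero.
            del self.data[ino]
            del self.nlink[ino]


def clone_index(store, index):
    """Copy a vnode index; every inode gains one more referring vnode."""
    copy = dict(index)
    for ino in copy.values():
        store.inc(ino)
    return copy


def write_vnode(store, index, vnode, contents):
    """Replace a vnode's contents, copying first if its inode is shared."""
    ino = index[vnode]
    if store.nlink[ino] > 1:
        new_ino = store.create(store.data[ino])  # copy the old contents
        store.dec(ino)            # this vnode lets go of the shared inode
        index[vnode] = new_ino
        ino = new_ino
    store.data[ino] = contents


store = ToyInodeStore()
rw = {1: store.create(b"old")}      # vnode 1 of a read-write volume
backup = clone_index(store, rw)     # a clone sharing the same inode
write_vnode(store, rw, 1, b"new")   # shared inode, so the write copies first
shared_after_write = rw[1] == backup[1]
backup_contents = store.data[backup[1]]
old_ino = backup.pop(1)
store.dec(old_ino)                  # last reference gone; inode is freed
old_freed = old_ino not in store.data
```

After the write, the two indices point at different inodes and the clone still sees the old contents; dropping the clone's reference is what finally releases the old inode.
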
In a namei fileserver, the underlying files are normal files in the
filesystem. The vnode indices contain virtual "inode numbers" which are
used to compute the file's actual filename; we then open the files by name.
Since these are normal files on an unmodified local filesystem, their link
counts in the underlying filesystem represent the number of actual links to
them, which is always 1. Information about how many vnodes are using that
file is stored in the "link table", which is an additional per-volume
metadata file. The namei backend is the only one currently available on Linux.
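
As a rough illustration of the namei arrangement, the Python sketch below keeps each file as an ordinary file opened by name and tracks sharing in a separate per-volume table. Everything here is an assumption made for the example: ToyNameiVolume and toy_name_for are invented names, and the hex naming rule is a placeholder, not the scheme the real namei backend uses to derive filenames.

```python
import os
import tempfile


def toy_name_for(root, vino):
    # Hypothetical naming rule, for illustration only; the real namei
    # backend computes its paths differently.
    return os.path.join(root, f"{vino:016x}")


class ToyNameiVolume:
    """Ordinary files plus a separate link table (illustrative only)."""

    def __init__(self, root):
        self.root = root
        self.link_table = {}  # virtual inode number -> vnodes using the file

    def create(self, vino, contents):
        with open(toy_name_for(self.root, vino), "wb") as f:
            f.write(contents)
        self.link_table[vino] = 1

    def inc(self, vino):
        self.link_table[vino] += 1

    def dec(self, vino):
        self.link_table[vino] -= 1
        if self.link_table[vino] == 0:
            # The filesystem will not free the file for us; remove it by name.
            os.remove(toy_name_for(self.root, vino))
            del self.link_table[vino]


def demo():
    with tempfile.TemporaryDirectory() as root:
        vol = ToyNameiVolume(root)
        vol.create(42, b"hello")
        vol.inc(42)  # a second vnode (say, in a clone) now shares the file
        path = toy_name_for(root, 42)
        fs_links = os.stat(path).st_nlink  # the filesystem's own link count
        table_links = vol.link_table[42]   # the count the fileserver cares about
        vol.dec(42)
        still_there = os.path.exists(path)
        vol.dec(42)
        gone = not os.path.exists(path)
        return fs_links, table_links, still_there, gone
```

Calling demo() returns the filesystem's link count alongside the link-table count, so the two can be compared directly; the file only disappears once the table's count for it drops to zero.
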
There is no fileserver backend which stores data in a large file or
directly to a block device, and there never has been. Such a thing would
be possible, but it's not clear that it would be superior to the existing
backends.
-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
Sr. Research Systems Programmer
School of Computer Science - Research Computing Facility
Carnegie Mellon University - Pittsburgh, PA