[OpenAFS] shadow volumes?
Jeffrey Hutzelman
jhutz@cmu.edu
Mon, 11 Jul 2005 21:46:02 -0400
On Sunday, July 10, 2005 05:47:36 PM -0400 Matt Benjamin
<matt@linuxbox.com> wrote:
> These are, sort of, new in OpenAFS.
Where by "sort of new" you mean "not really there at all". Your
descriptions of the low-level operations are on the mark, but I wanted to
provide some background on the as-yet-nonexistent high-level features that
they seem to imply -- and a couple of warnings, as well...
When I added the 'vos shadow' and 'vos clone' commands back in early 2004,
I had in mind a mechanism by which we would keep a fileserver containing
"shadow" copies of real volumes, updated on a regular basis, as a form of
backups. If a fileserver were to die a horrible death, we could resurrect
the volumes with loss of not more than, say, a day's worth of changes,
simply by pointing the VLDB entries for those volumes at the "shadow"
fileserver. The process of restoring a multi-terabyte fileserver would be
reduced to minutes rather than days.
I also had in mind a mechanism by which you could keep multiple online
"snapshots" of a volume, which would be visible to users in some fashion so
they could go back several days in time without requiring someone to do a
restore. Depending on the operational model, such snapshots might be on
the same server as the RW volume, or on the "shadow" fileserver.
Both of these features can be built with the tools I added, but the tools
alone are not sufficient. OpenAFS does not have these features, and does
not even pretend to have them. Before it can, they will require additional
design and implementation work:
The shadow-fileserver feature requires either VLDB changes or an external
database to keep track of the locations of the "shadow" volumes -- you
can't just publish them in the current VLDB, because for the feature to
work they have to be real RW volumes with the same IDs as the volumes from
which they were copied.
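To make the "external database" option concrete, here is a minimal Python sketch of such a registry. It is only an illustration under stated assumptions: the record layout, the server names, the volume IDs, and the `failover_plan` helper are all hypothetical and do not correspond to anything in OpenAFS. It shows only the bookkeeping (which shadow copy would stand in for which volume if a server were lost), not the VLDB update itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShadowLocation:
    """One row of a hypothetical registry kept outside the VLDB."""
    volume_id: int     # same ID as the real RW volume it was copied from
    volume_name: str
    server: str        # shadow fileserver holding the copy
    partition: str


# Hypothetical registry contents: volume ID -> location of its shadow copy.
SHADOWS = {
    536870915: ShadowLocation(536870915, "user.alice", "shadow1.example.org", "/vicepa"),
    536870918: ShadowLocation(536870918, "proj.web", "shadow1.example.org", "/vicepb"),
}

# Hypothetical snapshot of where the VLDB says each RW volume currently lives.
VLDB_RW_SITES = {
    536870915: "fs1.example.org",
    536870918: "fs2.example.org",
}


def failover_plan(dead_server, vldb_sites, shadows):
    """Return the shadow locations that would replace volumes on dead_server.

    Volumes on dead_server without a registered shadow are skipped.
    """
    plan = []
    for volume_id, server in sorted(vldb_sites.items()):
        if server == dead_server and volume_id in shadows:
            plan.append(shadows[volume_id])
    return plan
```

Calling `failover_plan("fs1.example.org", VLDB_RW_SITES, SHADOWS)` yields the one shadow record for `user.alice`; an operator or tool would still have to repoint the VLDB entry by some other means.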
The multiple-snapshots feature requires VLDB changes to make it possible to
associate more volume IDs with each VLDB entry, and a way to derive the
names of snapshots from the names of the volumes involved. It may be more
complex than that, depending on what sorts of policies you want to support
for when snapshots are created and removed, and how they are named. Naming
is an extremely important issue here because in order for clients to find
such volumes, there need to be mount points for them somewhere in AFS, and
those become somewhat tricky to manage. Of course, you could build a
similar feature without changing the VLDB by using an external database and
registering the clones separately in the VLDB.
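As a rough illustration of the naming and retention questions, the following Python sketch derives snapshot names from a volume name and a date and picks which snapshots a keep-the-newest-N policy would remove. The date-suffix scheme and both function names are assumptions made up for this example; nothing here is an existing OpenAFS convention, and a real scheme would also have to respect volume-name length limits and mount-point management.

```python
from datetime import date


def snapshot_name(volume_name, day):
    """Derive a snapshot volume name from the RW name and a date.

    Hypothetical scheme: "<volume>.<YYYYMMDD>".
    """
    return f"{volume_name}.{day:%Y%m%d}"


def snapshots_to_remove(snapshot_days, keep):
    """Return the snapshot dates to delete, oldest first.

    Keeps the `keep` most recent dates; everything older is returned.
    """
    ordered = sorted(snapshot_days)
    if keep <= 0:
        return ordered
    return ordered[:-keep]
```

For example, with snapshots taken on July 9, 10, and 11 and `keep=2`, only the July 9 snapshot would be selected for removal.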
There are also additional problems related to the way various tools will
react to additional clones and to multiple copies of the same volume on
different servers.
As Matt alluded to, in the namei fileserver there is a limit of 7 volumes
in a volume group (an RW volume and all volumes cloned from it). In the
current system, a single volume group might contain as many as four volumes:
the RW itself, a backup volume, an RO or release clone if the volume is
replicated, and a move clone if the volume is being moved. That
essentially leaves room for 3 additional clones, or 4 if the volume in
question is not replicated.
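The slot arithmetic above can be written out as a short Python sketch. The limit of 7 and the four reserved roles come from the paragraph above; the constant and function names are hypothetical, chosen only for this illustration.

```python
NAMEI_GROUP_LIMIT = 7  # volumes per volume group in the namei fileserver


def free_clone_slots(replicated):
    """Return how many extra clones fit in a volume group.

    Reserves room for the RW volume, a backup volume, and a move clone,
    plus an RO or release clone when the volume is replicated.
    """
    reserved = 1       # the RW volume itself
    reserved += 1      # backup volume
    reserved += 1      # move clone, needed while the volume is being moved
    if replicated:
        reserved += 1  # RO or release clone
    return NAMEI_GROUP_LIMIT - reserved
```

Here `free_clone_slots(True)` gives 3 and `free_clone_slots(False)` gives 4, matching the figures in the text.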
I would not want to run syncvldb or syncserv against any fileserver
containing these constructs. Running a syncvldb against a shadow
fileserver would be disastrous -- it would update the VLDB to reflect all
volumes being on the shadow server instead of the real ones. I don't know
what would happen with multiple clones; I seem to remember doing some
experiments in this area but don't recall the results.
In short, these features do not really exist yet.
The 'vos clone' and 'vos shadow' commands are not intended to provide them;
they are low-level tools intended to perform specific functions which are
expected to be useful in building these or similar features, and which also
are sometimes useful in dealing with unusual situations.
Used alone, the 'vos clone' and 'vos shadow' commands will generally _not_
leave things in a consistent state. They can be dangerous; don't use them
unless you know what you're doing.
Now of course, if someone wants to talk about what it would take to make
these features a reality, I'd be happy to have such a conversation. But
that probably belongs over on openafs-devel, rather than here.
-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
Sr. Research Systems Programmer
School of Computer Science - Research Computing Facility
Carnegie Mellon University - Pittsburgh, PA