[OpenAFS] shadow volumes?

Jeffrey Hutzelman jhutz@cmu.edu
Mon, 11 Jul 2005 21:46:02 -0400

On Sunday, July 10, 2005 05:47:36 PM -0400 Matt Benjamin 
<matt@linuxbox.com> wrote:

> These are, sort of, new in OpenAFS.

Where by "sort of new" you mean "not really there at all".  Your 
descriptions of the low-level operations are on the mark, but I wanted to 
provide some background on the as-yet-nonexistent high-level features that 
they seem to imply -- and a couple of warnings, as well...

When I added the 'vos shadow' and 'vos clone' commands back in early 2004, 
I had in mind a mechanism by which we would keep a fileserver containing 
"shadow" copies of real volumes, updated on a regular basis, as a form of 
backups.  If a fileserver were to die a horrible death, we could resurrect 
the volumes with loss of not more than, say, a day's worth of changes, 
simply by pointing the VLDB entries for those volumes at the "shadow" 
fileserver.  The process of restoring a multi-terabyte fileserver would be 
reduced to minutes rather than days.

I also had in mind a mechanism by which you could keep multiple online 
"snapshots" of a volume, which would be visible to users in some fashion so 
they could go back several days in time without requiring someone to do a 
restore.  Depending on the operational model, such snapshots might be on 
the same server as the RW volume, or on the "shadow" fileserver.

Both of these features can be built with the tools I added, but the tools 
alone are not sufficient.  OpenAFS does not have these features, and does 
not even pretend to have them.  Before OpenAFS can offer them, additional 
design and implementation work is needed:

The shadow-fileserver feature requires either VLDB changes or an external 
database to keep track of the locations of the "shadow" volumes -- you 
can't just publish them in the current VLDB, because for the feature to 
work they have to be real RW volumes with the same IDs as the volumes from 
which they were copied.
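
To make the "external database" option a little more concrete, here is a 
rough sketch of the kind of record such a database would need to hold.  
Nothing in it is an OpenAFS interface; the class names, fields and example 
hostname are all made up for illustration, and a real design would need 
persistent storage and locking.

```python
# Hypothetical sketch only: none of these names exist in OpenAFS.
from dataclasses import dataclass


@dataclass(frozen=True)
class ShadowLocation:
    server: str      # shadow fileserver hostname (example value below)
    partition: str   # partition holding the shadow copy, e.g. "/vicepa"
    updated: str     # ISO date of the last refresh of the shadow


class ShadowRegistry:
    """Maps an RW volume ID to the location of its shadow, outside the VLDB."""

    def __init__(self):
        self._by_id = {}

    def record(self, volume_id, location):
        # A shadow keeps the same ID as its source, so the ID is the key.
        self._by_id[volume_id] = location

    def lookup(self, volume_id):
        # Returns None when no shadow is known for this volume.
        return self._by_id.get(volume_id)

    def volumes_on(self, server):
        # Volume IDs whose shadows live on the given server, in sorted order.
        return sorted(v for v, loc in self._by_id.items() if loc.server == server)


registry = ShadowRegistry()
registry.record(536870915, ShadowLocation("shadow1.example.org", "/vicepa", "2005-07-11"))
registry.record(536870918, ShadowLocation("shadow1.example.org", "/vicepb", "2005-07-11"))
```

After losing a fileserver, something like volumes_on() would tell you which 
VLDB entries could be repointed at the shadow server.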

The multiple-snapshots feature requires VLDB changes to make it possible to 
associate more volume IDs with each VLDB entry, and a way to derive the 
names of snapshots from the names of the volumes involved.  It may be more 
complex than that, depending on what sorts of policies you want to support 
for when snapshots are created and removed, and how they are named.  Naming 
is an extremely important issue here because in order for clients to find 
such volumes, there need to be mount points for them somewhere in AFS, and 
those become somewhat tricky to manage.  Of course, you could build a 
similar feature without changing the VLDB by using an external database and 
registering the clones separately in the VLDB.
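
As an illustration of what "deriving the names" might look like, the 
sketch below uses a date-stamped suffix.  The ".snap.YYYYMMDD" scheme and 
both function names are assumptions invented for this example, not an 
existing OpenAFS convention, and the sketch ignores volume name length 
limits and any policy for expiring snapshots.

```python
# Hypothetical naming scheme; not an OpenAFS convention.
from datetime import date


def snapshot_name(volume, taken, prefix="snap"):
    """Derive a snapshot name from the source volume name and a date."""
    return f"{volume}.{prefix}.{taken:%Y%m%d}"


def parse_snapshot_name(name, prefix="snap"):
    """Split a snapshot name back into (volume, date), or return None."""
    base, sep, stamp = name.rpartition(f".{prefix}.")
    if not sep or not base or len(stamp) != 8 or not stamp.isdigit():
        return None
    return base, date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:]))


name = snapshot_name("user.jdoe", date(2005, 7, 11))
```

Being able to go in both directions matters because whatever creates the 
mount points has to map each snapshot back to the volume it came from.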

There are also additional problems related to the way various tools will 
react to additional clones and to multiple copies of the same volume on 
different servers.

As Matt alluded to, in the namei fileserver there is a limit of 7 volumes 
in a volume group (an RW volume and all volumes cloned from it).  In the 
current system, a single volume group might contain as many as four volumes 
- the RW itself, a backup volume, an RO or release clone if the volume is 
replicated, and a move clone if the volume is being moved.  That 
essentially leaves room for 3 additional clones, or 4 if the volume in 
question is not replicated.
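
The arithmetic behind those numbers, as a small sketch.  The limit of 7 
and the four kinds of volume come from the paragraph above; the function 
and parameter names are just for illustration.

```python
NAMEI_VOLUME_GROUP_LIMIT = 7  # volumes per volume group in the namei fileserver


def spare_clone_slots(has_backup=True, replicated=True, being_moved=True):
    """Slots left for extra clones after the volumes the current system may create."""
    used = 1                  # the RW volume itself
    used += int(has_backup)   # backup volume
    used += int(replicated)   # RO or release clone
    used += int(being_moved)  # move clone
    return NAMEI_VOLUME_GROUP_LIMIT - used


worst_case = spare_clone_slots()                     # all four volumes present
unreplicated = spare_clone_slots(replicated=False)   # no RO or release clone
```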

I would not want to run syncvldb or syncserv against any fileserver 
containing these constructs.  Running a syncvldb against a shadow 
fileserver would be disastrous -- it would update the VLDB to reflect all 
volumes being on the shadow server instead of the real ones.  I don't know 
what would happen with multiple clones; I seem to remember doing some 
experiments in this area but don't recall the results.
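
If you do experiment with these constructs, one cheap safeguard is a 
wrapper that refuses to build a sync command for any server known to hold 
them.  The sketch below is only that: the server list and function name 
are hypothetical, and it returns the argument list rather than running 
anything.

```python
# Hypothetical guard; the server names are examples, not real hosts.
UNSAFE_SERVERS = {"shadow1.example.org"}  # servers holding shadows or extra clones


def sync_command(subcommand, server, unsafe_servers=UNSAFE_SERVERS):
    """Return the vos argument list for a sync, or refuse for unsafe servers."""
    if subcommand not in ("syncvldb", "syncserv"):
        raise ValueError(f"not a sync subcommand: {subcommand}")
    if server.lower() in unsafe_servers:
        raise RuntimeError(f"refusing to run vos {subcommand} against {server}")
    return ["vos", subcommand, "-server", server]


argv = sync_command("syncvldb", "fs1.example.org")
```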

In short, these features do not really exist yet.  The 'vos clone' and 
'vos shadow' commands are not intended to provide them; they are low-level 
tools intended to perform specific functions which are expected to be 
useful in building these or similar features, and which also are sometimes 
useful in dealing with unusual situations.

Used alone, the 'vos clone' and 'vos shadow' commands will generally _not_ 
leave things in a consistent state.  They can be dangerous; don't use them 
unless you know what you're doing.

Now of course, if someone wants to talk about what it would take to make 
these features a reality, I'd be happy to have such a conversation.  But 
that probably belongs over on openafs-devel, rather than here.

-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
   Sr. Research Systems Programmer
   School of Computer Science - Research Computing Facility
   Carnegie Mellon University - Pittsburgh, PA