[OpenAFS] Understanding questions backup volume

Jeffrey Hutzelman jhutz@cmu.edu
Thu, 09 Feb 2006 18:53:48 -0500


There seems to have been some confusion in this thread, so I guess I
will speak up...

On Thursday, February 09, 2006 11:43:45 AM +0100 Lars Schimmer 
<l.schimmer@cgv.tugraz.at> wrote:

> Hi!
>
> I start using backup volumes ;-)
> It is fairly easy to create one and mount them.
> But: Where is the difference between RO copies and a backup volume?

For a moment, let's limit ourselves to an RO volume on the same site as the 
RW.  In that case, as far as the fileserver is concerned, there is almost 
no difference.  Both RO and backup volumes are efficient copy-on-write 
clones of a read-write parent volume.  Neither is stored as a "diff" with 
respect to the other; rather, both volumes have complete, independent vnode 
indexes, but they share storage for files that have not changed since the 
clone was updated.

When a shared file is modified in the parent volume, a copy is made, so 
that the parent volume can be updated with the new contents while the 
clones retain the original contents.  If there are multiple clones, they 
continue to share storage for that file.  Note that the clones themselves 
are always "read-only" in the sense that they are not writable; only the RW 
parent volume can be modified.
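
To make the sharing concrete, here is a toy sketch in Python.  It is not 
the fileserver's code; the Volume class and its methods are made up for 
illustration, and a dict stands in for the vnode index.

```python
class Volume:
    """Toy model: a vnode index mapping names to shared storage objects."""

    def __init__(self, index=None, writable=True):
        # Every volume gets its own complete index (a copy, not a diff).
        self.index = dict(index or {})
        self.writable = writable

    def clone(self):
        # Cloning copies the index only; the stored objects are shared.
        return Volume(self.index, writable=False)

    def write(self, name, data):
        if not self.writable:
            raise PermissionError("clones are read-only")
        # Copy-on-write: bind a fresh storage object in this index and
        # leave the old object untouched for any clones still using it.
        self.index[name] = bytearray(data)

    def read(self, name):
        return bytes(self.index[name])


rw = Volume()
rw.write("a", b"old")
rw.write("b", b"same")
ro = rw.clone()          # stands in for an RO clone on the same partition
bk = rw.clone()          # stands in for the BK clone
rw.write("a", b"new")    # only the RW parent's view of "a" changes
```

After the last write, rw sees the new contents of "a", while ro and bk 
still share one object holding the old contents; "b" remains shared by all 
three volumes.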

The fileserver does treat BK volumes specially in a couple of ways. 
Notably, it keeps track of which volume is "the" backup volume for a given 
RW parent, if one is present at all, and it keeps track in the parent's 
volume header of the last time such a volume was cloned.  No similar 
tracking is done for RO volumes or other clones (in fact, to the fileserver 
there is _no_ difference between "real" RO volumes and those created with 
the 'vos clone' command).


> I know, backup volumes should be used for backup, RO for distributing
> data all over the cell.

Well, yes; those are the uses for which these types were intended.  Note 
that there is a variety of special handling in clients related to these 
volume types.  For example, volume names ending in .backup or .readonly are 
looked up in the VLDB under the parent name, but are considered to refer to 
the RO or BK volume.  The cache manager has the concept of an "RO" path; a 
normal (#) mount point in an RO volume resolves to an RO volume, if there 
is one.  The VLDB records a separate set of sites for RO clones, while the 
BK volume is presumed to live in the same place as the RW (since it must).
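
As a rough illustration of those two client-side rules, here is a 
simplified Python sketch.  The function names are hypothetical, and the 
real cache manager logic has more cases than this.

```python
def split_volume_name(name):
    """Return (name to look up in the VLDB, volume type the suffix implies)."""
    for suffix, vtype in ((".readonly", "RO"), (".backup", "BK")):
        if name.endswith(suffix):
            return name[: -len(suffix)], vtype
    # No suffix: the type is not fixed by the name alone.
    return name, None


def resolve_normal_mount(parent_is_ro, target_has_ro):
    """Simplified rule for a normal (#) mount point, as described above."""
    if parent_is_ro and target_has_ro:
        return "RO"   # stay on the "RO" path when an RO volume exists
    return "RW"
```

For example, split_volume_name("user.lars.backup") gives the parent name 
"user.lars" together with the type "BK".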

> A backup should be made of the backup volumes, because this doesn't lock
> the RW volumes for a long time.

Backup volumes can be used for this purpose, and indeed that was part of 
the reason for having them, but as was pointed out, any clone will do -- 
you can even have 'vos dump' create a temporary one on the fly for you. 
However, another part of the original intent was that backup volumes would 
be cloned once per day (there is even a 'vos backupsys' command for this 
purpose), and mounted where users could find them.  So, if a user 
accidentally deletes a file, he can retrieve it from the backup clone 
without bothering someone to do a restore.


> And if I vos dump the backup volumes to a backup server (amanda-afs or
> just plain dump) I could rebuild the backup volumes. Does this help me
> in case of a lost RW volume?

Well, if you 'vos dump' the backup volume, you can restore an RW volume 
from that dump.  Or, you can restore an RO copy with a different name, with 
'vos restore -readonly'.  That's another thing RO volumes are good for -- 
you can have standalone RO volumes with no RW parent, either as a result of 
a readonly restore, or as a result of replication.


> At least a RO copy could be converted to a RW volume in nearly NO time,
> but a backup volume?

I've not looked at the code in a lot of detail, so I don't know whether it 
will work to "convert" a clone (RO or BK) which is colocated with a parent. 
The convertROtoRW operation is designed to let you "promote" an RO volume 
that lives on a different partition from the RW, in the event the partition 
containing the RW fails.  Since RO and BK volumes living on the same 
partition as the RW share storage with it, it is unlikely that they will 
survive intact if something destroys the RW parent.



> Our cell is designed to have a RO copy of every RW volume.
> And if one RO copy of a RW volume resist on a file server housed in a
> datacenter "far away" I've got a quick and easy 1-day-backup in case of
> big error here. With the ROtoRW convert the cell is back up very fast.
> So why use backup volumes?

I think I answered that above.  You're doing something very nonstandard - 
trying to use replication to provide failure recovery for volumes that are 
RW by nature.  The replication feature was designed to provide reliable, 
scalable storage for data which is accessed by many clients and changed 
infrequently.  In the Andrew system, it was originally used for 
distributing system software, and that is still its primary use today.



> Are backup volumes built incremental?

Backup and RO clones are built in the same fashion, by making a copy of the 
parent volume's vnode index and incrementing the refcounts on all of the 
files in the volume.  When an existing backup clone is recloned, the 
refcounts on the files that were present in the previous volume are 
decremented, and any that are no longer referenced are freed.  No data is 
copied in any event.
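
The bookkeeping can be sketched in a few lines of Python.  This is only a 
model under simplifying assumptions: Partition and clone are invented 
names, integers stand in for files, and a dict stands in for the vnode 
index.

```python
class Partition:
    """Toy model of per-file reference counts on one partition."""

    def __init__(self):
        self.refcount = {}   # file id -> number of indexes referencing it

    def incref(self, fid):
        self.refcount[fid] = self.refcount.get(fid, 0) + 1

    def decref(self, fid):
        self.refcount[fid] -= 1
        if self.refcount[fid] == 0:
            del self.refcount[fid]   # no longer referenced: storage freed


def clone(partition, parent_index, old_clone_index=None):
    """Make (or remake) a clone; only the index is copied, never file data."""
    new_index = dict(parent_index)
    for fid in new_index.values():
        partition.incref(fid)
    if old_clone_index is not None:
        # Reclone: drop the references held by the previous clone.
        for fid in old_clone_index.values():
            partition.decref(fid)
    return new_index


part = Partition()
rw = {"a": 1, "b": 2}
for fid in rw.values():
    part.incref(fid)

bk = clone(part, rw)                     # first backup clone

# The RW volume modifies "a": copy-on-write gives it a new file, 3.
part.decref(rw["a"])
rw["a"] = 3
part.incref(3)

bk = clone(part, rw, old_clone_index=bk)   # reclone the backup volume
```

After the reclone, file 1 (the old contents of "a") is referenced by 
nothing and is freed, while files 2 and 3 are each referenced by both the 
RW volume and the new backup clone.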


> Because with only RO copies, I get a 1-day-backup, but I need a
> long-term-backup with incremental backups.

Keeping such backups in the form of online volumes is not terribly 
efficient.  Long term backups should be kept in the form of volume dumps, 
possibly compressed, and stored on disk and/or tape.  There are several 
options available for managing backups; you can use 'vos dump' or the 
backup system included with OpenAFS, and there are a number of third-party 
packages, both open-source and commercial, which offer AFS backup support. 
All of these approaches are capable of making use of incremental volume 
dumps.

-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
   Sr. Research Systems Programmer
   School of Computer Science - Research Computing Facility
   Carnegie Mellon University - Pittsburgh, PA