[OpenAFS] byte-level incremental backups using "full+reverse delta"
Adam Megacz
megacz@cs.berkeley.edu
Thu, 07 Feb 2008 10:21:53 -0800
In case anybody else finds this useful, I've worked out a system for
doing backups of an AFS volume-set with the "full and reverse delta"
style of incremental backups.
/afs/megacz.com/srv/bin/dump.sh
Much like rdiff-backup, this keeps a complete copy of the most recent
backup and stores each earlier backup as a delta against the backup
that *follows* it (i.e. in the "reverse" direction). This means you
can "truncate" the backup history at any point, rather than only at
full-backup intervals, and there is never any reason to keep more than
a single "full" backup around.
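
To make the "truncate" point concrete, here is a minimal sketch (not
part of dump.sh) of pruning old history under the year/month/day
layout the script uses. The function name prune_before and its
arguments are made up for illustration. Because each delta depends
only on the backup that follows it, removing the oldest days should
leave every newer backup restorable.

```shell
#!/bin/bash
# prune_before ROOT CUTOFF  (hypothetical helper, not in dump.sh)
# Delete every ROOT/YYYY/MM/DD directory dated strictly before CUTOFF,
# where CUTOFF is given as YYYY/MM/DD.
prune_before() {
    local root=$1 cutoff=$2 day rel
    for day in "$root"/[0-9][0-9][0-9][0-9]/[0-9][0-9]/[0-9][0-9]; do
        [ -d "$day" ] || continue      # glob matched nothing
        rel=${day#"$root"/}            # e.g. 2008/02/05
        # zero-padded YYYY/MM/DD strings sort in chronological order
        if [[ "$rel" < "$cutoff" ]]; then
            echo "pruning $rel"
            rm -rf -- "$day"
        fi
    done
}
```

For example, "prune_before /vol/dumps 2008/01/01" would drop everything
from 2007 and earlier. Emptied month and year directories are left in
place; an rmdir pass could tidy them up.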
Deltas are done with xdelta3. Although it is very CPU-intensive, it
seems to be very good at representing minor changes (i.e. a few bytes)
to very large files, which AFS "incremental dumps" cannot do, and which
(in my experience) rdiff did not do as well as I would like. A "test
restore" of each delta is done before the old dump is deleted, so you
don't need to trust that the xdelta3 algorithm is correct; you only
have to trust that it is deterministic.
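
The "test restore" idea can be sketched as a small helper. This is an
illustration rather than the code in dump.sh, and the name
remove_if_reproducible is hypothetical. It runs whatever
reconstruction command it is given and deletes the old file only if
the output matches it byte for byte, so correctness rests on cmp
rather than on the delta tool.

```shell
#!/bin/bash
# remove_if_reproducible OLD CMD [ARGS...]  (hypothetical helper)
# Run CMD, compare its stdout with the file OLD, and delete OLD only
# when the two are identical. Returns non-zero and keeps OLD otherwise.
remove_if_reproducible() {
    local old=$1
    shift
    # pipefail: a reconstruction command that exits non-zero also
    # counts as a mismatch, even if its partial output happens to match
    if ( set -o pipefail; "$@" | cmp -s - "$old" ); then
        rm -f -- "$old"
    else
        echo "keeping $old: reconstruction failed or differs" >&2
        return 1
    fi
}

# Intended use with the same xdelta3 flags as dump.sh (left commented
# out so the sketch runs without xdelta3 installed):
#   remove_if_reproducible "$OLD" xdelta3 -dcvfs "$NEW" "$OLD.rev-xdelta"
```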
Hope you find this useful,
- a
#!/bin/bash -e
# A script for "full and backward diff" style incremental backups of
# one or more cells' AFS dumpfiles using xdelta3 for diffing. Note
# that xdelta3 will efficiently represent minor changes to very large
# files, which AFS "incremental dumps" cannot do.
#
# IMPORTANT: you must use xdelta3 version SVN.227 or later -- this
# will become release 3.0u at some point. There is a precompiled x86
# deb at /afs/megacz.com/debian/xdelta3/xdelta3_svn227.deb
#
# Change these variables to suit your needs. Note:
# - Backups are kept in $BACKUPDIR/year/month/day/cell/vol.afsdump.
# - At all times the latest backup is kept in "full" form, and all
# previous backups are kept as "reverse diffs" against the backup
# from the day AFTER them. This lets you easily "truncate" the
# backup history at any time.
# - The symlink $BACKUPDIR/yesterday points to yesterday's backups, if
# there were any; upon completion of today's backups, yesterday's
# will be converted into reverse diffs.
BACKUPDIR=/vol/dumps
LOGFILE=/var/log/dump.log
# send both stdout and stderr to the log file
exec &> "$LOGFILE"
mkdir -p $BACKUPDIR
cd $BACKUPDIR
DIR=`date +%Y/%m/%d`
mkdir -p $DIR
rm -f today
ln -fs $DIR today
dump() {
    local CELL=$1
    local VOL=$2
    local OLD=yesterday/$CELL/$VOL.afsdump
    local NEW=today/$CELL/$VOL.afsdump
    aklog -c "$CELL"
    mkdir -p "today/$CELL"
    echo "=============================================================================="
    echo "dumping $VOL in cell $CELL..."
    echo
    tokens
    vos examine "$VOL" -cell "$CELL"
    echo
    # today's dump is always taken in full
    vos dump "$VOL" -file "$NEW" -clone -cell "$CELL"
    if test -e "$OLD"
    then
        # encode yesterday's dump as a delta against today's (-s names the source)
        xdelta3 -evfs "$NEW" "$OLD" "$OLD.rev-xdelta" || true
        # test restore: remove yesterday's full dump only if the delta reproduces it exactly
        if xdelta3 -dcvfs "$NEW" "$OLD.rev-xdelta" | cmp - "$OLD"
        then
            echo 'removing...'
            rm "$OLD" || true
        fi
    fi
    echo " done dumping $VOL."
}
# each command below is in the form "dump <cellname> <volumename>"
dump megacz.com pub
dump megacz.com srv
dump megacz.com web
dump megacz.com mail
dump megacz.com work
dump megacz.com user
dump megacz.com user.megacz
dump megacz.com root.cell
dump research.cs.berkeley.edu root.cell
# update the "yesterday" symlink so tomorrow's backups will deltify today's
rm -f yesterday
mv today yesterday
##############################################################################
# old rdiff-based code; please ignore
##############################################################################
#
# rdiff delta $OLD.sig $NEW $OLD.fwd
# rdiff patch $OLD $OLD.fwd $NEW
# rm $OLD.fwd
#
# test -e $OLD.sig || rdiff signature $OLD $OLD.sig
# vos dump $VOL -clone -cell $CELL \
# | tee $NEW \
# | (rdiff -- signature - - \
# | tee $NEW.sig \
# | rdiff -- delta - $OLD - \
# | (tee $OLD.rev \
# | rdiff -- patch $NEW - - \
# | (cmp $OLD - && (echo "removing old dump..."; rm -f $OLD))))
# rm -f $OLD.sig