[OpenAFS] byte-level incremental backups using "full+reverse delta"

Adam Megacz megacz@cs.berkeley.edu
Thu, 07 Feb 2008 10:21:53 -0800


In case anybody else finds this useful, I've worked out a system for
doing backups of an AFS volume-set with the "full and reverse delta"
style of incremental backups.

  /afs/megacz.com/srv/bin/dump.sh

Much like rdiff-backup, this keeps a complete copy of the most recent
backup, and stores all previous backups as deltas against the
*following* backup (i.e. in the "reverse" direction).  This means you
can "truncate" the backup history whenever you like, rather than only
at full-backup intervals, and there's never any reason to keep more
than a single "full" backup around.
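Recovering an older backup means walking this chain backwards. You start from the nearest full dump and decode one reverse delta per earlier day. Below is a minimal sketch of that. It assumes the directory layout described in the script's comments, where full dumps live in $BACKUPDIR/YYYY/MM/DD/cell/vol.afsdump and reverse deltas in vol.afsdump.rev-xdelta next to them. It also assumes GNU bash and the `xdelta3 -d -s SOURCE DELTA OUTPUT` decode form. The helper names `backup_days`, `restore_plan` and `restore_dump` are hypothetical and are not part of dump.sh. The output is an ordinary AFS dump file.

```bash
#!/bin/bash
# Hypothetical companion to dump.sh (not part of it): rebuild an older AFS
# dump by applying reverse deltas, newest first.  Assumed layout:
#   $BACKUPDIR/YYYY/MM/DD/<cell>/<vol>.afsdump             full dump
#   $BACKUPDIR/YYYY/MM/DD/<cell>/<vol>.afsdump.rev-xdelta  reverse delta
BACKUPDIR=${BACKUPDIR:-/vol/dumps}

# List the dated backup directories, newest first.
backup_days() {
  ( cd "$BACKUPDIR" || exit 1
    shopt -s nullglob
    for d in [0-9][0-9][0-9][0-9]/[0-9][0-9]/[0-9][0-9]; do
      if [[ -d $d ]]; then echo "$d"; fi
    done ) | LC_ALL=C sort -r
}

# Print the files needed to rebuild CELL/VOL as of DAY (YYYY/MM/DD):
# the nearest full dump at or after DAY, then each reverse delta back to DAY.
restore_plan() {
  local cell=$1 vol=$2 day=$3 d f found=
  local -a steps=()
  if [[ ! $day =~ ^[0-9]{4}/[0-9]{2}/[0-9]{2}$ ]]; then
    echo "bad day: $day (expected YYYY/MM/DD)" >&2
    return 1
  fi
  for d in $(backup_days); do
    # Fixed-width dates: compare numerically with the slashes removed.
    if (( 10#${d//\//} < 10#${day//\//} )); then break; fi
    f=$d/$cell/$vol.afsdump
    if [[ -e $BACKUPDIR/$f ]]; then
      steps=("$f")                  # a full dump restarts the chain here
    elif [[ -e $BACKUPDIR/$f.rev-xdelta ]] && (( ${#steps[@]} > 0 )); then
      steps+=("$f.rev-xdelta")      # one more day back in time
    else
      continue                      # no usable file for this volume that day
    fi
    if [[ $d == "$day" ]]; then found=yes; fi
  done
  if [[ -z $found ]]; then
    echo "no usable backup of $cell/$vol for $day" >&2
    return 1
  fi
  printf '%s\n' "${steps[@]}"
}

# Rebuild CELL/VOL as of DAY into the file OUT (needs xdelta3 on PATH).
restore_dump() {
  local cell=$1 vol=$2 day=$3 out=$4 plan step tmp first=yes
  plan=$(restore_plan "$cell" "$vol" "$day") || return 1
  while read -r step; do
    if [[ $first ]]; then
      cp -f -- "$BACKUPDIR/$step" "$out" || return 1   # start from the full dump
      first=
      continue
    fi
    # Decode one reverse delta against the newer dump currently in $out;
    # the temp file sits next to $out so the final mv stays on one filesystem.
    tmp=$(mktemp "$out.XXXXXX") || return 1
    if ! xdelta3 -d -f -s "$out" "$BACKUPDIR/$step" "$tmp"; then
      rm -f -- "$tmp"
      return 1
    fi
    mv -f -- "$tmp" "$out" || return 1
  done <<< "$plan"
}
```

For example, `restore_dump megacz.com user 2008/02/01 /tmp/user.afsdump` would rebuild that day's dump of the `user` volume, if a backup for that day exists. The resulting file could then be handed to `vos restore`. If a day still has its full dump because the test restore failed, the sketch uses that full dump directly and ignores its delta.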

Deltas are done with xdelta3.  Although it is very CPU-intensive, it
seems to be very good at representing minor changes (i.e. a few bytes)
to very large files, which "incremental dumps" cannot do, and which
(in my experience) rdiff did not handle as well as I would like.  A
"test restore" of each delta is done before the old full dump is
deleted, so you don't need to trust that the xdelta3 algorithm is
correct -- you just have to trust that it is deterministic.

Hope you find this useful,

  - a


#!/bin/bash -e

# A script for "full and backward diff" style incremental backups of
# one or more cells' AFS dumpfiles using xdelta3 for diffing.  Note
# that xdelta3 will efficiently represent minor changes to very large
# files, which AFS "incremental dumps" cannot do.

#
# IMPORTANT: you must use xdelta3 version SVN.227 or later -- this
# will become release 3.0u at some point.  There is a precompiled x86
# deb at /afs/megacz.com/debian/xdelta3/xdelta3_svn227.deb
#

# Change these variables to suit your needs.  Note:
# - Backups are kept in $BACKUPDIR/year/month/day/cell/vol.afsdump.
# - At all times the latest backup is kept in "full" form, and all
#   previous backups are kept as "reverse diffs" against the backup
#   from the day AFTER them.  This lets you easily "truncate" the
#   backup history at any time.
# - The symlink $BACKUPDIR/yesterday points to yesterday's backups, if
#   there were any; upon completion of today's backups, yesterday's
#   will be converted into reverse diffs.

BACKUPDIR=/vol/dumps
LOGFILE=/var/log/dump.log

# Send both stdout and stderr to the log file.
exec >"$LOGFILE" 2>&1

mkdir -p "$BACKUPDIR"
cd "$BACKUPDIR"
DIR=$(date +%Y/%m/%d)
mkdir -p "$DIR"
rm -f today
ln -fs "$DIR" today

dump() {
  CELL=$1
  VOL=$2

  OLD=yesterday/$CELL/$VOL.afsdump
  NEW=today/$CELL/$VOL.afsdump

  aklog -c "$CELL"
  mkdir -p "today/$CELL"

  echo "=============================================================================="
  echo "dumping $VOL in cell $CELL..."
  echo
  tokens
  vos examine "$VOL" -cell "$CELL"
  echo

  # Today's full dump is taken in every case.
  vos dump "$VOL" -file "$NEW" -clone -cell "$CELL"

  if test -e "$OLD"
  then
    # Encode yesterday's dump as a reverse delta against today's dump.
    xdelta3 -evfs "$NEW" "$OLD" "$OLD.rev-xdelta" || true
    # Test-restore the delta; delete yesterday's full dump only if the
    # result matches it byte for byte.
    if xdelta3 -dcvfs "$NEW" "$OLD.rev-xdelta" | cmp - "$OLD"
    then
      echo 'removing...'
      rm "$OLD"
    fi
  fi

  echo "  done dumping $VOL."
}

# each command below is in the form "dump <cellname> <volumename>"
dump megacz.com               pub
dump megacz.com               srv
dump megacz.com               web
dump megacz.com               mail
dump megacz.com               work
dump megacz.com               user
dump megacz.com               user.megacz
dump megacz.com               root.cell
dump research.cs.berkeley.edu root.cell

# update the "yesterday" symlink so tomorrow's backups will deltify today's
rm -f yesterday
mv today yesterday


##############################################################################
# old rdiff-based code; please ignore
##############################################################################
#
# rdiff delta     $OLD.sig $NEW      $OLD.fwd
# rdiff patch     $OLD     $OLD.fwd  $NEW
# rm $OLD.fwd
#
#    test -e $OLD.sig || rdiff signature $OLD     $OLD.sig
#    vos dump $VOL -clone -cell $CELL \
#        | tee $NEW  \
#        | (rdiff -- signature - - \
#           | tee $NEW.sig \
#           | rdiff -- delta - $OLD - \
#           | (tee $OLD.rev \
#              | rdiff -- patch $NEW - - \
#              | (cmp $OLD - && (echo "removing old dump..."; rm -f $OLD))))
#    rm -f $OLD.sig
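The script's comments say that the backup history can be truncated at any time. Each reverse delta depends only on the backup after it, so truncation comes down to deleting the oldest dated directories. Here is a minimal sketch under the same layout assumptions as above. The function name `truncate_history` is hypothetical. The sketch needs bash and a find(1) with `-empty` and `-delete`, which GNU and BSD find both provide. It refuses any cutoff later than the newest backup, because that day holds the current full dumps.

```bash
#!/bin/bash
# Hypothetical helper (not part of dump.sh): delete every dated backup
# directory older than CUTOFF (YYYY/MM/DD).  Each reverse delta depends only
# on the backup after it, so the days that remain can still be restored.
BACKUPDIR=${BACKUPDIR:-/vol/dumps}

truncate_history() (               # subshell body, so the cd stays local
  cutoff=$1
  if [[ ! $cutoff =~ ^[0-9]{4}/[0-9]{2}/[0-9]{2}$ ]]; then
    echo "usage: truncate_history YYYY/MM/DD" >&2
    exit 1
  fi
  cd "$BACKUPDIR" || exit 1
  shopt -s nullglob
  days=( [0-9][0-9][0-9][0-9]/[0-9][0-9]/[0-9][0-9] )
  (( ${#days[@]} > 0 )) || exit 0  # nothing to truncate

  # Find the newest day: it holds the current full dumps and must survive.
  newest=0
  for d in "${days[@]}"; do
    n=$((10#${d//\//}))
    if (( n > newest )); then newest=$n; fi
  done
  if (( 10#${cutoff//\//} > newest )); then
    echo "refusing: $cutoff is after the newest backup" >&2
    exit 1
  fi

  # Fixed-width dates: compare numerically with the slashes removed.
  for d in "${days[@]}"; do
    if (( 10#${d//\//} < 10#${cutoff//\//} )); then
      echo "removing $d"
      rm -rf -- "$d"
    fi
  done
  # Remove month and year directories left empty by the deletions.
  find . -mindepth 1 -maxdepth 2 -type d -empty -delete
)
```

For example, `truncate_history 2008/01/01` would keep only the backups from 2008 onward. The `today` and `yesterday` symlinks are not touched. After a completed run, `yesterday` points at the newest day, which the sketch never removes.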