[OpenAFS] byte-level incremental backups using "full+reverse delta"

Stephen Joyce stephen@physics.unc.edu
Sun, 10 Feb 2008 23:24:54 -0500 (EST)


These contributions are appreciated, especially if someone can use them.

But anytime you send something like this to a list or put it on the web, 
it's a very good idea to state the license terms in the script, or at least 
in the accompanying message. Unless you release your work into the public 
domain or under an open license, some people may hesitate to use it and may 
not know whether or not they can re-distribute your work, and if so under 
what conditions.

I recommend http://www.gnu.org/licenses/licenses.html :-)

Cheers, Stephen
--
Stephen Joyce
Systems Administrator                                            P A N I C
Physics & Astronomy Department                         Physics & Astronomy
University of North Carolina at Chapel Hill         Network Infrastructure
voice: (919) 962-7214                                        and Computing
fax: (919) 962-0480                               http://www.panic.unc.edu

Don't judge a book by its movie.

On Fri, 8 Feb 2008, Marcel Koopmans wrote:

> Well as we are giving away scripting for replication...
> This script works *without* configuration and maybe people can learn from
> it.
> Just run it on volservers and don't forget to get your AFS token first.
>
> repl_afs.pl
>
> ------
>
> #!/usr/bin/perl -w
>
> #
> # OpenAFS volume replication
> #   by : Marcel D.A. Koopmans of Elysium Open Systems
> #
> #   version : 1.0.5 ( 2007-08-24 20:25 GMT+1 )
>
> use strict;
> use Sys::Syslog;
>
> sub write_log {
>   my ( ${level}, ${message} ) = @_;
>
>   openlog( "afs_repl", "", "DAEMON" );
>   syslog( ${level}, ${message} ."\n" );
>   closelog;
> }
>
> sub main {
>  my ${err}=0;
>  my ( $a, $b );
>  my @temp;
>  my ( @ip_a, %ip_h );
>  my ( %hn_h );
>  my ( $volume, $sites, $host );
>  my ( ${exitcode} );
>
>  # log start
>
>  write_log( "INFO", "OpenAFS replication start" );
>
>  # get local IP addresses
>
>  open ( FILE, "/sbin/ifconfig -a |" );
>  while ( <FILE> ) {
>    chomp($_);
>    if ( $_=~ m/^\s+inet\s+addr:([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\s+.*$/ ) {
>      if ( ! exists ( ${ip_h{$1}} ) ) {
>        ${ip_h{$1}}=1;
>        ${ip_a[$#{ip_a}+1]}=$1;
>      }
>    }
>  }
>  close ( FILE );
>
>  # sort IP addresses;
>
>  @temp=@ip_a;
>  @ip_a=sort(@temp);
>  $#temp=-1;
>
>  # resolve hostnames
>
>  if ( $#{ip_a} > -1 ) {
>    for ( $a=0; $a<=$#{ip_a}; $a++ ) {
>      open ( FILE, "/usr/bin/getent hosts " . ${ip_a[$a]} . " |" );
>      while ( <FILE> ) {
>        chomp($_);
>        @temp=split(/\s+/,$_);
>        for ( $b=0; $b<=$#temp; $b++ ) {
>          if ( ! exists ( ${hn_h{$temp[$b]}} ) ) {
>            ${hn_h{$temp[$b]}}=1;
>          }
>        }
>      }
>      close ( FILE );
>    }
>  }
>
>  # Get openAFS volumes
>
>  $a=-1;
>  open ( FILE, "/usr/bin/vos listvldb 2> /dev/null |" );
>  while ( <FILE> ) {
>    chomp($_);
>    $_=~ s/\s+$//;
>
>    if ( ( $a == 2 ) && ( $_=~ m/^\s+server\s+(.+)\s+partition\s+\/vicep[a-z]+\s+RW\s+Site$/ ) ) {
>      ${host}=$1;
>      if ( exists ( $hn_h{$host} ) ) {
>        if ( ${sites} > 1 ) {
>          ${exitcode}=system("/usr/bin/vos release " . ${volume} . " > /dev/null 2>&1" );
>          ${exitcode}=${exitcode} / 256;
>
>          if ( ${exitcode} == 0 ) {
>            write_log( "INFO", "release of volume " . ${volume} . " success" );
>          } else {
>            write_log( "ERR", "release of volume " . ${volume} . " failed, returned exit code " . ${exitcode} );
>          }
>        }
>      }
>    }
>
>    if ( ( $a == 1 ) && ( $_=~ m/^\s+number\s+of\s+sites\s+\-\>\s+([0-9]+)$/ ) ) {
>      $a=2;    # sites
>      ${sites}=$1;
>    }
>
>    if ( ( $a == 0 ) && ( $_=~ m/^(.+)$/ ) ) {
>      $a=1;    # volume name
>      ${volume}=$1;
>    }
>
>    if ( $_=~ m/^$/ ) {
>      $a=0;    # new block
>    }
>
>  }
>  close ( FILE );
>
>  # log end
>
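>  write_log( "INFO", "OpenAFS replication end" );
>
>  # return an explicit status so that "exit ( main )" does not exit with
>  # whatever write_log happened to return
>  return ${err};
> }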
>
> exit ( main );
>
> ------
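
For anyone who would rather not carry the Perl around, the volume
selection step above (RW site on this server, more than one site) can
be sketched as a small awk filter.  This is only a sketch: the function
name releasable_volumes is made up, the input layout is assumed from
the regular expressions in the script, and it matches a single host
name instead of every local name and address as the Perl does.

```shell
# Hypothetical helper: read "vos listvldb" output on stdin and print the
# volumes whose RW site is on host $1 and which have more than one site.
releasable_volumes() {
  awk -v host="$1" '
    BEGIN { state = -1 }                    # skip everything before the first blank line
    { sub(/[ \t]+$/, "") }                  # strip trailing whitespace
    /^$/ { state = 0; next }                # a blank line starts a new entry
    state == 0 { volume = $1; state = 1; next }
    state == 1 && /^[ \t]+number of sites -> [0-9]+$/ {
      sites = $NF; state = 2; next
    }
    state == 2 && /^[ \t]+server .* partition \/vicep[a-z]+ RW Site$/ {
      if ($2 == host && sites > 1) print volume
    }
  '
}
```

It would be used along the lines of
"vos listvldb 2>/dev/null | releasable_volumes afs1.example.com",
feeding each printed name to "vos release".
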
>
> With kind regards,
>  Marcel
>
>
>
>
>
> -----Original Message-----
> From: openafs-info-admin@openafs.org
> [mailto:openafs-info-admin@openafs.org]On Behalf Of Adam Megacz
> Sent: 07 February 2008 19:22
> To: openafs-info@openafs.org
> Cc: hcoop-sysadmin@hcoop.net
> Subject: [OpenAFS] byte-level incremental backups using "full+reverse
> delta"
>
>
>
> In case anybody else finds this useful, I've worked out a system for
> doing backups of an AFS volume-set with the "full and reverse delta"
> style of incremental backups.
>
>  /afs/megacz.com/srv/bin/dump.sh
>
> Much like rdiff-backup, this keeps a complete copy of the most recent
> backup, and stores all previous backups as deltas against the
> *following* backup (i.e. in the "reverse" direction).  This means you
> can "truncate" the backup history whenever you like, rather than only
> at full-backup intervals, and there's never any reason to keep more
> than a single "full" backup around.
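
Since only the newest dump is kept in full, getting an older day back
means walking the chain backwards from it.  A rough sketch of that
follows, assuming the year/month/day/cell/vol.afsdump layout and the
.rev-xdelta suffix used by the script; restore_dump and the .work/.next
scratch files are made-up names, and it assumes no gaps in the chain
for the volume in question.

```shell
# Hypothetical helper, not part of dump.sh.
# usage: restore_dump BACKUPDIR CELL VOL YYYY/MM/DD OUTFILE
restore_dump() (
  backupdir=$1 cell=$2 vol=$3 target=$4 out=$5
  days=
  # The glob expands in ascending date order; prepend to get newest first.
  for path in "$backupdir"/[0-9][0-9][0-9][0-9]/[0-9][0-9]/[0-9][0-9]; do
    if [ -d "$path" ]; then
      day=${path#"$backupdir"/}
      days="$day $days"
    fi
  done
  have_full=no
  for day in $days; do
    dump=$backupdir/$day/$cell/$vol.afsdump
    if [ "$have_full" = no ]; then
      # Skip forward until the newest full dump is found.
      [ -e "$dump" ] || continue
      cp "$dump" "$out.work" || exit 1
      have_full=yes
    else
      if [ ! -e "$dump.rev-xdelta" ]; then
        echo "delta chain broken at $day" >&2
        rm -f "$out.work"
        exit 1
      fi
      # Each delta was encoded against the following day's dump,
      # so the dump reconstructed so far is the xdelta3 source.
      xdelta3 -d -f -s "$out.work" "$dump.rev-xdelta" "$out.next" || exit 1
      mv "$out.next" "$out.work"
    fi
    if [ "$day" = "$target" ]; then
      mv "$out.work" "$out"
      exit 0
    fi
  done
  echo "no backup found for $target" >&2
  rm -f "$out.work"
  exit 1
)
```

For example, "restore_dump /vol/dumps megacz.com web 2008/02/01
/tmp/web.afsdump" would leave a dump file that can be handed to
"vos restore".
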
>
> Deltas are done with xdelta3.  Although it is very CPU-intensive, it
> seems to be very good at representing minor changes (i.e. a few bytes)
> to very large files, which "incremental dumps" cannot do, and which
> (in my experience) rdiff did not do as well as I would like.  A "test
> restore" of each delta is done before deleting the old dump, so you
> don't need to trust that the xdelta3 algorithm is correct -- you just
> have to trust that it is deterministic.
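
The encode / test-restore / delete step can also be pulled out into a
function that reports failure instead of swallowing it.  This is a
sketch rather than the script's own code: deltify_verified and the
.verify scratch file are made-up names, and it decodes to a temporary
file (costing some disk space) so that the exit status of xdelta3 can
be checked separately from the comparison.

```shell
# Hypothetical helper.  usage: deltify_verified NEW OLD
# Replaces OLD by OLD.rev-xdelta only if the delta restores OLD exactly.
deltify_verified() (
  new=$1 old=$2
  delta=$old.rev-xdelta
  check=$old.verify
  # Encode OLD as a delta against NEW (NEW is the xdelta3 source).
  xdelta3 -e -f -s "$new" "$old" "$delta" || exit 1
  # Test restore: decode the delta and compare byte-for-byte with OLD.
  if xdelta3 -d -f -s "$new" "$delta" "$check" && cmp -s "$check" "$old"; then
    rm -f -- "$check" "$old"
  else
    echo "verification failed, keeping $old" >&2
    rm -f -- "$check" "$delta"
    exit 1
  fi
)
```

On success only the delta remains; on any failure the old full dump is
left in place and the suspect delta is removed.
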
>
> Hope you find this useful,
>
>  - a
>
>
> #!/bin/bash -e
>
> # A script for "full and backward diff" style incremental backups of
> # one or more cells' AFS dumpfiles using xdelta3 for diffing.  Note
> # that xdelta3 will efficiently represent minor changes to very large
> # files, which AFS "incremental dumps" cannot do.
>
> #
> # IMPORTANT: you must use xdelta3 version SVN.227 or later -- this
> # will become release 3.0u at some point.  There is a precompiled x86
> # deb at /afs/megacz.com/debian/xdelta3/xdelta3_svn227.deb
> #
>
> # Change these variables to suit your needs.  Note:
> # - Backups are kept in $BACKUPDIR/year/month/day/cell/vol.afsdump.
> # - At all times the latest backup is kept in "full" form, and all
> #   previous backups are kept as "reverse diffs" against the backup
> #   from the day AFTER them.  This lets you easily "truncate" the
> #   backup history at any time.
> # - The symlink $BACKUPDIR/yesterday points to yesterday's backups, if
> #   there were any; upon completion of today's backups, yesterday's
> #   will be converted into reverse diffs.
>
> BACKUPDIR=/vol/dumps
> LOGFILE=/var/log/dump.log
>
> exec 2>&1
> exec &> $LOGFILE
>
> mkdir -p $BACKUPDIR
> cd $BACKUPDIR
> DIR=`date +%Y/%m/%d`
> mkdir -p $DIR
> rm -f today
> ln -fs $DIR today
>
> dump() {
>  CELL=$1
>  VOL=$2
>
>  OLD=yesterday/$CELL/$VOL.afsdump
>  NEW=today/$CELL/$VOL.afsdump
>
>  aklog -c $CELL
>  mkdir -p today/$CELL
>
>  echo "=============================================================================="
>  echo "dumping $VOL in cell $CELL..."
>  echo
>  tokens
>  vos examine $VOL -cell $CELL
>  echo
>
>  if test -e $OLD
>  then \
>     vos dump $VOL -file $NEW -clone -cell $CELL
>     xdelta3 -evfs $NEW $OLD $OLD.rev-xdelta || true
>     ((xdelta3 -dcvfs $NEW $OLD.rev-xdelta | cmp - $OLD) && (echo 'removing...'; rm $OLD)) || true
>  else \
>    vos dump $VOL -file $NEW -clone -cell $CELL
>  fi
>
>  echo "  done dumping $VOL."
> }
>
> # each command below is in the form "dump <cellname> <volumename>"
> dump megacz.com               pub
> dump megacz.com               srv
> dump megacz.com               web
> dump megacz.com               mail
> dump megacz.com               work
> dump megacz.com               user
> dump megacz.com               user.megacz
> dump megacz.com               root.cell
> dump research.cs.berkeley.edu root.cell
>
> # update the "yesterday" symlink so tomorrow's backups will deltify today's
> rm -f yesterday
> mv today yesterday
>
>
> ##############################################################################
> # old rdiff-based code; please ignore
> ##############################################################################
> #
> # rdiff delta     $OLD.sig $NEW      $OLD.fwd
> # rdiff patch     $OLD     $OLD.fwd  $NEW
> # rm $OLD.fwd
> #
> #    test -e $OLD.sig || rdiff signature $OLD     $OLD.sig
> #    vos dump $VOL -clone -cell $CELL \
> #        | tee $NEW  \
> #        | (rdiff -- signature - - \
> #           | tee $NEW.sig \
> #           | rdiff -- delta - $OLD - \
> #           | (tee $OLD.rev \
> #              | rdiff -- patch $NEW - - \
> #              | (cmp $OLD - && (echo "removing old dump..."; rm -f $OLD))))
> #    rm -f $OLD.sig
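
The "truncate the backup history at any time" property mentioned in the
script's header comment could look roughly like this.  It is a sketch
under the same year/month/day layout; prune_before is a made-up name,
and the cutoff is passed in explicitly rather than computed with date.

```shell
# Hypothetical helper, not part of dump.sh.
# usage: prune_before BACKUPDIR YYYY/MM/DD
# Removes every dated backup directory strictly older than the cutoff.
prune_before() (
  backupdir=$1
  cutoff=$(printf %s "$2" | tr -d /)        # 2008/02/07 -> 20080207
  for path in "$backupdir"/[0-9][0-9][0-9][0-9]/[0-9][0-9]/[0-9][0-9]; do
    [ -d "$path" ] || continue
    day=${path#"$backupdir"/}
    if [ "$(printf %s "$day" | tr -d /)" -lt "$cutoff" ]; then
      echo "pruning $day"
      rm -rf -- "$path"
    fi
  done
)
```

Because every delta depends only on newer backups, dropping the oldest
days never breaks a restore of the days that remain.  The cutoff must
of course be older than the newest (full) backup, and emptied month and
year directories are left behind.
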
>
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info