[OpenAFS] Some backup advice needed.

Steve Simmons scs@umich.edu
Fri, 21 Apr 2006 15:27:28 -0400

> We use the native afs backup. We currently do a full backup of the  
> whole cell weekly followed by daily incrementals until we cycle  
> back to Monday midnight and start the next full.
> As the size of the cell (user vols ) grows our backup window is  
> getting way too big. Often it stretches into late Wed before it  
> completes.
> I am wondering how many are doing weekly fulls? I am considering a  
> monthly full with incrementals the rest of the month but thats  
> kinda scary if maybe your full is not good.

We do every-other-week fulls rotating across 10 very large file
servers, and the fulls still take too long. We're in the process of
implementing dump to disk, and are actually going to a 28-day cycle
of a full (M), three level 1s at weekly intervals (W), and daily
level 2s (D) taken against the previous weekly or full, depending on
the day of the cycle:

      M D D D D D D  Cycle 1
      W D D D D D D
      W D D D D D D
      W D D D D D D
      M D D D . . .  Cycle 2
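
A minimal sketch of how the chart above maps a day of the cycle to a dump level. The function name dump_plan and the 0-based day numbering are assumptions made for illustration; this is not taken from the afsdump script.

```python
CYCLE_DAYS = 28  # one full, three weekly level 1s, daily level 2s


def dump_plan(day):
    """Return (level, parent_day) for a 0-based day of the cycle.

    parent_day is the cycle day whose dump this one is incremental
    against, or None for the full.
    """
    day %= CYCLE_DAYS
    if day == 0:
        return 0, None          # M: the full
    if day % 7 == 0:
        return 1, 0             # W: level 1 against the full
    return 2, day - day % 7     # D: level 2 against the latest M or W


if __name__ == "__main__":
    # Print one cycle in the same layout as the chart above.
    for week in range(4):
        print(" ".join("MWD"[dump_plan(week * 7 + d)[0]] for d in range(7)))
```

Under this layout a restore needs at most three dumps: the full, the most recent weekly (if any), and the most recent daily.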

We will be holding on to two cycles.  At the moment we are doing this
with vos dump, using a heavily modified version of the afsdump script
that Matt Hoskins so kindly posted a month or two back. Yes, we will
publish when done (another week or two), and I want to offer advance
apologies to Matt for how unrecognizable it's going to be.

Because it uses vos dump rather than backup, we are going to do full  
backups of 1/28th of the volumes every day. This will be a bit of a  
hands-on nightmare to get rolling, but once in place it'll be largely  
self-sustaining (handwave, handwave).
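
One way the 1/28th split could be made self-sustaining is to derive each volume's full-dump day from a stable hash of its name. This is only a sketch under that assumption: the function names and the volume name user.alice are hypothetical, and hashing balances the number of volumes per day, not their sizes.

```python
import zlib

CYCLE_DAYS = 28


def full_day_for(volume_name):
    """Pick a stable slot (0..27) in the cycle for this volume's full."""
    # crc32 is deterministic across runs, unlike Python's built-in hash().
    return zlib.crc32(volume_name.encode("utf-8")) % CYCLE_DAYS


def cycle_day(volume_name, day_number):
    """Where this volume is in its own 28-day cycle on a given day.

    day_number is any monotonically increasing day count, for example
    days since some fixed epoch.
    """
    return (day_number - full_day_for(volume_name)) % CYCLE_DAYS


if __name__ == "__main__":
    vol = "user.alice"  # hypothetical volume name
    print(vol, "gets its full on slot", full_day_for(vol))
```

A volume gets its full whenever cycle_day() returns 0, so new volumes fall into a slot without any manual assignment.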

So the short answer is 'don't do all your fulls at once.' It
complicates your restores and requires you to do more tracking or
use very sensible naming, but that's what dbs are for, right?

> Also I often need to move volumes during the backup time and I am  
> unsure how this affects the backups

My seat-of-the-pants experience is that if a volume is scheduled for
backup and is moved after the backup starts but before it is actually
backed up, no backup occurs because there is no .backup volume.
Create one after the move, and all is fine.
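
A hedged sketch of that workaround: pair every move with a vos backup of the same volume so the .backup volume exists again before the dump reaches it. The helper names, server names and partitions are hypothetical; the commands are only printed unless dry_run is turned off.

```python
import subprocess


def move_then_backup_cmds(volume, from_server, from_part, to_server, to_part):
    """Build the vos commands: move the volume, then recreate its .backup."""
    return [
        ["vos", "move", "-id", volume,
         "-fromserver", from_server, "-frompartition", from_part,
         "-toserver", to_server, "-topartition", to_part],
        ["vos", "backup", "-id", volume],
    ]


def run_all(cmds, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop on the first failure


if __name__ == "__main__":
    # Hypothetical volume, servers and partitions.
    run_all(move_then_backup_cmds("user.alice", "fs1", "/vicepa",
                                  "fs2", "/vicepb"))
```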

One of the many nice things about Matt's script is that it seems to  
be relatively impervious to move issues. Yes, you can get bitten by  
race conditions, but the worst thing that happens is that you get an  
extra level 0 when a volume is moved. Er, assuming I'm reading his  
code right.