[OpenAFS] Re: Read-only replication

Simon Wilkinson sxw@inf.ed.ac.uk
Thu, 17 Jun 2010 19:35:13 +0100

On 17 Jun 2010, at 16:29, Andrew Deason wrote:
> If you're only using volumes for home directories, or things like
> collaboration space, then RO volumes are not very useful to you.

Actually, that's definitely not true in my experience. See below.

> As you mentioned, they can also 'kinda' be used for backup purposes,
[ snip ]
> I definitely wouldn't recommend that for home dirs or anything like that
> though, since from the user's perspective it looks like their data
> suddenly went back in time by a day. Usually it's not much better than
> just having real backups.

I'd strongly recommend considering them for this use - alongside a real
backup solution. As a site which lost a third of its home directories in
a fire, we've got real-world experience of how painful the restoration
process can be. As a result, we make nightly disk->disk backups of all
of our AFS data using the read-only volume mechanism. Should our primary
computing site disappear in a ball of flame or (perhaps more likely) a
RAID controller decide to spew garbage across a disk array, we can
restore user data within a matter of minutes.

I'm probably preaching to the converted here - but backups are
completely unimportant. It's restores that are vital. When designing
your backup solution, you have to consider how you get that data back,
and what you get that data back onto. If your computing centre is hit in
a meteor strike - do you have the disk capacity to restore your critical
data? If you don't have it, how quickly can you obtain it? Do you have
to wait for an insurance claim to be processed before you can do so? And
so on ... Once you've got the disk capacity - what's the fastest that
you can spool data out of your current backup system onto those disks?
At that rate, how long will it take to do the restore?
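
The last two questions come down to simple arithmetic: data size divided
by sustained restore throughput. A minimal sketch of that calculation
follows. The function name and every figure in it (10 TB of data, 100 MB/s
and 1000 MB/s restore rates) are hypothetical and are not taken from this
post; plug in your own measurements.

```python
def restore_hours(data_tb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to stream data_tb terabytes at a sustained rate."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    total_bytes = data_tb * 1e12                  # decimal terabytes
    bytes_per_second = throughput_mb_per_s * 1e6  # decimal megabytes
    return total_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    for label, rate in [("100 MB/s", 100), ("1000 MB/s", 1000)]:
        print(f"10 TB at {label}: {restore_hours(10, rate):.1f} hours")
```

This ignores the time needed to acquire replacement hardware, which the
questions above suggest may well dominate.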

We ran those figures a while back, and decided that it made much more
sense to maintain disk-based backups in parallel with our production AFS
data. This means that should the proverbial meteor strike, we are just
one quick operation away from restoring all of our data (albeit on a
downgraded service). We also keep tape backups, both for archival
purposes, and for additional security. But our rapid response plan is to
promote the read-only.
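
For readers unfamiliar with the mechanics, here is one way the nightly
refresh and the promotion step might look using the standard vos
commands. This is a sketch under assumptions: the post does not say which
commands or scripts the site actually uses, and the server, partition and
volume names below are hypothetical. It assumes a read-only site for the
volume was already defined on the backup server with vos addsite. The
script only prints the command lines; it does not run them.

```python
import shlex


def release_cmd(volume: str) -> list[str]:
    """Nightly step: push the current read/write contents to the RO sites."""
    return ["vos", "release", "-id", volume]


def promote_cmd(server: str, partition: str, volume: str) -> list[str]:
    """Disaster step: turn the RO copy on the backup server into the RW."""
    return [
        "vos", "convertROtoRW",
        "-server", server,
        "-partition", partition,
        "-id", volume,
    ]


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    volume = "user.alice"
    print(shlex.join(release_cmd(volume)))
    print(shlex.join(promote_cmd("backup1.example.org", "/vicepa", volume)))
```

Promotion only makes sense once the original read/write volume is really
gone, and anything written since the last release is lost.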