[OpenAFS] Re: vos shadow to backup user homes

Andrew Deason adeason@sinenomine.net
Mon, 26 Aug 2013 10:28:51 -0500


On Sun, 25 Aug 2013 21:05:41 +0530 (IST)
Shouri Chatterjee <shouri@ee.iitd.ac.in> wrote:

> I wanted to ask about "vos shadow" and whether it is being used as a
> solution on production systems to back-up user home directories.

I believe it is, but I'll let others speak if they are doing so.

> The most significant information I can find is a thread from this
> email archive last year.
> http://lists.openafs.org/pipermail/openafs-info/2012-December/039077.html

Yes, one of the downsides of shadow volumes is that using them is not
documented as well as other features, and they aren't tested as much. 

Shadow volumes are also maybe a bit more 'unmanaged' than other parts of
AFS, since a shadow volume doesn't exist in the vldb, so effectively AFS
doesn't know about it. This can cause some confusion and problems if an
administrator isn't otherwise aware that the shadow volumes are there.

Also note that you can get a setup similar to shadow volumes by using
regular RO volumes. If you replicate a volume's RO to another server and
the original server goes down, you can convert the RO to an RW using
'vos convertROtoRW'. Some sites use that approach instead.
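
As a rough sketch of that RO-based approach (not a tested procedure):
the Python below only builds the vos command lines and does not run
them. The volume, server, and partition names are made-up placeholders,
and the flags are written from memory of the vos man pages, so check
them against your OpenAFS version first.

```python
import shlex


def ro_replica_commands(volume, server, partition):
    """Commands to define an RO site on another server and populate it."""
    return [
        ["vos", "addsite", "-server", server, "-partition", partition,
         "-id", volume],
        ["vos", "release", "-id", volume],
    ]


def ro_promote_command(volume, server, partition):
    """Command to turn the RO copy on the surviving server into the RW."""
    return ["vos", "convertROtoRW", "-server", server,
            "-partition", partition, "-id", volume]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in ro_replica_commands("user.jdoe", "fs2.example.com", "a"):
        print(shlex.join(cmd))
    print(shlex.join(ro_promote_command("user.jdoe", "fs2.example.com", "a")))
```

The RO only contains data as of the last 'vos release', so you would
re-run the release on whatever schedule matches the amount of data you
can afford to lose.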

> I can dedicate a server to host only shadow volumes. If an active
> server fails and dies, the shadow copy can be brought online. Is this
> a better solution than, say:
> (1) keeping a periodically rsync'd copy of the /vicepx partitions on a 
> shadow server

This is pretty much the same thing as using shadow volumes, except that
it copies the data without going through the OpenAFS server processes,
so it's not as safe. I would say shadow volumes are better than this,
yes.

> (2) afs over drbd

If you do this, you should sync the fsstate.dat file as well, or maybe
just the whole /usr/afs/local directory. On the 'shadow machine', you
should not start the openafs server processes until you are performing a
recovery.

This is arguably better than shadow volumes in some ways, and worse in
others. I think I would prefer this over shadow volumes, but it depends
on what you want. drbd allows a much smaller window of lost data, since
depending on your drbd settings, the data for the block device must be
on its way to (or received at) the remote end before a write will
complete. With shadow volumes, you lose everything written since the
last sync, e.g. up to a day of data if you sync them once a day.

However, since this effectively gives you a raw copy of the /vicep*
partition at the time of the original server crash, you may need to
salvage some volumes before they are usable. This might take longer than
the 'vos syncvldb' step when using shadow volumes, or it might not. Or
you might not care :)
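
For comparison, the shadow volume side might look roughly like the
sketch below. Again, this Python only builds command lines and runs
nothing. All names are hypothetical placeholders, and I am assuming
that '-incremental' is the right way to refresh an existing shadow and
that 'vos syncvldb' accepts '-volume' to limit it to one volume; verify
both against the man pages before relying on this.

```python
import shlex


def shadow_refresh_command(volume, src_server, src_part,
                           dst_server, dst_part, incremental=True):
    """Command to create a shadow, or refresh one that already exists."""
    cmd = ["vos", "shadow", "-id", volume,
           "-fromserver", src_server, "-frompartition", src_part,
           "-toserver", dst_server, "-topartition", dst_part]
    if incremental:
        # Assumption: only send changes to a previously created shadow.
        cmd.append("-incremental")
    return cmd


def shadow_recovery_command(dst_server, dst_part, volume=None):
    """Command to make the VLDB point at the volumes on the shadow server."""
    cmd = ["vos", "syncvldb", "-server", dst_server, "-partition", dst_part]
    if volume is not None:
        cmd += ["-volume", volume]
    return cmd


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(shlex.join(shadow_refresh_command(
        "user.jdoe", "fs1.example.com", "a", "shadow1.example.com", "a")))
    print(shlex.join(shadow_recovery_command("shadow1.example.com", "a")))
```

The refresh command is what you would run from cron (e.g. once a day),
and the recovery command is only run after the original server is
confirmed dead.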

Using drbd may also incur a performance hit for everyday usage, since
you have to transmit all of the data to the remote site. Using shadow
volumes doesn't have this problem, unless perhaps the volumes are in
active use while you update their shadows. You also need to use drbd
for an entire partition, obviously, whereas shadow volumes can be set
up per-volume.

In general, using drbd is more similar to the 'AFS on SAN' approach,
which is an approach that larger sites definitely do use. I'm not sure
how many sites use drbd in this way today, but they have certainly
existed in the past.

-- 
Andrew Deason
adeason@sinenomine.net