[OpenAFS] Re: Practicality of near-continuous replication on busy volumes

Andrew Deason adeason@sinenomine.net
Thu, 3 Jul 2014 17:16:10 -0500


On Thu, 3 Jul 2014 17:49:05 +0100
Dominic Hargreaves <dominic.hargreaves@it.ox.ac.uk> wrote:

> Has anyone got any experience of doing this kind of thing with a busy
> and large-ish (100s of GB) volume? Is it workable in principle or have
> people found that there is a practical limit to how quickly/often vos
> releases can be done - or any other flaws in the scheme?

As mentioned, some sites do have automated volume releases at intervals
of around 15 minutes or a bit less. I don't think I've seen anyone go
down to once a minute; I don't even really like it when someone goes
down to 15 minutes, but it can work.
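
(For concreteness, the usual way to automate that is just cron driving
'vos release'; a minimal sketch, assuming a hypothetical volume
'proj.data' and that this runs on a server machine where -localauth
works, would be an /etc/cron.d entry like:

    # release the RW to its RO sites every 15 minutes
    */15 * * * * root /usr/sbin/vos release proj.data -localauth

plus some locking or error handling so overlapping or failed releases
don't pile up.)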

What can make this impractical is how much data is changing in files and
how many clients you have accessing the RO data. The size of the entire
volume doesn't really matter much, but the number of files and the sizes
of individual files can. The differential data for a volume release is
currently transferred at the per-file level, so if you're appending to a
file (or otherwise changing a large file) you have to send the entire
file every time. That alone can mean a single release takes a large
portion of a minute, depending on the file size and the network link.
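
As a back-of-the-envelope example: if the volume contains a 5 GB file
that is appended to between releases, each release has to ship all 5 GB
again. Over a 1 Gbit/s link that is about

    5 GB * 8 bits/byte / 1 Gbit/s = 40 seconds

of transfer time alone, which already eats most of a one-minute release
interval.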

Releasing frequently also effectively destroys AFS's client-side
metadata caching for the ROs: after every release, a client reading from
the RO must contact the fileserver again for every single file it
accesses, even if the file hasn't changed. You can also get clients that
delay the release because they are holding open references to the
relevant volume, which the (recent) fileserver option -offline-timeout
can alleviate.
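
(If you want to try -offline-timeout, it goes on the fileserver's
command line; on a traditional fs instance that means the BosConfig
bnode, something like the sketch below, where the paths and the
30-second value are just illustrative:

    bnode fs fs 1
    parm /usr/afs/bin/fileserver -offline-timeout 30
    parm /usr/afs/bin/volserver
    parm /usr/afs/bin/salvager
    end

followed by a 'bos restart <server> fs' to pick it up.)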

If nothing normally accesses the RO, that lost caching doesn't really
matter. And if nothing is accessing the remote copy and you just want a
hot spare, you could maybe handle this more easily at the level of the
/vicepX filesystem: either use a clustering fs for that, or some kind of
shared block storage, or DRBD, or something similar.
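
To sketch the DRBD variant (hostnames, devices, and addresses below are
all made up): you replicate the block device backing /vicepa, and only
the node that is currently DRBD primary mounts it and runs a fileserver:

    resource vicepa {
        protocol  C;            # synchronous: writes complete on both nodes
        device    /dev/drbd0;   # this is what you mkfs and mount as /vicepa
        disk      /dev/sdb1;    # assumed backing partition
        meta-disk internal;
        on afs1 { address 10.0.0.1:7789; }
        on afs2 { address 10.0.0.2:7789; }
    }

The upside is that there's no AFS-level release at all; the standby
always has a crash-consistent copy of the whole partition. The downside
is that it's a failover spare, not a second live fileserver: to use it
you have to promote DRBD, mount /vicepX, and start a fileserver there.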

-- 
Andrew Deason
adeason@sinenomine.net