[OpenAFS] Re: DAFS dasalvager: cannot be running from cron

Andrew Deason adeason@sinenomine.net
Thu, 1 Aug 2013 17:30:05 -0500


On Mon, 8 Jul 2013 11:38:41 -0500
Andrew Deason <adeason@sinenomine.net> wrote:

> From a developer perspective:
> 
> This is due to the DAFS_FS / DAFS_UTIL mess; [...] I can think of a
> few possible ways of improving this:
[...]
>  - Just use DAFS_FS everywhere and make all DAFS utilities pthreaded,
>    reducing the number of different codepaths. Is there any reason we
>    were avoiding this? It's not like we need LWP DAFS, and any further
>    granularity of "what type of program is running" should be handled
>    at runtime by programType anyway.

This is gerrit 10123, if anyone wants to follow it. That approach seems
to work fine.

On Wed, 17 Jul 2013 09:31:58 +0000
"Brunckhorst, Ralf" <ralf.brunckhorst@hp.com> wrote:

> We are using the restore incremental function in the first example to
> have only a delta-release when it comes to releasing the TAG-volume.
> If the TAG volume doesn't exist, it will be created and vos restore
> switch automatically to full.

Okay, that makes sense. It's just that the example you gave specified
'time -0', so that isn't an incremental dump :)

> It would be great to hear more about alternatives like you mentioned
> below.

Well, for duplicating an entire cell, there's not much else specific to
say; you dump/restore as you have been doing, except that you restore
to a different cell.

For a linked cell, though, you can do this for just a few volumes,
without needing to duplicate all of the volumes in the cell. Linked
cells are not a very complex concept, but they can allow you to do
somewhat complex things. Say you link cell a.example.com to cell
b.example.com. You can do this by specifying in CellServDB something
like:

>b.example.com # B cell
192.0.2.2 # foo.b.example.com
>a.example.com b.example.com # A cell
192.0.2.102 # bar.a.example.com

In such a setup, if a client is looking for a volume in cell
a.example.com and that volume doesn't exist, the client will try to find
that volume again in cell b.example.com.
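
As a rough illustration of the lookup order described above (a toy
model, not OpenAFS code), the following Python sketch reads
CellServDB-style text and reports which cells a client would consult
for a volume. The function names parse_linked_cells and search_order
are made up for this example, and it assumes the linked cell is the
second field on the '>' line, as in the entry above.

```python
def parse_linked_cells(text):
    """Map each cell name to its linked cell (or None) from CellServDB-style text."""
    links = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(">"):
            continue  # server lines and blank lines carry no link information
        # Drop the leading '>' and the trailing '# comment', then split fields.
        fields = line[1:].split("#", 1)[0].split()
        if not fields:
            continue
        links[fields[0]] = fields[1] if len(fields) > 1 else None
    return links


def search_order(links, cell):
    """Cells to consult for a volume: the cell itself, then its linked cell."""
    linked = links.get(cell)
    return [cell] if linked is None else [cell, linked]


CELLSERVDB = """\
>b.example.com # B cell
192.0.2.2 # foo.b.example.com
>a.example.com b.example.com # A cell
192.0.2.102 # bar.a.example.com
"""

links = parse_linked_cells(CELLSERVDB)
print(search_order(links, "a.example.com"))
print(search_order(links, "b.example.com"))
```

With the entries above, only a.example.com has a fallback cell; a
lookup in b.example.com stops at b.example.com.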


So, as an example of how this helps, say you have an existing cell
called global.example.com with all of your data in it. And say you have
one volume that you want heavily replicated; call it vol.rep. You can
create a linked cell called, say, uk.example.com, and link it to cell
global.example.com. Then create a new volume also called vol.rep in the
cell uk.example.com, and replicate it to, say, 10 servers in the UK; put
no other volumes in uk.example.com. And you update vol.rep from the main
global.example.com cell using pretty much the same 'vos dump' / 'vos
restore' process you mentioned earlier.
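
To make the update step concrete, here is a hedged Python sketch that
only builds the command lines for refreshing vol.rep in uk.example.com
from global.example.com; it does not run anything. The server name
fs1.uk.example.com and partition "a" are placeholders, and the helper
names are made up for this example. It assumes you hold admin tokens in
both cells, that 'vos dump' writes to stdout and 'vos restore' reads
from stdin when -file is omitted, and that the read-only sites in the
UK cell have already been defined.

```python
import shlex


def dump_argv(volume, src_cell, since="0"):
    # "-time 0" asks for a full dump; pass a date instead for an incremental one.
    return ["vos", "dump", "-id", volume, "-time", since, "-cell", src_cell]


def restore_argv(volume, server, partition, dst_cell, overwrite="full"):
    # overwrite is one of "abort", "full" or "incremental".
    return ["vos", "restore", "-server", server, "-partition", partition,
            "-name", volume, "-overwrite", overwrite, "-cell", dst_cell]


def release_argv(volume, dst_cell):
    # Pushes the restored read/write volume out to its read-only sites.
    return ["vos", "release", "-id", volume, "-cell", dst_cell]


def refresh_commands(volume, src_cell, dst_cell, server, partition):
    """Return shell command strings: a dump|restore pipeline, then a release."""
    dump = shlex.join(dump_argv(volume, src_cell))
    restore = shlex.join(restore_argv(volume, server, partition, dst_cell))
    release = shlex.join(release_argv(volume, dst_cell))
    return [f"{dump} | {restore}", release]


commands = refresh_commands("vol.rep", "global.example.com", "uk.example.com",
                            "fs1.uk.example.com", "a")
for command in commands:
    print(command)
```

For a delta-release you would pass a date as `since` and "incremental"
as `overwrite` rather than the full-dump defaults shown here.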

Now set all of your clients in or around the UK to use uk.example.com. For
most volumes, any such client will see that the volume doesn't exist in
uk.example.com, and will get the volume from global.example.com. But for
the volume vol.rep, it does exist in uk.example.com, and so the UK
clients will get the volume from the 10 servers in the UK. You can do
this again for other locations, as many times as you like, and each
location will get a different set of 10 servers for vol.rep.
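
The effect on a UK client can be modelled with a toy lookup table. This
is only a sketch of the behaviour described above, not how the cache
manager is implemented; the server names and the user.alice volume are
invented for the example, and it assumes a single fallback hop from a
cell to its linked cell.

```python
# Toy per-cell volume location data: volume name -> servers holding it.
VLDB = {
    "global.example.com": {
        "vol.rep": ["fs1.global.example.com"],
        "user.alice": ["fs1.global.example.com"],
    },
    "uk.example.com": {
        # The only volume in the UK cell, replicated to 10 servers.
        "vol.rep": [f"fs{n}.uk.example.com" for n in range(1, 11)],
    },
}

# uk.example.com is linked to global.example.com.
LINKED = {"uk.example.com": "global.example.com"}


def locate(cell, volume):
    """Return (cell, servers) for a volume, falling back to the linked cell."""
    for candidate in (cell, LINKED.get(cell)):
        if candidate is None:
            continue  # this cell has no linked cell to fall back to
        servers = VLDB.get(candidate, {}).get(volume)
        if servers:
            return candidate, servers
    raise LookupError(f"{volume} not found in {cell} or its linked cell")


print(locate("uk.example.com", "vol.rep")[0])
print(locate("uk.example.com", "user.alice")[0])
```

The first lookup is answered by uk.example.com and its 10 servers; the
second falls through to global.example.com.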

But of course, this doesn't have to be location-based. You could simply
create cells like rep04.example.com, rep05.example.com, etc., and spread
them all out globally, too, and partition your clients between the
rep*.example.com cells. Or you can make a rep.example.com cell which has
different CellServDB entries for different clients (or different
AFSDB/SRV records for different clients), or make rep.example.com be a
cell alias for e.g. rep05.example.com.

It's also possible to do this the other way around: link the cell
global.example.com to rep.example.com, so that anything that doesn't
exist in global.example.com causes the client to look in
rep.example.com, and have rep.example.com point to different servers on
different clients. That will cause fewer VLDB lookups for the possibly
more common case of volumes in global.example.com, but having
rep.example.com point to different servers from different clients can be
confusing. If you wanted to do that, though, you might as well just have
an unrelated rep.example.com cell, and mount e.g. vol.rep by specifying
the rep.example.com cell explicitly.

-- 
Andrew Deason
adeason@sinenomine.net