[OpenAFS] Re: Status of "vos shadow"

Adam Megacz megacz@cs.berkeley.edu
Sat, 23 Jun 2007 11:08:18 -0700

I've added four new questions to the FAQ to summarize this
information.  Thanks again!


I'd appreciate it if people could take a look at question 3.48 and the
last paragraph of 3.50 to make sure I got these parts right.

  - a

Steve Simmons <scs@umich.edu> writes:
> This is a quick note to discuss our experiences with shadows thus
> far. We'd hoped to be done long before now, but other work keeps
> getting in the way of pushing this forward. We are now in early
> pilot, and hope to have an initial set in production by end of summer.
> We (well, Dan Hyde) found that the shadow code was largely
> complete. We did find one serious bug that could cause lossage of the
> original volume; I believe Dan has forwarded that fix to the group.
> One of the biggest problems we bumped into was only semi-technical. It
> was the lack of definition of what a shadow *should* be as opposed  to
> what a shadow is. We made decisions that suit us, but they
> necessarily reflect our intended use for shadows. Your mileage may
> vary, and we're certainly interested in and amenable to changes if
> the community comes to a decision on them.
> Our purpose: disaster recovery by means of invisible replicated
> volumes. We envision a set of DR hosts with a shadow volume that
> replicates a production volume. If a host hard-fails and isn't likely
> to come back in a reasonable amount of time, we will go to the shadow
> server and promote the relevant volumes from shadow to production. At
> that time the vldb is modified to show the shadow host as the real
> host, and the on-server copy of the volume is changed from type
> 'shadow' to type 'production' (handwave, handwave). "A reasonable
> amount of time" is site-dependent, of course.
> Shadows do not appear in the vldb. Their existence is known only to
> the host which contains a particular shadow. Thus one might have many
> shadows, up to and including one on each vice partition in a cell.
> There is no required relationship of name, parenthood, etc., between a
> shadow and the volume from which it was created. (For the rest of
> this note, we'll refer to the original volume as the parent, and a
> shadow of a parent as a child.)
> Simple shadowing of a parent onto a non-existent child creates a new
> volume identical to the parent in all but name and visibility.
> Incrementally shadowing a parent onto a child brings the child up-to-
> date with the parent, and is a proportionately faster operation.
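
For concreteness, the full-vs-incremental distinction above maps onto the
vos shadow syntax roughly as follows (server, partition, and volume names
here are made up for illustration; flags as in the vos_shadow man page):

```shell
# Full shadow: create a new shadow of user.jdoe on the DR server.
# Hostnames (afs1, afs1-dr) and the volume name are hypothetical.
vos shadow user.jdoe -fromserver afs1 -frompartition a \
    -toserver afs1-dr -topartition a

# Incremental shadow: bring an existing child up to date with its
# parent; only data changed since the last shadow is transferred.
vos shadow user.jdoe -fromserver afs1 -frompartition a \
    -toserver afs1-dr -topartition a -incremental
```
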
> Bad things you can do:
> Shadowing a volume onto another volume's child results in a jumbled
> and probably useless volume. We don't think it should be permitted,
> but lacking a more extensive and better-defined child/parent
> relationship we don't see a way to prevent it. Properly that
> relationship should be in the vldb, but that requires much more
> extensive changes than (a) we were willing to make and (b) we thought
> the community would accept without pre-agreement as to what that
> relationship would be.
> Shadowing a shadow onto itself results in disaster. We have now
> forbidden that in the code.
> Shadowing onto a production volume should and does fail. I don't
> recall if we had to modify the code for that, but if so, that'll be
> part of the patch when we release.
> There is now a vos command which promotes a shadow to production. It
> does nothing to the parent, which will continue to exist on the
> original server/vice partition and could be re-promoted with the
> appropriate vos sync command.
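
(The stock sync commands look like the sketch below; whether they are
the "appropriate vos sync command" here depends on the patch, and the
server/partition names are hypothetical:)

```shell
# Rebuild the VLDB entries from the volumes actually present on the
# original server, then check that server and VLDB agree.
vos syncvldb -server afs1 -partition a
vos syncserv afs1 a
```
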
> When a shadow is created, there is a mark in its volume header which
> indicates it is a clone. During the salvage process shadows are
> handled properly. If I recall correctly, we had to make no changes to
> the salvager for this, but if shadows were to appear in the vldb that
> might be a different story.
> I don't recall if you can have a shadow named after its parent on the
> same server and vice partition as the parent.
> We found a great deal of code that implies a long-term relationship
> between parent and child was intended, but that code is clearly
> incomplete. Unfortunately it's incomplete to such a degree that it's
> not possible to tell what the author(s) intended that relationship to
> be.
> More detail on our intended usage:
> For every AFS server we have, we will have a shadow server. When a
> volume is created on a server, a shadow is quickly created (semi-
> automated process) on the designated shadow server. When a volume is
> moved from one server to another, the shadow is removed from the old
> shadow host and created on the new host. As often as we can manage
> without affecting server performance (i.e., TBD), we will incrementally
> refresh parents to children.
> When a disaster occurs (an entire server is lost and not recoverable
> in a reasonable amount of time), the shadow server is brought on
> line. Assuming we've done our job correctly, user volumes simply
> reappear with a new location. The content of those volumes is as up-
> to-date as the most recent refresh of the shadow. Our seat-of-the-
> pants guess is that we can refresh each shadow about 4 times a day
> without affecting overall performance.
> "A semi-automated process:" it happens out of cron. A shadow server
> gets the volumes list for the host it's shadowing, and does the
> creation/updating as needed. Since a shadow server knows what shadows
> it's got (think 'vos listvol'), it can also delete shadows it
> doesn't need any more. Note this means when a volume is moved, some
> interesting race conditions may ensue. The easiest way to fix those
> race conditions is by putting the shadows into the vldb, but again,
> that is a bigger change than we wanted to put in without a broad
> agreement from the community.
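
A minimal sketch of that cron job might look like the following
(entirely hypothetical: server names, partition letters, and the exact
vos listvol output handling will differ per site and vos version):

```shell
#!/bin/sh
# Hypothetical cron script: create or refresh shadows of every
# volume on a production server. All names here are made up.
SRC=afs1; SRCPART=a        # production server / partition
DST=afs1-dr; DSTPART=a     # shadow server / partition

# -fast lists bare volume IDs, -quiet suppresses the summary
# lines; exact output format may vary by vos version.
for vol in `vos listvol $SRC $SRCPART -fast -quiet`; do
    # An incremental onto a non-existent shadow creates it, so
    # one command handles both the create and the refresh case.
    vos shadow $vol -fromserver $SRC -frompartition $SRCPART \
        -toserver $DST -topartition $DSTPART -incremental
done
```
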
> Some fallout/things discovered while testing the above - there's no
> real need to create a shadow at volume creation time; doing an
> incremental onto a non-existent shadow creates the shadow in exactly
> the same manner as doing a full shadow. Some might regard this as a
> bug; for the moment we're taking advantage of it.
> Our new, second data center just went on line this week. With that in
> place, we can start the initial pilot work on shadows as disaster
> recovery.

PGP/GPG: 5C9F F366 C9CF 2145 E770  B1B8 EFB1 462D A146 C380