[OpenAFS-devel] Re: 1.6 and post-1.6 OpenAFS branch management and schedule

Andrew Deason adeason@sinenomine.net
Wed, 16 Jun 2010 10:38:37 -0500


On Wed, 16 Jun 2010 08:43:39 -0400
Steven Jenkins <steven.jenkins@gmail.com> wrote:

> > With the Demand Attach Fileserver (DAFS) this initial salvage is no
> > longer necessary; however, each volume that was not cleanly detached
> > before gets salvaged in the background. This is a nice feature which
> > allows the most demanded volumes to come up sooner, I hope, but
> > salvaging will still take hours because the same amount of work has
> > to be done.
> 
> Keep in mind that DAFS never brings volumes online unless requested by
> a client, so some volumes may never be attached; it also takes volumes
> offline after a period of inactivity, so a DAFS server will only need to
> salvage the 'active' volumes after a crash.  The combination of those
> two features greatly reduces the number of volumes to salvage, so it's
> not actually doing the same amount of salvaging (in the general case)
> as a traditional fileserver.

Yes, but I think the initial hit of the VGC scan[0] currently makes
*any* salvage immediately after a crash potentially a large problem,
depending on the site. Even if salvaging your very important volume
takes seconds after a crash, you may still have to wait minutes for
the VGC scan, if you have a lot of volumes, before you can salvage
anything.

Also, keep in mind that with DAFS, checking the inUse volume header is
no longer the only thing that can trigger a salvage. Previously with
fast-restart, if a volume had some recognizable internal problem, we
just took it offline and you had to manually notice and salvage it;
that's (one reason) why I hear so much complaining about it. With DAFS
we can still automatically salvage in those cases, so in my opinion one
of the big objections to fast-restart goes away.

However, I really really wish that if we have something like
fast-restart with DAFS, it wouldn't just be enabling FAST_RESTART and
AFS_DEMAND_ATTACH_FS at the same time (some code assumes they cannot be
enabled at the same time, but at least for Hartmut's case it probably
doesn't matter). This could easily be a fileserver (and probably
volserver) runtime option that makes VShouldCheckInUse() return 0
in some cases. I don't see a reason to prevent administrators from
choosing this route, as long as neither packages nor we ourselves turn
it on without forcing the administrator to make a conscious decision to
do so.
We can always give the option a scarier name if that helps...
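To make the runtime-option idea concrete, here is a minimal C sketch. Everything in it is hypothetical: struct fs_options, should_check_in_use(), parse_options(), attach_action() and the flag -i-really-want-to-skip-inuse-salvage are illustrative names, not OpenAFS code, and should_check_in_use() only stands in for the decision VShouldCheckInUse() makes without reproducing its real signature or logic. It assumes the option defaults to off, and that damage detected during attach still triggers an automatic DAFS salvage regardless of the option.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical option set; none of these names exist in OpenAFS. */
struct fs_options {
    int skip_inuse_check; /* 0 unless the admin explicitly opts in */
};

/* Toy stand-in for the decision VShouldCheckInUse() makes; not its real code.
 * Returns 1 if an inUse header should force a salvage, 0 to skip that check. */
static int should_check_in_use(const struct fs_options *opts)
{
    assert(opts != NULL);
    return opts->skip_inuse_check ? 0 : 1;
}

/* Default is off: only an explicit, deliberately scary flag turns it on. */
static void parse_options(int argc, char **argv, struct fs_options *opts)
{
    int i;
    opts->skip_inuse_check = 0;
    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-i-really-want-to-skip-inuse-salvage") == 0)
            opts->skip_inuse_check = 1;
    }
}

/* Decide what to do with one volume at attach time after a crash. */
static const char *attach_action(const struct fs_options *opts,
                                 int header_in_use, int detected_damage)
{
    /* Assumed: recognizable internal damage still triggers an automatic salvage. */
    if (detected_damage)
        return "salvage";
    /* An inUse header only forces a salvage when the check is enabled. */
    if (header_in_use && should_check_in_use(opts))
        return "salvage";
    return "attach";
}
```

The design point is that the unsafe path is reachable only through an explicit, oddly named flag, so no default or packaged configuration enables it by accident.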


[0] For those not familiar with it, the VGC is the "volume group cache".
The first time we salvage something on a partition, we must look at the
header of every volume on the partition to determine the volume group
membership of all volumes, due to the on-disk structure of volumes. This
scan can take a while (possibly minutes) on a partition with a large
number of volumes.
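A toy C model of why the VGC scan is an up-front cost: because the partition is not indexed by volume group, finding a volume's group members means reading every header once, after which later lookups are served from the cache. The structures (struct vol_header, struct vg_cache) and functions (vgc_build(), vgc_group_members()) are hypothetical simplifications, not OpenAFS's on-disk format or salvager code; the partition is modeled as an in-memory array, and each volume is assumed to record the ID of its group's parent (RW) volume, with 0 never used as a real ID.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified header; real volume headers carry more fields. */
struct vol_header {
    unsigned int id;     /* this volume's ID */
    unsigned int parent; /* ID of the RW volume of its group */
};

#define MAX_VOLS 64

struct vg_cache {
    int built;
    size_t n;
    unsigned int id[MAX_VOLS];
    unsigned int parent[MAX_VOLS];
    size_t headers_read; /* how many headers the scan touched */
};

/* Read every header on the "partition" once; this is the expensive part. */
static void vgc_build(struct vg_cache *c, const struct vol_header *part,
                      size_t nvols)
{
    size_t i;
    assert(nvols <= MAX_VOLS);
    c->n = 0;
    c->headers_read = 0;
    for (i = 0; i < nvols; i++) {
        c->id[c->n] = part[i].id;
        c->parent[c->n] = part[i].parent;
        c->n++;
        c->headers_read++;
    }
    c->built = 1;
}

/* Collect the members of vol's group; builds the cache on first use only.
 * Returns the number of IDs written to out (0 if vol is unknown). */
static size_t vgc_group_members(struct vg_cache *c,
                                const struct vol_header *part, size_t nvols,
                                unsigned int vol, unsigned int *out,
                                size_t max_out)
{
    unsigned int parent = 0; /* 0 means "not found" in this toy model */
    size_t i, k = 0;
    if (!c->built)
        vgc_build(c, part, nvols);
    for (i = 0; i < c->n; i++) {
        if (c->id[i] == vol) {
            parent = c->parent[i];
            break;
        }
    }
    if (parent == 0)
        return 0;
    for (i = 0; i < c->n && k < max_out; i++)
        if (c->parent[i] == parent)
            out[k++] = c->id[i];
    return k;
}
```

The first lookup on a partition pays for reading all headers, even when the volume being salvaged belongs to a small group; subsequent lookups on the same partition reuse the cache.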

-- 
Andrew Deason
adeason@sinenomine.net