[OpenAFS] Re: [OpenAFS-devel] Re: 1.6 and post-1.6 OpenAFS branch management and schedule

Wed, 16 Jun 2010 18:38:37 -0400

On Wed, Jun 16, 2010 at 11:38 AM, Andrew Deason <adeason@sinenomine.net> wrote:
> On Wed, 16 Jun 2010 08:43:39 -0400
> Steven Jenkins <steven.jenkins@gmail.com> wrote:
>
>> > With the Demand Attach Fileserver (DAFS) this initial salvage is not
>> > necessary any more, however, each volume which was not cleanly
>> > detached before gets salvaged in the background. This is a nice
>> > feature which allows the most demanded volumes to come up soon, I
>> > hope, but still salvaging will take hours because it's the same
>> > amount of work that has to be done.
>>
>> Keep in mind that DAFS never brings volumes online unless requested by
>> a client, so some volumes may never be attached; it also takes volumes
>> offline after a period of unuse, so a DAFS server will only need to
>> salvage the 'active' volumes after a crash.  The combination of those
>> two features greatly reduces the number of volumes to salvage, so it's
>> not actually doing the same amount of salvaging (in the general case)
>> as a traditional fileserver.
>
> Yes, but I think the initial hit of the VGC scan[0] currently makes
> *any* salvage immediately after a crash potentially a large problem,

No.  If you have a damaged volume in *any* environment, be that 1.4,
1.4+fast-restart, or DAFS, you will need to perform a VGC scan
[whether it is via _VVGC_scan_partition() or GetVolumeSummary() is
immaterial].  The key difference is whether or not this cost is
amortized up-front, or expended on every salvage.

Classic 1.4: you would perform the VGC scan exactly once per partition
during invocation of the salvager bnode.  During this VGC scan,
nothing could be served as the fileserver process isn't even running.

1.4+fast-restart: your damaged volumes attach, and either fail when
errors are encountered, or silently serve up garbage[1].  When you go
to salvage them with bos salvage, you get to incur that VGC scan cost
for each and every damaged volume.

DAFS: we incur the scan cost exactly once while building up the cache,
but we do it in the background while actively serving (un-damaged)
volumes.  Thus, the only penalty incurred is background I/O.  Yes, no
demand salvages can be run until the vice partition's VVGC is built,
but that was always true [by virtue of doing the scan in
GetVolumeSummary()].

No matter what scenario you look at, the VGC scan costs are there; the
differentiators are whether scan costs are constant or linear with
respect to the number of salvages, and whether anything can be served
while the salvages/VGC scans are occurring.  All of the wins are on the
side of DAFS.
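
To put rough numbers on the constant-versus-linear point, here is a toy
cost model in Python.  It is only a sketch: the function name, the
scenario labels, and the simplification that every scan of a partition
costs the same are assumptions made for illustration, not anything
taken from the OpenAFS tree.

```python
def vgc_scan_count(scenario, partitions, salvages_per_partition):
    """Return how many full VGC scans a scenario performs (toy model).

    partitions:             partitions holding at least one damaged volume
    salvages_per_partition: damaged volumes to salvage on each of them
    """
    if scenario in ("classic-1.4", "dafs"):
        # One scan per partition, however many volumes get salvaged.
        return partitions
    if scenario == "1.4+fast-restart":
        # Every individual volume salvage repeats the partition scan.
        return partitions * salvages_per_partition
    raise ValueError(f"unknown scenario: {scenario}")


for name in ("classic-1.4", "1.4+fast-restart", "dafs"):
    print(name, vgc_scan_count(name, partitions=4, salvages_per_partition=25))
```

The count alone does not separate classic 1.4 from DAFS; the remaining
difference is that the classic scan runs while nothing is being served,
whereas the DAFS scan runs in the background.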

> depending on the site. Even if salvaging your very important volume
> takes seconds after a crash, you will still have to wait possibly
> minutes for the VGC, if you have a lot of volumes, before you can
> salvage anything.

This was always true, thanks to GetVolumeSummary().

>
> Also, keep in mind that with DAFS, checking the inUse volume header is
> no longer the only thing that can trigger a salvage. Previously with
> fast-restart, if a volume had some recognizable internal problem, we
> just took it offline and you had to manually notice and salvage it;
> that's (one reason) why I hear so much complaining about it. With DAFS
> we can still automatically salvage in those cases, so in my opinion one
> of the big objections to fast-restart goes away.
>
> However, I really really wish that if we have something like
> fast-restart with DAFS, it wouldn't just be enabling FAST_RESTART and
> AFS_DEMAND_ATTACH_FS at the same time (some code assumes they cannot be

hear hear!

> enabled at the same time, but at least for Hartmut's case it probably
> doesn't matter). This can very easily just be a fileserver (and probably
> volserver) runtime option that just makes VShouldCheckInUse() return 0
> in some cases. I don't see a reason for preventing administrators from

That's fundamentally unsafe--you could end up serving garbage, which
is why I have been, and remain, adamantly opposed to FAST_RESTART.
VCheckInUse should *always* return 1 for programType==fileServer.
OTOH, I'd be fine with a fileserver command line switch that makes
VCanScheduleSalvage() always return 0 and/or VRequestSalvage_r()
become a no-op...
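
The difference between the two options can be shown with a toy Python
model.  None of this is real fileserver code: `attach_outcome`, the
`no_auto_salvage` switch, and the returned strings are invented for
illustration, and the model ignores every other check made at attach
time.

```python
def attach_outcome(in_use, check_in_use=True, no_auto_salvage=False):
    """Toy model of what happens when the fileserver attaches a volume.

    in_use:          the volume header says it was not cleanly detached
    check_in_use:    stands in for the inUse check returning 1
    no_auto_salvage: hypothetical switch that stops salvages being scheduled
    """
    if not in_use:
        return "online"
    if not check_in_use:
        # FAST_RESTART-style behaviour: the suspect volume is served anyway.
        return "online (possibly inconsistent)"
    if no_auto_salvage:
        # The volume stays unavailable until an administrator salvages it.
        return "offline, needs manual salvage"
    return "salvage scheduled"


print(attach_outcome(in_use=True, check_in_use=False))
print(attach_outcome(in_use=True, no_auto_salvage=True))
```

Only the first call ever puts a possibly inconsistent volume in front
of clients; the second keeps the inUse check and merely declines to
salvage automatically.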

\begin{soapbox}
If you want your fileserver to serve up volumes that it actively knows
were attached, and quite possibly in an inconsistent state at the time
of the crash, well, then you should get to apply an out-of-tree patch.
I don't see any good reason why OpenAFS should be in the business of
supplying code that does something that unsafe.  Of course, I am not a
gatekeeper, but that is my $0.02.
\end{soapbox}

-Tom

> choosing this route, as long as no packages nor ourselves turn it on
> without forcing the administrator to make a conscious decision to do so.
> We can always give the option a scarier name if that helps...
>
>
> [0] For those not familiar with it, the VGC is the "volume group cache".
> The first time we salvage something on a partition, we must look at the
> header of every volume on the partition to determine the volume group
> membership of all volumes, due to the on-disk structure of volumes. This
> scan can take a little time for a large number of volumes.
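
As a rough illustration of why that scan touches every header, here is
a minimal Python sketch.  It assumes each header can be reduced to a
(volume ID, parent ID) pair and that the parent ID names the group;
`build_vgc` and the sample IDs are made up for the example and are not
the on-disk format or the real cache code.

```python
from collections import defaultdict


def build_vgc(headers):
    """Group volume IDs by parent ID after reading every header once."""
    groups = defaultdict(set)
    for volume_id, parent_id in headers:
        # Membership is only known once this volume's header has been read.
        groups[parent_id].add(volume_id)
    return dict(groups)


# Hypothetical partition: one group of three volumes and one lone volume.
headers = [
    (536870912, 536870912),
    (536870913, 536870912),
    (536870914, 536870912),
    (536870915, 536870915),
]
vgc = build_vgc(headers)
print(sorted(vgc[536870912]))
```

The work is linear in the number of volumes on the partition, which is
why it is worth doing once and caching.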

[1] As Andrew has hinted, inUse checking is necessary, but not
sufficient, to determine when a volume needs to be salvaged.  One of
the major changes we made was to redefine inUse so that it flags more
than just ownership by the fileserver process.  The fact that a volume
looked quiescent when owned by something other than the file server
(e.g. volserver, salvager) was a major problem in 1.4...