[OpenAFS] Re: [OpenAFS-devel] 1.6 and post-1.6 OpenAFS branch management and schedule

Hartmut Reuter reuter@rzg.mpg.de
Wed, 16 Jun 2010 12:07:44 +0200


Russ Allbery wrote:
> I'm aware of the following (largish) things that we want to deprecate or
> remove:
> 
> * --enable-fast-restart and --enable-bitmap-later are earlier attempts to
>   solve the problem that is solved in a more complete way by demand
>   attach.  Demand attach will be available in 1.6 but not enabled by
>   default.  These two options will conflict with demand-attach; in other
>   words, you won't be able to enable either of them and demand attach at
>   the same time.
> 
>   At the point at which we make demand attach the default, rather than
>   optional behavior, I believe we should remove the code for these two
>   flags.  I think that should be for either 1.10 or 2.0 based on
>   experience with running 1.6 in production.  In the meantime, please be
>   aware that most of the developers don't build with those flags by
>   default and the code is not heavily tested.
> 
>   This code is not enabled by default, so if you're not compiling yourself
>   and passing those flags to configure, you're not using this and don't
>   need to worry about it.
> 

Without --enable-fast-restart, after a fileserver crash the salvager used to
salvage all volumes in all partitions before the fileserver was started.
On large fileservers this could take hours, and sometimes the salvager ran out
of memory and crashed itself, still leaving some volumes unattachable.

With the Demand Attach Fileserver (DAFS) this initial salvage is no longer
necessary; however, each volume that was not cleanly detached beforehand gets
salvaged in the background. This is a nice feature which, I hope, allows the
most heavily used volumes to come up quickly, but salvaging will still take
hours because the same amount of work has to be done.

When I looked into the SalvageLog after a fileserver or machine crash, I found
that, apart from incrementing the next uniquifier, almost nothing important
ever happened. That is why, many years ago, I wrote the code to skip the
automatic salvage, and I sent it to OpenAFS in 2001.

Right now I am working on integrating rxosd into 1.5.74 (which represents
more or less the current git master). For our site I re-enabled the ability to
use both options (fast restart and demand attach) together, because my feeling
is that a crash of a heavily used large fileserver running with demand attach
alone will still be a pain for a rather long time.

Hartmut

-----------------------------------------------------------------
Hartmut Reuter                  e-mail 		reuter@rzg.mpg.de
			   	phone 		 +49-89-3299-1328
			   	fax   		 +49-89-3299-1301
RZG (Rechenzentrum Garching)   	web    http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------