[OpenAFS] Re: [OpenAFS-devel] 1.6 and post-1.6 OpenAFS branch management and schedule

Tom Keiser tkeiser@sinenomine.net
Fri, 18 Jun 2010 17:28:29 -0400


On Thu, Jun 17, 2010 at 1:43 PM, Russ Allbery <rra@stanford.edu> wrote:
> "Christopher D. Clausen" <cclausen@acm.org> writes:
>> Rainer Toebbicke <rtb@pclella.cern.ch> wrote:
>
>>> No, of course not.
>
>>> It would be painful to have to put back the '--enable-fast-restart and
>>> --enable-bitmap-later' code if you removed them, probably dangerous. My
>>> plea is to keep them in as an alternative to the demand-attach
>>> file-server: with mandatory salvaging the non-demand-attach case is
>>> seriously impaired, hence disabling it is no real alternative.
>
>>> With the ambitious schedule for new releases I see this happening very
>>> quickly. I'd like to avoid having to stop at a particular release next
>>> year because of a functionality that we manage to live without, and
>>> miss others that we're interested in.
>
>> I agree with Rainer on this.
>
> Chris, to check, are you currently using --enable-fast-restart or
> --enable-bitmap-later?
>
> Please understand that neither of those options are recommended now,
> whether you have DAFS enabled or not.  I consider --enable-fast-restart in
> particular to be dangerous and likely to cause or propagate file
> corruption and would not feel comfortable ever running it in production.
> I know that some people are using the existing implementation and taking
> their chances, and if they're expert AFS administrators and know what
> they're risking, that's fine, but, as I understand it, it's pretty much
> equivalent to disabling fsck and journaling on your file systems after
> crashes and just trusting that there won't be any damage or that, if there
> is, you'll fsck when you notice it.
>

I'll note that bitmap-later is also dangerous--it has several known
race conditions (e.g. VFreeBitmapEntry_r is just plain wrong;
GetBitmap() relies upon microarchitectural store ordering rules that
no modern processor guarantees, ...).  These can result in various
classes of corruption, ranging from vnodes that fail to be freed until
salvage to multiple allocations of the same vnode.
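
To illustrate the class of problem (not the actual OpenAFS code), here
is a minimal sketch of a bitmap allocator that does not depend on
implicit store ordering.  It assumes a single 64-bit word of slots and
uses C11 atomics; sketch_bitmap, sketch_alloc and sketch_free are
hypothetical names, and the real volume package would more likely
serialize bitmap access under a lock than use this approach.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_NBITS 64   /* hypothetical: one 64-bit word of slots */

/* Hypothetical bitmap: bit i set means slot i is allocated. */
static _Atomic uint64_t sketch_bitmap = 0;

/* Claim the lowest free slot; returns its index, or -1 if all are in use.
 * The compare-and-swap ensures two threads can never claim the same bit. */
int sketch_alloc(void)
{
    uint64_t old = atomic_load_explicit(&sketch_bitmap, memory_order_relaxed);

    for (;;) {
        if (old == UINT64_MAX)
            return -1;

        int bit = 0;
        while (old & ((uint64_t)1 << bit))
            bit++;

        uint64_t desired = old | ((uint64_t)1 << bit);
        if (atomic_compare_exchange_weak_explicit(&sketch_bitmap, &old, desired,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed))
            return bit;
        /* CAS failed: 'old' now holds the current value, so retry. */
    }
}

/* Release a slot; returns 1 if it was allocated, 0 if it was already free. */
int sketch_free(int bit)
{
    assert(bit >= 0 && bit < SKETCH_NBITS);

    uint64_t mask = (uint64_t)1 << bit;
    uint64_t prev = atomic_fetch_and_explicit(&sketch_bitmap, ~mask,
                                              memory_order_acq_rel);
    return (prev & mask) != 0;
}
```

The point of the sketch is only that claiming and releasing a bit are
each a single atomic read-modify-write with explicit ordering, so
neither a double allocation nor a lost free can result from one CPU
observing another's plain stores out of order.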

-Tom