[OpenAFS] Re: [OpenAFS-devel] 1.6 and post-1.6 OpenAFS branch management and schedule
Christopher D. Clausen
cclausen@acm.org
Thu, 17 Jun 2010 13:45:14 -0500
Russ Allbery <rra@stanford.edu> wrote:
> Chris, to check, are you currently using --enable-fast-restart or
> --enable-bitmap-later?
Yes, both of them.
> Please understand that neither of those options are recommended now,
> whether you have DAFS enabled or not. I consider --enable-fast-restart in
> particular to be dangerous and likely to cause or propagate file
> corruption and would not feel comfortable ever running it in production.
> I know that some people are using the existing implementation and taking
> their chances, and if they're expert AFS administrators and know what
> they're risking, that's fine, but, as I understand it, it's pretty much
> equivalent to disabling fsck and journaling on your file systems after
> crashes and just trusting that there won't be any damage or that, if there
> is, you'll fsck when you notice it.
I have heard that, but I have never experienced any problems myself in many
years of running that way. In general the way I see it is that if the power
goes out, my server stays up for a little longer due to its UPS but the
network dies immediately so the AFS processes are not doing anything when
the power finally dies and the server goes down a few minutes later. (This
is of course assuming no actual server crashes and luckily I haven't had any
of those.)
It's fine to not have it enabled by default, but I can't see why one would
remove the functionality from the source tree.
If you want to require a --yes-i-know-i-can-corrupt-data configure option,
that is also fine, but requiring source code patches sounds like a major
annoyance.
-----
I guess I don't understand the particulars of what could happen, but if one
is really worried about sending corrupt data, wouldn't the best thing to do
be to check the data as it is being sent, return errors at that point, and
log that something is wrong, rather than require an ENTIRE VOLUME to be
salvaged, leaving all of the files inaccessible for a potentially long period
of time? I assume that such a thing is not possible to do?
I mean, I occasionally see NTFS errors in the event log on Windows servers.
Windows doesn't take the disk offline and run a chkdsk for me to prevent
potential errors. It allows me to try to access other data, and if that
works there are no problems; it only denies access to the specific files or
directories where there is corruption.
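
A minimal sketch of the per-file behavior asked about above: check each file
as it is read, log and refuse only the damaged one, and keep serving the
rest. It assumes a checksum is recorded for every file at write time. That
assumption, and the names Volume, CorruptFileError and damage(), are
hypothetical and purely illustrative; nothing here reflects how the OpenAFS
fileserver or salvager actually stores or verifies data.

```python
import hashlib
import logging

log = logging.getLogger("per_file_check_sketch")


class CorruptFileError(Exception):
    """Raised when a single file fails its integrity check."""


class Volume:
    """Toy in-memory 'volume' (hypothetical, not an OpenAFS structure)."""

    def __init__(self):
        # name -> (data, SHA-256 hex digest recorded at write time)
        self._files = {}

    def write(self, name, data):
        self._files[name] = (data, hashlib.sha256(data).hexdigest())

    def damage(self, name, data):
        """Simulate on-disk corruption: change the data, keep the old digest."""
        _, digest = self._files[name]
        self._files[name] = (data, digest)

    def read(self, name):
        """Verify one file at read time; fail only that file if it is bad."""
        data, digest = self._files[name]
        if hashlib.sha256(data).hexdigest() != digest:
            log.error("checksum mismatch on %s; refusing to send it", name)
            raise CorruptFileError(name)
        return data


if __name__ == "__main__":
    vol = Volume()
    vol.write("good.txt", b"hello")
    vol.write("bad.txt", b"world")
    vol.damage("bad.txt", b"w0rld")

    print(vol.read("good.txt"))      # the undamaged file is still served
    try:
        vol.read("bad.txt")
    except CorruptFileError as exc:
        print("denied:", exc)        # only the damaged file is refused
```

Whether a check like this could catch the kinds of damage a salvage is meant
to repair (for example, inconsistent volume metadata rather than bad file
contents) is exactly the open question.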
>> At the same time, I'd be happy to start doing more testing of the
>> various DAFS features, although I'm not quite sure what version I should
>> be using for testing,
>
> If you want to test DAFS, you need to use a 1.5 series server or (coming
> soon) a 1.6 release candidate.
Ah, excellent. I will wait for a 1.6 release candidate.
Will DAFS be enabled by default in 1.6? Or is that still being determined?
>> nor am I completely sure how to actually migrate an existing file server
>> to use DAFS or if there is a reverse path to downgrade if I encounter
>> problems.
>
> Migration is documented in the bos_create(8) man page as one of the
> examples. You can do the inverse procedure to downgrade, although of
> course you'll also need to replace the server binaries with a version
> compiled without demand-attach.
Ok, so http://docs.openafs.org/Reference/8/bos_create.html is the only
documentation on openafs.org on demand attach?
Ah, I see a http://docs.openafs.org/Reference/8/salvageserver.html as well.
Perhaps a generic dafs man page is in order for us non-developer types to be
up to speed on what DAFS is, what the benefits are, and how to use it
correctly?
<<CDC