[OpenAFS] Re: [OpenAFS-devel] 1.6 and post-1.6 OpenAFS branch management and schedule

Fri, 18 Jun 2010 04:14:32 -0400

--On Thursday, June 17, 2010 01:45:14 PM -0500 "Christopher D. Clausen" 
<cclausen@acm.org> wrote:

> I have heard that, but I have never experienced any problems myself in
> many years of running that way.  In general the way I see it is that if
> the power goes out, my server stays up for a little longer due to its UPS
> but the network dies immediately so the AFS processes are not doing
> anything when the power finally dies and the server goes down a few
> minutes later.  (This is of course assuming no actual server crashes and
> luckily I haven't had any of those.)

We're a bit more aggressive over here.  If the power goes out, my servers
stay up for a little longer due to the UPS.  So does the machine room 
network.  And the rest of the machine room.  And the clients.  And _their_ 
network.  See, a few years ago a dean decided that it was unacceptable that 
a power outage had killed one of his desktop machines (the hardware, that 
is).  So, we raised the rates a bit, bought UPSes for every machine in
every office, and after the first couple of years started a rotating 
replacement schedule.  It's really _very_ nice, but it does mean you can't 
count on the clients to die before the servers.

Really, I consider enable-fast-restart to be extremely dangerous.
It should have gone away long ago.

I realize some people believe that speed is more important than not losing 
data, but I don't agree, and I don't think it's an appropriate position for 
a filesystem to take.  Not losing your data is pretty much the defining 
difference between filesystems you can use and filesystems from which you
should run away screaming as fast as you can.  I do not want people to run 
away screaming from OpenAFS, at any speed.

Bear in mind that enable-fast-restart doesn't mean "start the fileserver 
now and worry about checking the damaged volumes later".  It means "start 
the fileserver now and ignore the damaged volumes until someone complains, 
by which time it may be months later and too late to recover the lost data 
from backups".  It may also mean worse.

Also bear in mind that we're talking about a change after DAFS is good 
enough to be on by default, at which point restarts will _already_ be fast, 
even if you salvage everything that needs it up front, because not every 
volume will have been online at the time of the crash.
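
To make that concrete, here is a toy sketch of the idea, not OpenAFS's
actual data structures.  It assumes the server persists, per volume,
whether the volume was online when the crash happened; VolumeState,
online_at_crash and volumes_to_salvage are names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class VolumeState:
    # Hypothetical per-volume record; not a real OpenAFS structure.
    name: str
    online_at_crash: bool


def volumes_to_salvage(volumes):
    """Return the names of volumes that need salvaging before use.

    Assumption: a volume that was offline at crash time had no
    operations in flight, so only volumes that were online are suspect.
    """
    return [v.name for v in volumes if v.online_at_crash]


if __name__ == "__main__":
    vols = [
        VolumeState("user.alice", True),
        VolumeState("user.bob", False),
        VolumeState("proj.x", False),
    ]
    print(volumes_to_salvage(vols))
```

With most volumes idle at any given moment, the list that comes back is
short, which is why salvaging everything on it up front stays cheap.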

> I guess I don't understand the particulars of what could happen, but if
> one is really worried about sending corrupt data, wouldn't the best thing
> to do be check the data as it is being sent and return errors then and
> log that something is wrong, not require an ENTIRE VOLUME to be salvaged,
> leaving all of the files inaccessible for a potentially long period of
> time?  I assume that such a thing is not possible to do?

That's right; it's not possible to do.  We're not talking about verifying 
the (nonexistent) checksums we (don't) keep on data.  We're talking about 
verifying that the filesystem structure is self-consistent, so we don't 
have things like two unrelated directory entries pointing at the same 
vnode, or two vnodes pointing at the same underlying file, or whole volumes 
whose contents are unreachable because some directory entry is missing. 
And, we're talking about discovering cases where data has already been lost 
or destroyed, in time to maybe do something about it.
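
For illustration only, here is roughly what those three checks look like
on a toy model of a volume.  This is not the salvager's code or its
on-disk format: the (parent, name, child) tuples, the vnode-to-file map
and the name check_volume are all assumptions, and the toy model treats
any vnode with more than one directory entry as damage.

```python
from collections import defaultdict, deque


def check_volume(dir_entries, vnode_files, root=1):
    """Report structural inconsistencies in a toy volume.

    dir_entries: list of (parent_vnode, name, child_vnode) tuples
    vnode_files: dict mapping vnode number -> backing file id
    """
    problems = []

    # 1. Two directory entries pointing at the same vnode.
    by_child = defaultdict(list)
    children = defaultdict(list)
    for parent, name, child in dir_entries:
        by_child[child].append((parent, name))
        children[parent].append(child)
    for vnode in sorted(by_child):
        if len(by_child[vnode]) > 1:
            problems.append(("shared-vnode", vnode, sorted(by_child[vnode])))

    # 2. Two vnodes pointing at the same underlying file.
    by_file = defaultdict(list)
    for vnode, backing in vnode_files.items():
        by_file[backing].append(vnode)
    for backing in sorted(by_file):
        if len(by_file[backing]) > 1:
            problems.append(("shared-file", backing, sorted(by_file[backing])))

    # 3. Vnodes that cannot be reached from the root directory.
    seen = {root}
    queue = deque([root])
    while queue:
        for child in children.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    for vnode in sorted(set(vnode_files) - seen):
        problems.append(("unreachable", vnode))

    return problems


if __name__ == "__main__":
    entries = [(1, "a", 2), (1, "b", 3), (1, "c", 3)]
    files = {1: "f1", 2: "f2", 3: "f3", 4: "f2", 5: "f5"}
    for problem in check_volume(entries, files):
        print(problem)
```

Note that every check needs the whole volume in view.  None of them can
be answered by looking only at the one file being sent to a client.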

People often complain that the salvager destroys their data, or that fsck 
destroys their data.  This is almost never true.  What these programs do is
discover that your data has already been destroyed, and repair the tear in 
the space-time continuum so that it is safe to keep using and changing 
what's left.