[OpenAFS] why afs backup is so poorly supported (Was: Backup AFS with BackupPC?)

Mon, 09 Oct 2006 03:37:00 -0400

Derrick J Brashear <shadow@dementia.org> had responded to Adam Megacz:
> From: Derrick J Brashear <shadow@dementia.org>
> To: openafs-info@openafs.org
> In-Reply-To: <x34pue5u4w.fsf_-_@nowhere.com>
> Message-ID: <Pine.GSO.4.61-042.0610082232460.10869@johnstown.andrew.cmu.edu>
> Subject: Re: [OpenAFS] why afs backup is so poorly supported (Was: Backup
>  AFS with BackupPC?)
> 
> On Sun, 8 Oct 2006, Adam Megacz wrote:
> 
> > When AFS decided to go with a proprietary on-disk volume format, it
> > isolated its backup requirements from those of all other network
> > filesystems.  Because of this isolation, there is insufficient
> 
> Sure. Before most of them existed, AFS failed to anticipate their existence
> and guess how they'd work.
> 
> > Same argument applies to journalling technologies.  AFS essentially
> > creates a filesystem-within-a-filesystem which *still* doesn't have
> > even the most basic journalling capabilities -- five years after they
> > became a standard feature on server OSes.  Ditto for redundant
> 
> Yup, didn't write that one years in the future either.
> 
> No one is bored enough to write it in their spare time, and no one has 
> cared enough to throw money at it. I'll not labor the point.
> 
> Derrick

So, log based filesystems are supposed to improve filesystem
reliability.  RAID is another modern technology, to improve hardware
mass storage reliability or performance.

Vendors like to say they have the best-est systems.  Fastest.  Largest.
Cheapest.  Smallest.  Most reliable.  And they like to use small
words.  RAID.  Log based.  So, they've used those words.  Basically,
that's modern vendor-nese for fast/big/cheap/small/reliable.

Sadly, it ain't so.  Hardware and software vendors have been promising
all those things since the 1940's (different buzzwords back then of
course).  And they've made tremendous strides.  Computers have scaled
amazingly, in terms of speed, size, reliability, capacity...

So the place I work has mass storage needs.  We use raid.  Somebody's
hardware raid.  The first generation of this, well, it had problems.
If a drive blew up, it usually blew a write on the disk as well.
People got real tired of running fsck on a 2 TB raid.  The firmware
would commit suicide randomly every N power cycles.  Fortunately, our
UPS usually works, our electricians are mostly done rearranging
circuits, we've replaced the power strip with the flakey power switch,
and now we assume every machine move of that raid hardware may require
a firmware reload.  On our next generation raid, we switched to a log
based filesystem.  It's got some problems.  We learned that the raid
doesn't actually guarantee write order, especially on a power cycle.
The best log based filesystem in the world can't recover when writes
issued before the last write before a crash were dropped.  It dawned on
us that the raid lacked a battery.  This was fixable, but we're still
puzzled as to why it was neither standard nor even an option.  We still
have a random lockup problem on very busy servers.  After working out
how long a check would take during the first log based filesystem
disaster, we no longer bother trying to do a filesystem check - it can't
complete in a timely fashion on a filesystem that size.  Instead, we
just copy all the data we can read to another raid, restore the rest
from backups, & newfs.  Don't get me started on "host raid".  We have
that too.  Thankfully "host raid" is generally not too large to manage
effectively.
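
To make the write-order point concrete, here is a toy model in Python.
It is only a sketch: the block names, the "commit" marker and the
helper functions are made up for illustration and do not correspond to
any real filesystem's on-disk format.  It assumes a journal that writes
two data blocks and then a commit record, and a recovery step that
trusts the commit record.

```python
# Toy model only: names and layout are hypothetical, not a real journal format.
OLD, NEW = "old", "new"

def crash(disk, writes, persisted):
    """Return the disk image after a power failure.

    Only the writes whose positions are in `persisted` reached stable
    storage; the rest were lost in the controller's volatile cache.
    """
    after = dict(disk)
    for pos, (block, value) in enumerate(writes):
        if pos in persisted:
            after[block] = value
    return after

def recover(disk, data_blocks):
    """Journal recovery that trusts the commit record.

    Returns the data blocks the recovery code believes are valid.
    """
    if disk.get("commit") != NEW:
        return []              # never committed: ignore the transaction
    return list(data_blocks)   # committed: assume every data block landed

def stale_but_trusted(disk, data_blocks):
    """Blocks that recovery trusts even though they still hold old data."""
    return [b for b in recover(disk, data_blocks) if disk[b] != NEW]

disk = {"a": OLD, "b": OLD, "commit": OLD}
writes = [("a", NEW), ("b", NEW), ("commit", NEW)]

# Device that honours write order: a crash leaves a prefix of the writes.
ordered = crash(disk, writes, persisted={0, 1})

# Reordering device with no battery: commit landed, the write to "b" did not.
reordered = crash(disk, writes, persisted={0, 2})

print(stale_but_trusted(ordered, ["a", "b"]))
print(stale_but_trusted(reordered, ["a", "b"]))
```

In the ordered case the transaction is simply ignored, which is safe.
In the reordered case recovery believes block "b" is good when it
still holds old data, and nothing in the log can tell it otherwise.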

Vendors like to use buzzwords, and promise big.  You know how to tell
when a lawyer lies; with a vendor, they don't even have to do that.
Log based filesystems and raid are good things.  They are not a
panacea.  Generally speaking, the small systems work best.  The big
ones tend to have scaling problems.

AFS traditionally doesn't "replace" a filesystem, it "augments"
it.  That is, most filesystems do a perfectly fine job of doing
block allocation and such - there's no reason to re-invent the
wheel in AFS.  Filesystems provide a lousy tool to ensure
application data consistency - fsync.  AFS uses this with abandon.
Perhaps it should use it with more scientific abandon to ensure
journalling at the "application" layer and avoid the need to salvage a
volume at boot time.  And then, well, zfs would be pretty nifty for
AFS.  The designers of zfs had scaling to a much larger environment as
a specific goal of theirs.  Zfs has a very small design flaw with fsync
- it doesn't perform as well as it should with respect to ufs.  The zfs
folks admit this and are hard at work fixing this.
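
For anybody wondering what using fsync "scientifically" at the
application layer might look like, here is a minimal sketch in Python.
It is not how the AFS fileserver does things; it is the common
write-to-a-temporary-file, fsync, rename pattern, and the function name
durable_replace is made up for this example.  It assumes a POSIX host
where a directory can be opened and fsync'd, and of course it only
helps if the storage underneath actually honours fsync.

```python
import os
import tempfile

def durable_replace(path, data):
    """Replace the file at `path` with `data` so that, after a crash,
    readers see either the complete old contents or the complete new ones.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live in the same directory (same filesystem)
    # so that the final rename is atomic.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # push the new contents to stable storage
        os.replace(tmp, path)      # atomically switch old -> new
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)         # do not leave a half-written temp file
        raise
    # fsync the directory so the rename itself survives a crash (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The cost is two fsync calls per update, which is exactly where a slow
fsync in the host filesystem hurts.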

Or, another option would be to replace the host filesystem with a
special one.  An obvious choice today would be reiserfs.  Er, reiserfs
4--application transactions.  Well, maybe tomorrow, maybe.  I think
their design team is hitting some software engineering scaling issues.

Derrick claims nobody wrote a log based filesystem for AFS.  That's
quite true, for AFS 3.  For DFS, there's episode = DCE LFS, a component
of OSF DFS, originally Transarc's AFS 4.  Transarc advertised DFS as
"ahead of its time." They were right.  DFS needed today's machines,
yesterday.  DFS had "scaling issues".

Systems to back up unix filesystems work great, for Unix.  This includes
"unix" network filesystems.  These are all designed for "departmental"
scale computing.  Dozens, or maybe even hundreds of users, the sysadmin
works down the hall.  The backups live in a shoe box.  The list of
files backed up can live in a local machine spreadsheet.  Yup, Unix can
scale up.  It means compromising other features.  Fine grained
filesystem protection semantics aren't possible for a mail server with
50,000 "unix" users.  Backups and restores become "interesting"
problems, with lots of solutions.  Direct attached tape drives become
less attractive, and network shared robotic tape libraries become more
attractive.  These are ways to mitigate scaling issues for Unix
backups, to a point.

AFS has a backup system which is very complicated and different.  Some
ideas are powerful.  Backup clones, that's been copied by some of the
network filesystem appliances now.  Filesets and location
independence.  That's a strength of AFS.  Some ideas are still good.
Tape labels - well, when backups *won't* fit in a shoebox, well, media
labels do help.  Restore by fileset -- wow, administrators don't have
to read user minds (much).  A particular tape robot probably isn't
supported by AFS - but there are hooks, a site can easily script that
support in.  Unfortunately, the AFS designers thought tape was cheaper
than disk.  People have kludges already, and are working for even
better improvements.  The result will be neat, and different.  But it
won't be Unix-like; that would not scale.
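
As an illustration of the sort of kludge a site might script around
backup clones, here is a small Python sketch that builds (but does not
run) the two vos commands for a full dump of one volume's backup clone
to a file.  The volume name and dump path are made-up placeholders, and
the helper function is hypothetical; a real site would add
authentication, error handling and a way to pick which volumes to dump.

```python
def backup_clone_dump_commands(volume, dump_file):
    """Build the vos command lines for a full dump of a volume's backup clone.

    `volume` and `dump_file` are illustrative placeholders.  Nothing is
    executed here; the caller decides how and where to run the commands.
    """
    clone = volume + ".backup"   # AFS names the backup clone <volume>.backup
    return [
        # Create or refresh the backup clone of the read/write volume.
        ["vos", "backup", "-id", volume],
        # Dump the clone; "-time 0" asks for a full (not incremental) dump.
        ["vos", "dump", "-id", clone, "-time", "0", "-file", dump_file],
    ]

for argv in backup_clone_dump_commands("user.example",
                                       "/backups/user.example.dump"):
    print(" ".join(argv))
```

Dumping the clone rather than the live volume is the point: users keep
working in the read/write volume while the dump runs.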

Scaling issues.  That's the key thing, in all of the above.  When you
think "enterprise", if you can't name a qualitative difference in the
environment simply because it's gotten too unwieldy to manage with
what you had before, you are not yet thinking big enough.

OpenAFS is a community open source project.  The most valuable
contribution anybody can make to openafs is developer time.  We all owe
a lot to Derrick and many others who have put that time in, but it
behooves all of us to put our own time into improving the things we
care about, and making the best use of his time to put all those pieces
together.  There is no "them" the vendor here.  There's only "us", and
openafs will only be as good as we care to make it.  Kennedy said it
all in the past, before I was even born, but to paraphrase--Ask not what
openafs can do for you--ask what you can do for openafs.  *That* will
scale, to wherever and whatever you and all of us want to do.

				-Marcus Watts
		[ had way too much fun at an SF convention this past weekend ]