[OpenAFS] Overview? Linux filesystem choices

chas williams - CONTRACTOR chas@cmf.nrl.navy.mil
Thu, 30 Sep 2010 12:02:53 -0400


On Thu, 30 Sep 2010 14:19:51 +0200
Stephan Wiesand <stephan.wiesand@desy.de> wrote:

> Hi Jeff,
> 
> On Sep 29, 2010, at 22:18 , Jeffrey Altman wrote:
> 
> > RAID is not a replacement for ZFS.  RAID-Z3 protects against single
> > bit disk corruption errors that RAID cannot.  Only ZFS stores a
> > checksum of the data as part of each block and verifies it before
> > delivering the data to the application.  If the checksum fails and
> > there are replicas, ZFS will read the data from another copy and
> > fix up the damaged version. That is what makes ZFS so special and so
> > valuable.  If you have data that must be correct, you want ZFS.
> 
> 
> you're right, of course. This is a very desirable feature, and the
> main reason why I'd love to see ZFS become available on linux.
> 
> I disagree on the "RAID cannot provide this" statement though. RAID-5
> has the data to detect single bit corruption, and RAID-6 even has the
> data to correct it. Alas, verifying/correcting data upon read is not
> a common feature. I know of just one vendor (DDN) actually providing
> it. It's a mystery to me why the others don't.
> 
> Anyway, the next best option if ZFS is not available is to run parity
> checks on all your arrays regularly. Things do get awkward when
> errors show up, but at least you know. Both Linux MD RAID and the
> better hardware solutions offer this.
> 
> In my experience, disks don't do this at random and do not develop
> such a fault during their life span, but some broken disks do it
> frequently from the beginning. NB: I have only ever observed this
> problem with SATA drives.
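Stephan's point that RAID-6 carries enough information to correct a silent error can be illustrated with a toy model of the P+Q syndromes. This is a sketch under stated assumptions, not any vendor's implementation: it assumes the GF(2^8) field with reducing polynomial 0x11d and generator 2 (the field used by Linux md's raid6 code), and the helper names `syndromes` and `verify_and_repair` are hypothetical. With P alone (the RAID-5 case) a mismatch reveals the error pattern but not which strip holds it; the extra Q syndrome is what pins down the location.

```python
# Toy RAID-6 model (hypothetical helpers): P is the XOR of the data strips,
# Q is the XOR of g^i * D_i in GF(2^8), reducing polynomial 0x11d, g = 2.

# Exp/log tables for GF(2^8); EXP is doubled so LOG[a] + LOG[b] never wraps.
EXP = [0] * 512
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    EXP[i] = EXP[i - 255]


def gf_mul(a, b):
    """Multiply two field elements via their logarithms."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(data):
    """Compute the P and Q strips for a list of equal-length data strips."""
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for i, strip in enumerate(data):
        coef = EXP[i]  # g^i
        for k, byte in enumerate(strip):
            p[k] ^= byte
            q[k] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)


def verify_and_repair(data, p, q):
    """Recompute P and Q on read and fix one bad data strip per byte offset.

    Returns (data, findings); findings lists (offset, culprit) where culprit
    is a data strip index, "P", "Q", or "unrecoverable". Assumes at most one
    strip is wrong at any offset; with more, the result can be wrong.
    """
    data = [bytearray(s) for s in data]
    p2, q2 = syndromes(data)
    findings = []
    for k in range(len(p)):
        dp = p[k] ^ p2[k]  # equals the error pattern e
        dq = q[k] ^ q2[k]  # equals g^z * e for a bad data strip z
        if dp == 0 and dq == 0:
            continue
        if dq == 0:
            findings.append((k, "P"))  # data agrees with Q, so P is corrupt
        elif dp == 0:
            findings.append((k, "Q"))  # data agrees with P, so Q is corrupt
        else:
            z = (LOG[dq] - LOG[dp]) % 255  # solve g^z = dQ / dP
            if z < len(data):
                data[z][k] ^= dp  # undo the error in place
                findings.append((k, z))
            else:
                findings.append((k, "unrecoverable"))
    return [bytes(s) for s in data], findings


data = [b"hello", b"world", b"raid6"]
p, q = syndromes(data)
bad = list(data)
bad[1] = bytes([bad[1][0] ^ 0x01]) + bad[1][1:]  # flip one bit in strip 1
fixed, findings = verify_and_repair(bad, p, q)
print(fixed == data, findings)  # True [(0, 1)]
```

The single-error assumption matters: if two strips are bad at the same offset, the computed index can land on an innocent strip, which is one reason a verify-on-read feature is harder to ship than it looks.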

raid5 really isn't quite the same as what jeff is describing about zfs.
zfs apparently maintains multiple copies of the same block across
different devices.  if one of those copies had a single bit error (as
determined by a checksum apparently stored with the block), zfs would
pick another copy that is supposed to contain the same data.
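The read path described above (verify a checksum, fall back to another copy, rewrite the bad one) can be sketched as a toy in-memory mirror. Assumptions: SHA-256 stands in for whichever checksum the pool is configured with, the checksum lives in a separate table rather than next to the data (loosely like ZFS keeping a block's checksum in the pointer that references it), and `ChecksummedMirror` is a hypothetical name, not a ZFS interface.

```python
import hashlib


class ChecksummedMirror:
    """Toy N-way mirror that verifies a checksum before returning data.

    The checksum is kept apart from the copies, so a corrupted copy cannot
    carry a matching checksum along with it.
    """

    def __init__(self, ncopies=2):
        self.devices = [{} for _ in range(ncopies)]  # each: {blkno: bytes}
        self.checksums = {}  # blkno -> digest of the data as written

    def write(self, blkno, data):
        self.checksums[blkno] = hashlib.sha256(data).digest()
        for dev in self.devices:
            dev[blkno] = bytes(data)

    def read(self, blkno):
        """Return (data, repaired_device_indices); raise if no copy is good."""
        want = self.checksums[blkno]
        failed = []
        for i, dev in enumerate(self.devices):
            copy = dev[blkno]
            if hashlib.sha256(copy).digest() == want:
                for j in failed:  # self-heal: rewrite the copies that failed
                    self.devices[j][blkno] = copy
                return copy, failed
            failed.append(i)
        raise OSError(f"block {blkno}: no copy matches its checksum")


m = ChecksummedMirror(ncopies=2)
m.write(7, b"important data")
m.devices[0][7] = b"importent data"  # silent corruption on device 0
print(m.read(7))  # (b'important data', [0])
```

Only the copies actually read and rejected get repaired here; copies that are never read stay unchecked, which is why a periodic scrub of every copy is still worthwhile.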

raid5 can only correct single bit errors, though it can detect multiple
bit errors.  raid5 (to my knowledge) always verifies, even on reads.
raid6 can correct two single bit failures (assuming they are on separate
devices).  the only way to 'fix' a bad block on a raid is to replace the
drive; most raid hardware doesn't assume that the disk block will get
better if you rewrite it.  of course, background verifies of parity are
essential to protecting your data: media is going to age whether or not
you read it.
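A background parity verify amounts to recomputing parity for every stripe and comparing it with what is on disk, and rebuilding a replaced drive is the same XOR run over the surviving members. A minimal sketch, assuming a toy in-memory layout with parity in a fixed position (real RAID-5 rotates it) and hypothetical helper names; on Linux MD the real check is started by writing `check` to `/sys/block/mdX/md/sync_action`.

```python
from functools import reduce


def xor_strips(strips):
    """Byte-wise XOR of equal-length strips (this is RAID-5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def scrub(stripes):
    """Background verify: return indices of stripes whose parity mismatches.

    With a single parity strip the mismatch shows that something is wrong,
    not which member is wrong.
    """
    return [n for n, (data, parity) in enumerate(stripes)
            if xor_strips(data) != parity]


def rebuild_member(data, parity, missing):
    """Regenerate strip `missing` (e.g. a replaced drive) from the survivors."""
    survivors = [s for i, s in enumerate(data) if i != missing]
    return xor_strips(survivors + [parity])


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_strips(data)
stripes = [(data, parity), ([b"\x01\x03", b"\x10\x20", b"\xff\x00"], parity)]
print(scrub(stripes))                                # [1]
print(rebuild_member(data, parity, 1) == data[1])    # True
```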