[OpenAFS] Overview? Linux filesystem choices

Tom Keiser tkeiser@sinenomine.net
Thu, 30 Sep 2010 14:03:40 -0400

On Thu, Sep 30, 2010 at 12:02 PM, chas williams - CONTRACTOR
<chas@cmf.nrl.navy.mil> wrote:
> On Thu, 30 Sep 2010 14:19:51 +0200
> Stephan Wiesand <stephan.wiesand@desy.de> wrote:
>> Hi Jeff,
>> On Sep 29, 2010, at 22:18 , Jeffrey Altman wrote:
>> > RAID is not a replacement for ZFS.  ZRAID-3 protects against single
>> > bit disk corruption errors that RAID cannot.  Only ZFS stores a
>> > checksum of the data as part of each block and verifies it before
>> > delivering the data to the application.  If the checksum fails and
>> > there are replicas, ZFS will read the data from another copy and
>> > fixup the damaged version. That is what makes ZFS so special and so
>> > valuable.  If you have data that must be correct, you want ZFS.
>> you're right, of course. This is a very desirable feature, and the
>> main reason why I'd love to see ZFS become available on linux.
>> I disagree on the "RAID cannot provide this" statement though. RAID-5
>> has the data to detect single bit corruption, and RAID-6 even has the
>> data to correct it. Alas, verifying/correcting data upon read is not
>> a common feature. I know of just one vendor (DDN) actually providing
>> it. It's a mystery to me why the others don't.
>> Anyway, the next best option if ZFS is not available is to run parity
>> checks on all your arrays regularly. Things do get awkward when
>> errors show up, but at least you know. Both Linux MD RAID and the
>> better hardware solutions offer this.
>> From my experience, disks don't do this at random and do not develop
>> such a fault during their life span, but some broken disks do it
>> frequently from the beginning. NB I only ever observed this problem
>> with SATA drives.
> raid5 really isn't quite the same as what jeff is describing about zfs.
> zfs apparently maintains multiple copies of the same block across
> different devices.  if you had a single bit error in one of those
> blocks (as determined by some checksum apparently stored with this
> block), zfs will pick another block that is supposed to contain the
> same data.
> raid5 only corrects single bit errors.  it can detect multiple bit
> errors.  raid5 (to my knowledge) always verifies, even on reads and can
> correct single bit errors.  raid6 can correct two single bit

RAID-5 only provides a single parity block per stripe, i.e. one parity
bit for each bit position.  Unfortunately, this means that it can
merely detect a single-bit parity error; it cannot correct the error,
since there is insufficient information to prove which block in the
stripe is in error.  RAID-6 is complicated because different
implementations use different algorithms for the two orthogonal
checksums.  IIRC, all of them are able to detect two-bit errors, and
some of them can correct a single-bit error.
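
To make the parity argument concrete, here is a minimal Python sketch
of a toy XOR-parity stripe.  Everything in it is an assumption for
illustration: the helper names (xor_blocks, parity_ok, rebuild,
locate_bad_blocks) are hypothetical, the 4-byte blocks are made up, and
CRC32 merely stands in for "some per-block checksum"; it is not the
algorithm used by ZFS or by any particular RAID implementation.  It
shows that parity alone reports a mismatch without saying where, and
that the same parity can repair the block once a per-block checksum
identifies it.

```python
from functools import reduce
import zlib


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_ok(data_blocks, parity):
    """True if the stored parity matches the XOR of the data blocks."""
    return xor_blocks(data_blocks) == parity


def rebuild(data_blocks, parity, missing):
    """Recompute block `missing` from the other data blocks plus parity."""
    survivors = [b for i, b in enumerate(data_blocks) if i != missing]
    return xor_blocks(survivors + [parity])


def locate_bad_blocks(data_blocks, checksums):
    """Indices of blocks whose CRC32 no longer matches the stored value."""
    return [i for i, (block, crc) in enumerate(zip(data_blocks, checksums))
            if zlib.crc32(block) != crc]


# A toy stripe: three data blocks, one parity block, one checksum per block.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)
checksums = [zlib.crc32(block) for block in stripe]

# Silently flip one bit in block 1; the "disk" reports no I/O error.
damaged = list(stripe)
damaged[1] = bytes([stripe[1][0] ^ 0x01]) + stripe[1][1:]

# Parity alone: the stripe is inconsistent, but nothing says which block.
mismatch_detected = not parity_ok(damaged, parity)

# Blaming any one data block and rebuilding it yields a parity-consistent
# stripe, so every guess looks equally valid from parity's point of view.
consistent_guesses = []
for guess in range(len(damaged)):
    candidate = list(damaged)
    candidate[guess] = rebuild(damaged, parity, guess)
    if parity_ok(candidate, parity):
        consistent_guesses.append(guess)

# With per-block checksums the bad block is identified, and the same
# parity data is then enough to repair it.
bad = locate_bad_blocks(damaged, checksums)
repaired = list(damaged)
for index in bad:
    repaired[index] = rebuild(damaged, parity, index)

print("mismatch detected:", mismatch_detected)
print("guesses consistent with parity:", consistent_guesses)
print("blocks flagged by checksum:", bad)
print("repaired stripe matches original:", repaired == stripe)
```

Note that rebuild() is the same operation an array performs when a
drive fails outright; in that case the failed member is already known,
which is exactly the information missing for silent corruption.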