[OpenAFS] Re: Timeouts and odd behavior with 1.6.0 file servers

Andrew Deason adeason@sinenomine.net
Thu, 26 Jan 2012 10:13:53 -0600


On Thu, 26 Jan 2012 09:35:10 -0500
Jeff White <jaw171@pitt.edu> wrote:

> Russ, can you link me to some more information on the data corruption 
> issue with 1.6.0?

There are three issues, but one is far more serious than the other two.
There is a race introduced with the positional I/O (pio) feature that can
cause the fileserver to write to the wrong file descriptor during
pretty much any operation (you can disable pio at configure time). There
is another issue where, if you rename() a directory into a different
directory inside an RW volume that has a clone (the most common clones
are backup volumes or RO volumes on the same partition), the .. entry
may be updated in the clone instead of the RW.

The issue that has caused the most aggravation is one where a write to a
file in an RW volume with a clone can be corrupted under certain I/O
patterns, due to a flaw in the accelerated copy-on-write routine. This
appears to happen with database-like accesses (SQLite, Microsoft Office
files). The bug for this is 130295, if you want to look.
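
For illustration only (this is not taken from the bug report): by
"database-like" I mean small in-place overwrites at scattered offsets in
an existing file, rather than writing a file start to finish. A minimal
Python sketch of that pattern, with a read-back check, might look like
the following. The function name, sizes, and counts are all assumptions
made up for the example, and a clean run does not show that a
fileserver is unaffected.

```python
import os
import random


def scattered_overwrite_check(path, file_size=1 << 20, block=512,
                              writes=200, seed=0):
    """Overwrite random blocks in place, then compare the file to a shadow copy.

    Hypothetical helper for illustration. Returns the block-aligned offsets
    whose contents on disk differ from what was written (empty list if none).
    """
    rng = random.Random(seed)
    expected = bytearray(file_size)  # in-memory copy of what the file should hold

    # Create the file at its full size first, filled with zeros.
    with open(path, "wb") as f:
        f.write(bytes(expected))

    # Reopen for update and overwrite small blocks at scattered offsets.
    with open(path, "r+b") as f:
        for _ in range(writes):
            offset = rng.randrange(0, file_size - block + 1)
            data = bytes(rng.getrandbits(8) for _ in range(block))
            f.seek(offset)
            f.write(data)
            expected[offset:offset + block] = data
        f.flush()
        os.fsync(f.fileno())

    # Read everything back and report any chunk that does not match.
    with open(path, "rb") as f:
        actual = f.read()

    if len(actual) != file_size:
        return list(range(0, file_size, block))
    return [i for i in range(0, file_size, block)
            if actual[i:i + block] != expected[i:i + block]]
```

To be relevant to the issue above, the path would have to be in an RW
volume that has a clone; on a local filesystem this should simply return
an empty list.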

-- 
Andrew Deason
adeason@sinenomine.net