[OpenAFS] Should I replace a hard drive with known bad blocks?

Fri, 08 Sep 2006 13:44:23 -0400

In message <4501A732.2020509@gnosysllc.com>,Kevin writes:
>I realize that various filesystem tools (ext2, ext3, etc.) have utilities to
>map bad blocks and avoid having the system use them, but is it a good rule
>of thumb that a HDD with bad blocks is failing?  ie, that finding bad blocks
>is an indicator that the HDD will soon fail catastrophically?

generally speaking most disk drives automatically map out bad blocks
(this can be disabled by vendors for various reasons mostly related
to performance).  there is usually a small number of blocks set aside
for this purpose.  once you run out of these spare blocks you are going to
be in trouble.  it might help to know how old the disk is, and the
controller technology (scsi or ide).  modern ide drives usually have
SMART and you can query this with smartctl on a linux system.  i don't
remember what scsi has exactly, but the mode pages can tell you if
bad block mapping is enabled.
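
a rough sketch of the smartctl check, assuming smartmontools is installed
and that `smartctl -A` prints its usual attribute table.  the function
names and the sample text are made up for illustration, and not every
drive reports all three attributes:

```python
import subprocess

# Attributes commonly tied to bad-block remapping, spelled the way
# smartctl prints them. A drive may report only some of these.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for the watched attributes."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = fields[9]
            if raw.isdigit():
                found[fields[1]] = int(raw)
    return found


def read_smart_attributes(device):
    """Hypothetical helper: run `smartctl -A` on a device (usually needs root)."""
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return parse_smart_attributes(result.stdout)


# Made-up sample laid out like a typical attribute table.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
"""

if __name__ == "__main__":
    print(parse_smart_attributes(SAMPLE))
```

a reallocated count that keeps climbing between checks suggests the
spare blocks are being used up.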

>I'm considering installing an OpenAFS server on a machine with such a hard
>drive.  I've done about 20-40 passes on the partitions searching for bad
>blocks, and I do find them, but the number remains the same on each pass.

is it the same blocks every time?  is this search a read or a verify?
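
an unchanged count does not by itself show that the list is unchanged.
a minimal sketch for comparing two passes, assuming each pass was saved
as plain text with one block number per line (the function names and the
numbers below are invented for illustration):

```python
def parse_block_list(text):
    """Turn a saved scan (one block number per line) into a set of ints."""
    return {int(tok) for tok in text.split() if tok.isdigit()}


def compare_passes(first, later):
    """Return (still_bad, newly_bad, no_longer_reported) as sets."""
    return first & later, later - first, first - later


if __name__ == "__main__":
    # Invented example: both passes report three blocks, but not the same three.
    pass_1 = parse_block_list("1024\n2048\n4096\n")
    pass_2 = parse_block_list("1024\n2048\n8192\n")
    still_bad, newly_bad, gone = compare_passes(pass_1, pass_2)
    print("still bad:", sorted(still_bad))
    print("newly bad:", sorted(newly_bad))
    print("no longer reported:", sorted(gone))
```

new blocks turning up between passes would be a worse sign than a fixed
list.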

>So the question is one of judgment.  Do list members think it would be
>advisable to replace a hard drive at the first indication that there are bad
>blocks (in anticipation of it failing soon)? 

yes.

>If that is overkill, is it a
>bad idea to use a hard drive in production use where data integrity is
>important and the hard drive is known to have bad blocks? 

yes.  particularly if it's not just your data.

>Or is it
>perfectly safe if some precautions are followed (such as scanning for bad
>blocks periodically henceforth)? 

buy a new drive.  they are getting quite cheap.

>Or other?

make backups.  use a raid1 mirror.