[OpenAFS-devel] CopyOnWrite failures continue still

hoffman@cs.pitt.edu
Thu, 28 Mar 2002 12:37:54 -0500 (EST)


We've had two more corrupted volumes, only one of which had a
CopyOnWrite error.  I ran volinfo on each volume before and after
salvaging it.  All pertinent logs are in

	ftp://ftp.cs.pitt.edu/hoffman/openafs

The first corrupted volume, cs.usr0.naveen, 536878347, exhibited this
behavior:

    % ls ~naveen
    % ls -a ~naveen
    %

That is, the directory appeared to be empty, without even the . and ..
entries, but the volume did NOT go offline.  'vos examine' showed
nothing unusual, and the backup volume was OK.  Nothing in the FileLog
or VolserLog mentioned that volume.
There were, however, numerous lines like this:

    Sun Mar 24 14:49:51 2002 ReallyRead(): read failed device 2 inode 80AA9E0 errno 5
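One way to see whether these failures cluster on a few inodes or are spread across the partition might be to tally them by device and inode. The following is a minimal Python sketch, assuming every such line has exactly the format shown above; the function name and the regular expression are illustrative, not part of OpenAFS.

```python
import re
from collections import Counter

# Pattern for lines of the form shown above (assumed format):
#   ... ReallyRead(): read failed device 2 inode 80AA9E0 errno 5
READ_FAIL = re.compile(
    r"ReallyRead\(\): read failed device (?P<device>\S+) "
    r"inode (?P<inode>[0-9A-Fa-f]+) errno (?P<errno>\d+)"
)


def tally_read_failures(lines):
    """Count ReallyRead() failures per (device, inode, errno) triple."""
    counts = Counter()
    for line in lines:
        match = READ_FAIL.search(line)
        if match:
            key = (
                match.group("device"),
                match.group("inode").upper(),
                int(match.group("errno")),
            )
            counts[key] += 1
    return counts


# The single sample line quoted in this message.
sample = [
    "Sun Mar 24 14:49:51 2002 ReallyRead(): read failed device 2 "
    "inode 80AA9E0 errno 5",
]

for (device, inode, err), n in tally_read_failures(sample).most_common():
    print(f"device {device} inode {inode} errno {err}: {n} failure(s)")
```

To run it on a real log, pass an open file object instead of the sample list. A handful of inodes with many failures each would point at specific damaged files, while many inodes with one failure each would suggest a wider problem.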

Errno 5 is EIO, "I/O error".  I checked all of the other system logs and
found no hardware errors.  I ran 'dd of=/dev/null' against every disk
partition and no errors were reported.  These are RAID-5 fileservers anyway,
so there should never be ANY errors unless two drives fail!  How can I
figure out what's really going on here?
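
The dd check described above reads each partition end to end, and without conv=noerror it stops at the first read error. A rough Python equivalent that continues past errors and records the offsets where reads fail might look like the sketch below. The function name, block size, and error cap are illustrative choices. It exercises the same raw read path as dd, not the fileserver's own inode access, so a clean result here would not rule out a problem in that layer.

```python
import os


def scan_for_read_errors(path, block_size=1024 * 1024, max_errors=100):
    """Read `path` block by block.

    Returns (bytes_read, errors), where errors is a list of
    (offset, errno) pairs for blocks that could not be read.
    Stops early once max_errors failures have been recorded.
    """
    errors = []
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while len(errors) < max_errors:
            try:
                chunk = os.pread(fd, block_size, offset)
            except OSError as exc:
                # Record the failure and skip ahead one block.
                errors.append((offset, exc.errno))
                offset += block_size
                continue
            if not chunk:
                break  # end of file or device
            total += len(chunk)
            offset += len(chunk)
    finally:
        os.close(fd)
    return total, errors
```

It would be called with the path of a partition or of a single file, for example `scan_for_read_errors("/dev/sdXN")`, where the device name is a placeholder. Any offset reported with errno 5 could then be compared against what the RAID controller reports.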
 

----------------------

The second corrupted volume, cs.usr0.mbell, 536877058, exhibited the
same behavior as all of the other CopyOnWrite failures.  A log of
what I did this morning is in ftp://ftp.cs.pitt.edu/hoffman/openafs/mbell.log.

The affected fileserver is running Red Hat Linux 7.2 (kernel 2.4.9-21) and
OpenAFS 1.2.3, with the non-threaded fileserver and the ihandle.c patch.

What should I try next?

	---Bob.