[OpenAFS] volume corruption: directory references disappear!?!

Fri, 26 Jul 2002 17:08:37 -0700

Derrick J Brashear wrote:

>On Sun, 30 Jun 2002, J. Maynard Gelinas wrote:
>
>>  Derrick thanks for your reply,
>>
>>  Here's what I get in the FileLog on the host which was serving up those
>>volumes:
>>
>>
>>Sun Jun 30 04:00:22 2002 File Server started Sun Jun 30 04:00:22 2002
>>Sun Jun 30 08:17:58 2002 ReallyRead(): read failed device 0 inode 80C2640 errno 5
>>Sun Jun 30 08:17:58 2002 ReallyRead(): read failed device 0 inode 80C2640 errno 5
>>[...]
>>
>
>>the clone. This seems strange though... if a clone is a set of pointers to
>>the original data, and the original volume became corrupted, how did the
>>cloned data survive?
>>
>
>Presumably "CopyOnWrite" corrupted the parent as it was copying. Still,
>you should probably upgrade to OpenAFS 1.2.5.
>

We're currently running OpenAFS 1.2.5 servers and clients on Red Hat
Linux 7.1 and 7.3 machines. Volumes on one of our servers are exhibiting
very similar behavior. It is a RAID machine using IDE drives presented
to Linux as SCSI via 3ware Escalade hardware. The machine has several
/vicepx partitions ranging in size from 100GB to 400GB. The following
shows up in our FileLog:

Fri Jul 26 13:19:24 2002 ReallyRead(): read failed device 1A inode 1777162977807295 errno 5
Fri Jul 26 13:19:24 2002 ReallyRead(): read failed device 1A inode 1777162977807295 errno 5
Fri Jul 26 13:19:27 2002 ReallyRead(): read failed device 1A inode 2503532141879063 errno 5
Fri Jul 26 13:19:27 2002 ReallyRead(): read failed device 1A inode 2503532141879063 errno 5
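
To see which devices and inodes keep failing, the ReallyRead() entries
can be tallied with a short script. This is only a sketch: the regular
expression is derived from the log lines quoted in this message and
assumes each entry sits on a single line in the real FileLog. The names
READ_FAIL, count_read_failures and SAMPLE are made up for illustration
and are not part of OpenAFS.

```python
import re
from collections import Counter

# Pattern inferred from the FileLog lines quoted in this thread
# (an assumption; other FileLog message formats are not handled).
READ_FAIL = re.compile(
    r"ReallyRead\(\): read failed device (?P<device>[0-9A-Fa-f]+) "
    r"inode (?P<inode>[0-9A-Fa-f]+) errno (?P<errno>\d+)"
)


def count_read_failures(lines):
    """Count failures per (device, inode, errno) across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = READ_FAIL.search(line)
        if m:
            counts[(m["device"], m["inode"], int(m["errno"]))] += 1
    return counts


# The four entries from this message, one per line.
SAMPLE = [
    "Fri Jul 26 13:19:24 2002 ReallyRead(): read failed device 1A inode 1777162977807295 errno 5",
    "Fri Jul 26 13:19:24 2002 ReallyRead(): read failed device 1A inode 1777162977807295 errno 5",
    "Fri Jul 26 13:19:27 2002 ReallyRead(): read failed device 1A inode 2503532141879063 errno 5",
    "Fri Jul 26 13:19:27 2002 ReallyRead(): read failed device 1A inode 2503532141879063 errno 5",
]

if __name__ == "__main__":
    for (device, inode, err), n in count_read_failures(SAMPLE).most_common():
        print(f"device={device} inode={inode} errno={err} count={n}")
```

To run it against a live log, pass an open file object instead of
SAMPLE; the path to FileLog depends on how the server was installed.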

Application servers accessing AFS report a "File too large" type of
message when attempting to write to various volumes. An ls sometimes
reports no files (not even . and ..), and at other times the files are
visible. In both cases, when I manually cd into these volumes and touch
a tempfile, I get a message that says "file too large". So far the only
solution I've found is to shut down the server and run a salvage.
Several times I have had to reboot the machine entirely. Is there
anything else I should look for in order to track this down? I've also
noticed frequent callback failures on many fileservers, but I'm not
sure if this is related.
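
For reference, the two error conditions mentioned here can be looked up
by number. On Linux, errno 5 is EIO (an I/O error), and "File too large"
is the standard system message for EFBIG. Treating the client-side
message as EFBIG is an assumption on my part; it says nothing about why
the client ends up reporting it. The helper name describe_errno is
hypothetical.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, system message) for a numeric errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # errno 5 is what the fileserver logs for the failed reads.
    print(5, describe_errno(5))
    # EFBIG is the code whose message text is "File too large" (assumed match).
    print(errno.EFBIG, describe_errno(errno.EFBIG))
```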

-- 
Christopher Arnold
System Administrator
Pictage, Inc.