[OpenAFS] volume corruption: directory references disappear!?!
Christopher Arnold
chris@pictage.com
Fri, 26 Jul 2002 17:08:37 -0700
Derrick J Brashear wrote:
>On Sun, 30 Jun 2002, J. Maynard Gelinas wrote:
>
>> Derrick thanks for your reply,
>>
>> Here's what I get in the FileLog on the host which was serving up those
>>volumes:
>>
>>
>>Sun Jun 30 04:00:22 2002 File Server started Sun Jun 30 04:00:22 2002
>>Sun Jun 30 08:17:58 2002 ReallyRead(): read failed device 0 inode 80C2640
>>errno 5
>>Sun Jun 30 08:17:58 2002 ReallyRead(): read failed device 0 inode 80C2640
>>errno 5
>>[...]
>>
>
>>the clone. This seems strange though... if a clone is a set of pointers to
>>the original data, and the original volume became corrupted, how did the
>>cloned data survive?
>>
>
>Presumably "CopyOnWrite" corrupted the parent as it was copying. Still,
>you should probably upgrade to OpenAFS 1.2.5.
>
We're currently running OpenAFS 1.2.5 servers and clients on Red Hat
Linux 7.1 and 7.3 machines. Volumes on one of our servers are exhibiting
very similar behavior. It is a RAID machine using IDE drives presented
to Linux as SCSI via 3ware Escalade hardware. The machine has several
/vicepx partitions ranging in size from 100GB to 400GB. The following
shows up in our FileLog:
Fri Jul 26 13:19:24 2002 ReallyRead(): read failed device 1A inode 1777162977807295 errno 5
Fri Jul 26 13:19:24 2002 ReallyRead(): read failed device 1A inode 1777162977807295 errno 5
Fri Jul 26 13:19:27 2002 ReallyRead(): read failed device 1A inode 2503532141879063 errno 5
Fri Jul 26 13:19:27 2002 ReallyRead(): read failed device 1A inode 2503532141879063 errno 5
Application servers accessing AFS report a "File too large" error when
attempting to write to various volumes. An ls sometimes reports no files
at all (not even . and ..), and at other times the files are visible. In
both cases, when I manually cd into these volumes and touch a temp file,
I get the message "file too large". So far the only solution I've found
is to shut down the server and run a salvage. Several times I have had
to reboot the machine entirely. Is there anything else I should look for
in order to track this down? I've also noticed frequent callback
failures on many fileservers, but I'm not sure whether this is related.
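
For anyone trying to narrow down a similar problem, here is a minimal
sketch that tallies the ReallyRead() entries per device and inode, so it
is easier to see whether the failures cluster on one partition. The
regular expression is only inferred from the log excerpts in this
thread, and the names SAMPLE, summarize_read_failures and describe_errno
are hypothetical helpers, not part of OpenAFS. On Linux, errno 5 is EIO
(an I/O error from the underlying device), and "File too large" is the
standard message for EFBIG.

```python
import errno
import os
import re
from collections import Counter

# Pattern assumed from the FileLog excerpts quoted in this thread.
# \s+ between tokens also tolerates entries wrapped by a mail client.
READ_FAILED = re.compile(
    r"ReallyRead\(\):\s+read\s+failed\s+device\s+(\S+)"
    r"\s+inode\s+(\S+)\s+errno\s+(\d+)"
)

# Sample text copied from the log excerpt in this message.
SAMPLE = """\
Fri Jul 26 13:19:24 2002 ReallyRead(): read failed device 1A inode
1777162977807295 errno 5
Fri Jul 26 13:19:24 2002 ReallyRead(): read failed device 1A inode
1777162977807295 errno 5
Fri Jul 26 13:19:27 2002 ReallyRead(): read failed device 1A inode
2503532141879063 errno 5
Fri Jul 26 13:19:27 2002 ReallyRead(): read failed device 1A inode
2503532141879063 errno 5
"""


def summarize_read_failures(log_text):
    """Count ReallyRead() failures keyed by (device, inode, errno)."""
    counts = Counter()
    for device, inode, err in READ_FAILED.findall(log_text):
        counts[(device, inode, int(err))] += 1
    return counts


def describe_errno(code):
    """Return the symbolic name and the platform's message for an errno."""
    name = errno.errorcode.get(code, "unknown")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    for (device, inode, err), n in summarize_read_failures(SAMPLE).items():
        print(f"device {device} inode {inode}: {n} failures ({describe_errno(err)})")
```

If the failures all land on one device, that points more toward the
disk or controller under that partition than toward AFS itself, though
this is only a heuristic.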
--
Christopher Arnold
System Administrator
Pictage, Inc.