[OpenAFS] CopyOnWrite failed

matt@njit.edu matt@njit.edu
Tue, 25 Jun 2002 23:18:30 -0400


Greetings. I have been following the discussions of the CoW problem on
Linux fileservers, and I believe I have had my first occurrence of "The
CoW"...

Sorry for the long post, but I wanted to include the logs.

This server runs OpenAFS 1.1.1 and is one of a cluster of 3 servers in a
test cell, all on RedHat Linux 7.1. (Yes, I know it's an old OpenAFS
version, but it's been working like a champ in my test env.) All vicep
partitions are on regular ext2 filesystems.

Server in question is: Linux roux.automagically.net 2.4.3-12 #1 Fri Jun
8 15:05:56 EDT 2001 i686 unknown

Questions:

Is this the CoW problem that still exists in later versions?

Volume user_matt seemed to recover, and the removed files appear to be
in a browser cache, so no big loss. Should I still restore this volume
from backups? Is the volume data still corrupted in some invisible way?

I have read that backup clones are related in some way to the CoW.
Should I switch to periodically released r/o volumes instead (i.e.,
explicitly mounted r/w home dirs etc.)?

All my other OpenAFS cells are on more recent OpenAFS versions on
Solaris 8. These cells are safe from the CoW, correct?

I have incremental vos dumps of all volumes going back about a month,
which all seem to be OK. (Test restore in progress.)



See logs below.

=========================================
FileLog:
Tue Jun 25 06:30:03 2002 ReallyRead(): read failed device 0 inode
80BF100 errno 5
Tue Jun 25 06:30:03 2002 ReallyRead(): read failed device 0 inode
80BF100 errno 5
Tue Jun 25 06:30:12 2002 ReallyRead(): read failed device 0 inode
80BF100 errno 5
Tue Jun 25 06:30:12 2002 ReallyRead(): read failed device 0 inode
80BF100 errno 5
=== Many many omitted ===
Tue Jun 25 20:33:31 2002 ReallyRead(): read failed device 0 inode
80BF100 errno 5
Tue Jun 25 20:33:44 2002 ReallyRead(): read failed device 0 inode
80BF100 errno 5
Tue Jun 25 20:34:17 2002 CopyOnWrite failed: volume 536870951 in
partition /vicepa needs salvage
Tue Jun 25 20:34:56 2002 VAttachVolume: volume salvage flag is ON for
/vicepa//V0536870951.vol; volume needs salvage
Tue Jun 25 21:00:33 2002 VAttachVolume: Error reading diskDataHandle vol
header /vicepa//V0536870953.vol; error=101
Tue Jun 25 21:00:34 2002 VAttachVolume: Error attaching volume
/vicepa//V0536870953.vol; volume needs salvage; error=101
==========================================
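
For anyone who wants to tally the repeated read failures in a FileLog
like the one above, here is a minimal Python sketch. It is only an
illustration, not an OpenAFS tool: the timestamp pattern and the helper
names (unwrap, count_read_failures) are assumptions based on the lines
shown here, and it treats any line without a leading timestamp as the
wrapped tail of the previous entry. For reference, errno 5 on Linux is
EIO (I/O error).

```python
import re
from collections import Counter

# Assumed FileLog timestamp prefix, e.g. "Tue Jun 25 06:30:03 2002 "
STAMP = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d \d{4} "
)
READ_FAIL = re.compile(
    r"ReallyRead\(\): read failed device (\S+) inode (\S+) errno (\d+)"
)


def unwrap(lines):
    """Rejoin entries that the mailer wrapped onto a following line."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if STAMP.match(line) or not entries:
            entries.append(line)
        else:
            # No timestamp: treat as a continuation of the previous entry.
            entries[-1] += " " + line.strip()
    return entries


def count_read_failures(lines):
    """Count ReallyRead failures keyed by (device, inode, errno)."""
    counts = Counter()
    for entry in unwrap(lines):
        match = READ_FAIL.search(entry)
        if match:
            device, inode, err = match.groups()
            counts[(device, inode, int(err))] += 1
    return counts
```

Feeding it the lines of the log (for example via open(path) on a saved
copy) gives one counter entry per distinct device/inode/errno triple,
which makes it easy to see whether a single inode accounts for all of
the failures.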

I then initiated a salvage; SalvageLog contains:

@(#) OpenAFS 1.1.1 built  2001-08-22
06/25/2002 20:41:38 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager
/vicepa 536870951)
06/25/2002 20:41:50 CHECKING CLONED VOLUME 536870953.
06/25/2002 20:41:50 user_matt.backup (536870953) updated 06/25/2002
20:33
06/25/2002 20:41:51 Vnode 271: length incorrect; (is 4096 should be 0)
06/25/2002 20:41:51 SALVAGING VOLUME 536870951.
06/25/2002 20:41:51 user_matt (536870951) updated 06/25/2002 20:33
06/25/2002 20:41:51 Vnode 106: version < inode version; fixed (old
status)
06/25/2002 20:41:51 Vnode 1112: version < inode version; fixed (old
status)
06/25/2002 20:41:51 Vnode 3150: version < inode version; fixed (old
status)
06/25/2002 20:41:51 Vnode 3392: version < inode version; fixed (old
status)
06/25/2002 20:41:51 Vnode 3550: version < inode version; fixed (old
status)
06/25/2002 20:41:51 Vnode 3984: version < inode version; fixed (old
status)
06/25/2002 20:41:51 Vnode 3992: version < inode version; fixed (old
status)
(Similar omitted)
06/25/2002 20:41:51 Vnode 167538: version < inode version; fixed (old
status)
06/25/2002 20:41:51 Vnode 271: length incorrect; changed from 4096 to 0
06/25/2002 20:41:52 Vnode 939: length incorrect; changed from 32768 to 0
06/25/2002 20:41:52 Vnode 941: length incorrect; changed from 2048 to 0
06/25/2002 20:41:52 Vnode 3139: length incorrect; changed from 6144 to 0
06/25/2002 20:41:52 Vnode 3147: length incorrect; changed from 6144 to 0
(Similar omitted)
06/25/2002 21:00:18 dir vnode 1: special old unlink-while-referenced
file .__afs5816 is deleted (vnode 3486)
06/25/2002 21:00:18 First page in directory does not exist.
06/25/2002 21:00:18 Directory bad, vnode 271; salvaging...
06/25/2002 21:00:18 Salvaging directory 271...
06/25/2002 21:00:18 Failed to read first page of fromDir!
06/25/2002 21:00:18 Checking the results of the directory salvage...
06/25/2002 21:00:20 First page in directory does not exist.
06/25/2002 21:00:20 Directory bad, vnode 939; salvaging...
06/25/2002 21:00:20 Salvaging directory 939...
06/25/2002 21:00:20 Failed to read first page of fromDir!
06/25/2002 21:00:20 Checking the results of the directory salvage...
06/25/2002 21:00:20 First page in directory does not exist.
06/25/2002 21:00:20 Directory bad, vnode 941; salvaging...
06/25/2002 21:00:20 Salvaging directory 941...
06/25/2002 21:00:20 Failed to read first page of fromDir!
06/25/2002 21:00:20 Checking the results of the directory salvage...
06/25/2002 21:00:21 dir vnode 1317: special old unlink-while-referenced
file .__afsDED9 is deleted (vnode 10754)
06/25/2002 21:00:21 dir vnode 1317: special old unlink-while-referenced
file .__afsB1E3 is deleted (vnode 10968)
06/25/2002 21:00:21 dir vnode 1317: special old unlink-while-referenced
file .__afsEB88 is deleted (vnode 10726)
06/25/2002 21:00:21 dir vnode 1317: special old unlink-while-referenced
file .__afsBA49 is deleted (vnode 10694)
06/25/2002 21:00:21 dir vnode 1317: special old unlink-while-referenced
file .__afs8EE6 is deleted (vnode 10016)
06/25/2002 21:00:21 dir vnode 1317: special old unlink-while-referenced
file .__afsF167 is deleted (vnode 10764)
06/25/2002 21:00:21 dir vnode 1317: special old unlink-while-referenced
file .__afsC15D is deleted (vnode 10734)
06/25/2002 21:00:21 dir vnode 1317: special old unlink-while-referenced
file .__afs3E9C is deleted (vnode 10758)
06/25/2002 21:00:21 dir vnode 1317: special old unlink-while-referenced
file .__afs9CEE is deleted (vnode 10888)
06/25/2002 21:00:21 dir vnode 1317: special old unlink-while-referenced
file .__afs65F8 is deleted (vnode 9990)
06/25/2002 21:00:21 dir vnode 1317: special old unlink-while-referenced
file .__afs565E is deleted (vnode 10702)
06/25/2002 21:00:21 dir vnode 1317: special old unlink-while-referenced
file .__afs2C01 is deleted (vnode 10712)
06/25/2002 21:00:21 dir vnode 1317: special old unlink-while-referenced
file .__afs76DB is deleted (vnode 10728)
06/25/2002 21:00:21 dir vnode 1317: special old unlink-while-referenced
file .__afs47D9 is deleted (vnode 10880)
06/25/2002 21:00:23 First page in directory does not exist.
06/25/2002 21:00:23 Directory bad, vnode 3139; salvaging...
06/25/2002 21:00:23 Salvaging directory 3139...
06/25/2002 21:00:23 Failed to read first page of fromDir!
06/25/2002 21:00:23 Checking the results of the directory salvage...
06/25/2002 21:00:23 First page in directory does not exist.
(Similar omitted)
06/25/2002 21:00:26 Salvaging directory 3211...
06/25/2002 21:00:26 Failed to read first page of fromDir!
06/25/2002 21:00:26 Checking the results of the directory salvage...
06/25/2002 21:00:32 Vnode 271: link count incorrect (was 23, now 2)
06/25/2002 21:00:33 Found 3578 orphaned files and directories (approx.
57612 KB)
06/25/2002 21:00:33 Salvaged user_matt (536870951): 88736 files, 9738570
blocks

=================================================================
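
To check which of the vnodes truncated to length 0 were also reported
as bad directories in a SalvageLog like the one above, a rough Python
sketch follows. The patterns are taken from the messages shown here;
the function names are made up for illustration, and lines without a
leading date are assumed to be wrapped continuations.

```python
import re

# Assumed SalvageLog timestamp prefix, e.g. "06/25/2002 20:41:51 "
STAMP = re.compile(r"^\d\d/\d\d/\d{4} \d\d:\d\d:\d\d ")
ZEROED = re.compile(r"Vnode (\d+): length incorrect; changed from (\d+) to 0")
BAD_DIR = re.compile(r"Directory bad, vnode (\d+)")


def unwrap(lines):
    """Rejoin entries that the mailer wrapped onto a following line."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if STAMP.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line.strip()
    return entries


def zeroed_directories(lines):
    """Map vnode -> old length for vnodes both zeroed and flagged bad."""
    zeroed = {}
    bad_dirs = set()
    for entry in unwrap(lines):
        match = ZEROED.search(entry)
        if match:
            zeroed[int(match.group(1))] = int(match.group(2))
        match = BAD_DIR.search(entry)
        if match:
            bad_dirs.add(int(match.group(1)))
    return {v: size for v, size in zeroed.items() if v in bad_dirs}
```

The result only covers vnodes that appear in both kinds of message, so
entries hidden behind "(Similar omitted)" are naturally not counted.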


-- 
# Matthew E. Hoskins ########################################################
# Information Systems Analyst                        /|   /      /  / ~~/~~ #
# University Information Systems                    / |  /      /  /   /    #
# New Jersey Institute of Technology               /  | /  /   /  /   /     #
# University Heights, Newark, New Jersey 07102    /   |/  /___/  /   /      #
# Ph: 973-596-5202         #  Beep: pagematt@cobol.njit.edu                  #
# Rm: 5400 GITC Building   # Email: matt@njit.edu                            #
##############################################################################
"Any technology sufficiently advanced is indistinguishable from a perl
script"
"Anyone who considers arithmetical methods of producing random digits
is, of course, in a state of sin" -- John von Neumann