[OpenAFS-devel] CopyOnWrite failure: ihandle.c Patch: Code Cleanup

Srikanth Vishwanathan vsrikanth@in.ibm.com
Fri, 5 Apr 2002 10:54:27 -0500


Hi Marco - Hi Bob -

I was not too hopeful that the problem would go away with the new patch.
Anyway, thanks for the information and the logs. We'll keep looking and
let you guys know if we find something.

Thanks,

Srikanth.



                                                                                                                   
hoffman@cs.pitt.edu
04/05/2002 10:23 AM
Please respond to hoffman

To:      openafs-devel@openafs.org
cc:      Srikanth Vishwanathan/India/IBM@IBMIN, marco.foglia@psi.ch
Subject: Re: [OpenAFS-devel] CopyOnWrite failure: ihandle.c Patch: Code Cleanup



> unfortunately the ihandle patch does not solve the CopyOnWrite
> problem. We had another CopyOnWrite failure this morning ...

We had one last night, 51 seconds after the backup volume was recloned:

VolserLog:
Thu Apr  4 15:49:58 2002 1 Volser: Clone: Recloning volume 536874136 to volume 536874138

FileLog:
Thu Apr  4 15:50:49 2002 CopyOnWrite failed: volume 536874136 in partition /vicepb  (tried reading 4096, read 0, wrote 0, errno 11) volume needs salvage
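
For anyone who wants to check the 51-second gap between the reclone and the failure, here is a minimal sketch in Python. It assumes both logs use the C asctime()-style timestamp seen in the lines above and that the default English locale is in effect; the helper name parse_log_time is made up for illustration and is not part of OpenAFS.

```python
from datetime import datetime

# Assumed timestamp layout of the VolserLog/FileLog lines,
# e.g. "Thu Apr  4 15:49:58 2002" (asctime() style).
LOG_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_log_time(stamp: str) -> datetime:
    """Parse one log timestamp (hypothetical helper, not an OpenAFS API)."""
    # asctime() pads single-digit days with an extra space; collapse it.
    return datetime.strptime(" ".join(stamp.split()), LOG_TIME_FORMAT)


# Timestamps copied from the two log lines quoted in this message.
reclone = parse_log_time("Thu Apr  4 15:49:58 2002")
failure = parse_log_time("Thu Apr  4 15:50:49 2002")

gap_seconds = int((failure - reclone).total_seconds())
print(gap_seconds)  # 51
```

The same helper could be pointed at other VolserLog/FileLog pairs to see whether failures tend to follow a reclone closely.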

> Because only the .kde and .netscape directories were corrupted
> this time I did not restore the whole volume from backup but
> did a "salvager ... -orphan attach" twice. When I moved the
> volume away from this file server afterwards I got lots of
>
> Fri Apr  5 13:38:20 2002 ReallyRead(): read failed device 0 inode 80BD460 errno 5
>
> in the file server log.

We didn't get any of those.  I was able to get a volinfo snapshot of
the corrupted volume before and after salvaging.  The files are:

ftp://ftp.cs.pitt.edu/hoffman/openafs/SalvageLog.536874136
ftp://ftp.cs.pitt.edu/hoffman/openafs/volinfo.536874136
ftp://ftp.cs.pitt.edu/hoffman/openafs/volinfo.536874136.after_salvage

The corrupted vnode (2945) was not the top-level vnode this time.
It was, however, under heavy use.

           ---Bob.
--
Bob Hoffman, N3CVL      University of Pittsburgh           Tel: +1 412 624 8404
hoffman@cs.pitt.edu     Department of Computer Science     Fax: +1 412 624 8854