[OpenAFS-devel] CopyOnWrite failure

Marco Foglia marco.foglia@psi.ch
Wed, 13 Mar 2002 11:58:35 +0100


Hi,

here is some additional information from our site about this
problem.

Derrick J Brashear wrote:
> 
> Some questions for those of you with this problem:
> -always with non-replicated volumes that have a .backup?

Yes. It never happens if you remove the .backup volume.

> -if the above, was the backup being recreated at the time? (the VolserLog
>  may be helpful here, as well as the vos examine info)

We recreate the backup volumes around midnight, but the
CopyOnWrite failure "happens" when users log in in the morning.
However, there were cases where the volume was already corrupt
before we saw the "CopyOnWrite failure" in the file server log
(the last backup of these volumes was already corrupt).

> -what if anything pertinent about access patterns?

We have one Linux file server (300 GB, 550 volumes) which
does not have the CopyOnWrite bug! I tried to clone this server
by using exactly the same hardware and doing an rsync, but the
cloned file server suffers from the bug. So I don't think that
a special access pattern (or AFS client version) is
responsible for it. The likelihood is just higher if you
are a heavy user. The only difference between our "stable"
server and any other file server (= "unstable") is that the
"stable" one is 60% full and the "unstable" ones are more or
less empty. Could the timing of some file system functions be
different and therefore trigger the bug?

Marco 

--
Marco Foglia | Paul Scherrer Institut  | phone     +41 56 310 36 39 
             | CH-5232 Villigen        | mailto:marco.foglia@psi.ch
-------------------------------------------------------------------