[OpenAFS] CopyOnWrite failed. Workarounds?
Hartmut Reuter
reuter@rzg.mpg.de
Tue, 28 May 2002 16:49:45 +0200
The fact that the CopyOnWrite failure is seen only on servers with very
low load indicates that it has something to do with the file descriptor
caching. As I mentioned some months ago, we had a similar problem in
MR-AFS, where a directory was not REALLYCLOSEd before it was unlinked.
CopyOnWrite then wrote into the unlinked directory instead of the newly
created one, because both had the identical UFS path. The effect was that
after the file descriptor was closed the data were lost and the newly
created file had a length of 0 bytes.
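
To make the mechanism concrete, here is a minimal sketch of the stale
descriptor effect on a plain POSIX filesystem. It is not fileserver code:
the function name and the file name "vnode" are made up for illustration,
and holding a descriptor open across the unlink only stands in for a
cache that hands back the old descriptor for the same path.

```python
import os
import tempfile


def stale_descriptor_demo():
    """Return the size of a newly created file after data were written
    through a descriptor that still refers to the old, unlinked file.
    Hypothetical illustration only; assumes POSIX unlink semantics."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "vnode")  # hypothetical file name

    # Old file: opened and kept open, as a descriptor cache would do.
    cached_fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)

    # The file is unlinked without the descriptor being really closed.
    os.unlink(path)

    # A new file is created under the identical path.
    new_fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    os.close(new_fd)

    # The copy goes through the stale descriptor, i.e. into the old inode.
    os.write(cached_fd, b"copied data")
    os.close(cached_fd)  # the old inode and its data are freed here

    size = os.stat(path).st_size  # size of the new file

    # Clean up the scratch directory.
    os.unlink(path)
    os.rmdir(workdir)
    return size
```

On a POSIX system the function returns 0: the write succeeds, but it lands
in the unlinked inode, so the file visible under the path stays empty.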
I tried to reproduce this effect on an openafs-1.2.3 test server with
various combinations of directory updates and "vos backup", but never saw
the problem. It would be nice if someone could give a recipe for the steps
necessary to produce the failure!
Anyway, if you have a production environment where the number of files
opened over a day far exceeds the number of open file descriptors, you
probably won't see any errors. (Not a very helpful hint, I know!)
Hartmut Reuter
Friedrich Delgado Friedrichs wrote:
>
> Hi!
>
> On Sunday I sent a report about my whole home directory becoming orphaned.
> Derrick J Brashear guessed that it may be the "CopyOnWrite failure" bug
> that several people on this list have experienced; however, I could not prove this,
> having lost the logfiles.
>
> After reading several of those posts here and on openafs-devel, I am now pretty sure that I
> suffered from the same bug because, apart from the missing logfile entry, the behaviour
> on my box was pretty much the same as reported by Marco Foglia in <3C8F30DB.73BAD2EE@psi.ch>
> and Matthew N. Andrews in <BJEHJHBBLPOFPKCANEMGIEABCAAA.mnandrews@lbl.gov> on the openafs-devel
> list.
>
> Since the first report from Marco Foglia dates back to 10/31/2001, and I intend
> to use OpenAFS on my home box and at work in a minor installation, which might serve as
> a testbed for a larger installation later on, I'd prefer not to wait for a fix and would rather
> know which workaround seems viable, especially if you want to have regular backups...
>
> If I fail to resolve this issue, I'd have to decide not to use AFS, since regular lossage
> of this magnitude is clearly not desirable. ;-)
>
> From the other posts, I noticed a few points which might be helpful:
> - Don't use backup volumes. Marco Foglia clearly stated that the problem never
>   arises when the backup volumes are removed.
>   This probably means that in a smaller installation I will take regular dumps
>   directly from the home volumes instead of using the backup volumes. Are there
>   any negative side effects to be expected, e.g. because a mail delivery process
>   might hang in iowait for a long time, processes might time out, etc.?
> - I can't see clearly whether using a single-threaded fileserver (is this the
>   same as the LWP fileserver?) will help. Several posters hinted in this direction,
>   but hoffman@cs.pitt.edu in <200203182232.g2IMWil23214@frack.cs.pitt.edu> stated
>   that it might make things worse.
>
> What else could help me get around the problem while retaining most of AFS's functionality?
>
> Best Regards
> --
> Friedrich Delgado Friedrichs
--
-----------------------------------------------------------------
Hartmut Reuter e-mail reuter@rzg.mpg.de
phone +49-89-3299-1328
RZG (Rechenzentrum Garching) fax +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------