Re: [OpenAFS] Help!!! Files / Volumes are disappearing

Rubino Geiß kb44@rz.uni-karlsruhe.de
Sun, 16 Jun 2002 16:35:10 +0200


Hi!

Uhhh, this is not exactly what I wanted to hear... we are suffering from
a known but incurable disease :-0

I'm willing to provide you with any information you need to hunt down the
error. Certainly I'm not willing to have this happen once again. Every time
it happens I get some more gray hair... What do you need?

Can you define low-usage? We have 3 servers, and 2 of these incidents
occurred on the smallest one... 6x4 GB volumes. Or do you mean vnode
references per day? Or just the load average? In the last two cases, shall
I fake load or use find / grep to put workload on the server... how
can this be a cure?


Toens wanted to know:
> What filesystem do you use for your vicepX partitions?

ext3, mounted with -o data=writeback

Shall I use data=ordered?


Friedrich Delgado Friedrichs wrote:
> Have you got backup volumes for the volumes in question?

> This is the old and evil CopyOnWrite failure bug, I assume. I experienced
> this as well. Currently there is no fix (afaik) but a reliable
> workaround seems to be to remove the backup volume for each affected
> volume and instead of dumping the backup volumes, dump the volumes
> directly.

Yes, I do. They are created every day at 01:00; tape dumps from them
were done, according to a schedule, at 02:00.

I don't like this idea, since we are a compiler construction and software
engineering institute. Many programmers work at night or have long-running
jobs, so any system downtime is not appreciated. A direct tape dump will
cause such downtime (volume lock)...
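
For reference, the workaround quoted above (remove the backup volume, then
dump the read/write volume directly) might look roughly like the sketch
below. It only builds the vos command lines and does not run them. The
volume name "user.ruby" and the dump file path are hypothetical
placeholders, and the sketch assumes the usual <volume>.backup naming and
that "-time 0" requests a full dump.

```python
from typing import List


def direct_dump_commands(volume: str, dump_file: str) -> List[List[str]]:
    """Build (but do not run) the vos commands for the workaround.

    Assumption: the backup clone is named <volume>.backup.
    """
    backup = f"{volume}.backup"
    return [
        # Step 1: remove the backup volume of the affected volume.
        ["vos", "remove", "-id", backup],
        # Step 2: full dump of the read/write volume itself.
        ["vos", "dump", "-id", volume, "-time", "0", "-file", dump_file],
    ]


if __name__ == "__main__":
    # "user.ruby" and the path are made-up examples, not real site data.
    for cmd in direct_dump_commands("user.ruby", "/tmp/user.ruby.dump"):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to review them before running
anything against a production fileserver; the volume is still busy while
the direct dump runs, which is exactly the downtime concern raised above.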


Thank you all, 
ciao, ruby

-----Original Message-----
From: Derrick J Brashear [mailto:shadow@dementia.org]
Sent: Sunday, 16 June 2002 09:10
To: Rubino Geiß
Cc: 'Rubino Geiß'
Subject: Re: [OpenAFS] Help!!! Files / Volumes are disappearing


On Fri, 14 Jun 2002, Rubino Geiß wrote:

> Hi all,
>
> we've got a serious problem here. Whole directories are disappearing. 
> Even a restore from a tape backup is not working properly -- the 
> internal afs storage structure seems to be corrupted, such that a 
> restore is reproducing the same kind of error!

The volume dump is corrupted; you need to restore an older dump. You
have the CopyOnWrite bug. We haven't been able to track this down yet. If
you see it regularly, we would massively appreciate it if you'd be willing
to provide us with some debug info. Let me know if you do and if you are,
and I will provide details.

ruby> I have been using old volume dumps, that saved my life. 

> Your help and suggestions are very welcome, as many of our institute 
> are very concerned about these issues. They even suggested moving back
> to NFS, because AFS seems not to be ready for a production 
> environment!?

It's a bug experienced only by low-usage sites. So, this is a catch-22.
Use it more heavily and it stops manifesting itself.