[OpenAFS] salvage removed .6M files!
Mike Polek
mike@pictage.com
Mon, 01 Aug 2005 19:10:46 -0700
Hi, Steve,
Did you run fsck on the hard drive (or whatever the
Sun equivalent is these days)? I had a similar problem
recently when a system lost power. It started up OK
and recovered using the ext3 journal, but my data was missing
after the salvage. After a few salvage attempts, my data
was still missing. I stopped the AFS fileserver, unmounted
the partitions, used fsck to check them all manually, and
sure enough the partition that was flaking had errors.
Once I cleaned that up, I salvaged again, and voila!...
my data reappeared.
I recommend checking the underlying filesystem for
errors. It may be too late if you've already started
restoring data to the partition, but it's worth keeping
in mind for future reference.
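
The sequence described above (stop the fileserver, unmount, fsck, salvage again) can be sketched as a dry-run plan. This is only a sketch, not the exact commands that were run: the server name, partition and device are hypothetical placeholders, the `fsck -f` form assumes a Linux ext3 setup like the one listed below, and the script only prints the commands rather than executing them.

```python
import shlex


def recovery_plan(server, partition, device):
    """Return the recovery steps as argv lists, in the order described above.

    Nothing is executed; the commands are only assembled for review.
    """
    return [
        # 1. Stop the AFS fileserver so nothing touches the partition.
        ["bos", "shutdown", "-server", server, "-instance", "fs", "-wait"],
        # 2. Unmount the partition before checking it.
        ["umount", partition],
        # 3. Force a full check of the underlying filesystem.
        ["fsck", "-f", device],
        # 4. Remount the partition, then salvage it again.
        ["mount", partition],
        ["bos", "salvage", "-server", server, "-partition", partition],
    ]


if __name__ == "__main__":
    # Hypothetical names; substitute your own server, partition and device.
    for argv in recovery_plan("fs1.example.com", "/vicepa", "/dev/sdb1"):
        print(shlex.join(argv))
```

Printing the plan first makes it easy to review each step before running anything by hand against a partition that may already be damaged.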
OS: RedHat 9
Kernel: 2.4.30
AFS: 1.2.13
Mike Polek
Pictage, Inc.
> ---- Original Message ----
> From: rader
More information, fwiw...
- SalvageLog.old indicates (the initial) salvaging started
at 01:07:43
- BosLog indicates that that salvage exited with signal 15 at
05:00:38
- SalvageLog indicates another salvage--the one that went
awry--started at 05:00:38 and completed at 06:44:41
- bos getrestart reports the server should restart for
new binaries at "5:00 am"
Is it possible that the "restart for new binaries" erroneously
kicked in at 5:00 am and sent SIGTERM (signal 15) to the running
bos salvage, leaving the volume in an inconsistent state that
caused the subsequent salvage to blow chunks? (I'm under the
general impression that interrupting salvages is a bad idea.)
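
The timestamps listed above can be lined up with a few lines of Python. This is only a sanity check on the timeline, under the assumption that "5:00 am" means exactly 05:00:00 and that all times fall on the same day; the variable names are made up for illustration.

```python
from datetime import datetime

FMT = "%H:%M:%S"


def elapsed(start, end):
    """Return end - start for two same-day HH:MM:SS strings."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


first_start = "01:07:43"   # SalvageLog.old: initial salvage started
first_killed = "05:00:38"  # BosLog: that salvage exited with signal 15
second_start = "05:00:38"  # SalvageLog: second salvage started
second_end = "06:44:41"    # SalvageLog: second salvage completed
restart_time = "05:00:00"  # bos getrestart: "5:00 am" (assumed exact)

first_run = elapsed(first_start, first_killed)
second_run = elapsed(second_start, second_end)
lag = elapsed(restart_time, first_killed)

print("first salvage ran for", first_run)            # 3:52:55
print("second salvage ran for", second_run)          # 1:44:03
print("killed", lag, "after the scheduled restart")  # 0:00:38
```

The first salvage had been running for almost four hours when it was killed, 38 seconds after the scheduled restart time, which is consistent with the restart being what interrupted it.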
At any rate, I've turned off the "restarts for new binaries at
5:00 am" thing.
steve
- - -
systems & network manager
high energy physics
university of wisconsin
> ---- Original Message ----
> From: rader
>
> One of our servers (Solaris 7 inode fileserver running 1.2.11) lost
> power this morning and the resulting bos salvage on a large (50 GB)
> volume removed about 600,000 files.... /usr/afs/logs/SalvageLog
> reads, for example...
>
> 07/29/2005 06:19:26 dir vnode 87953: invalid entry: \
> ./cmsprod/cern/setup.sh (vnode 2258102, unique 14499243)
> 07/29/2005 06:19:26 dir vnode 87953: ./cmsprod/cern/setup.sh \
> (vnode 2258102): unique changed from 14499243 to 0 -- deleted
>
> Does anybody have any suggestions about how to recover the lost
> files?? (I'm restoring from tape now, but I'll still have the
> busted volume around when I'm done.)
>
> steve
> - - -
> systems & network manager
> high energy physics
> university of wisconsin
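
The SalvageLog excerpt quoted above suggests one way to list what the salvager removed: scan the log for the "unique changed ... -- deleted" lines and collect the paths. The sketch below assumes each entry is a single line in the real log (the backslashes in the message look like wrapping for email) and that every deletion follows the format of the one quoted example; the function and pattern names are hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
# 07/29/2005 06:19:26 dir vnode 87953: ./path (vnode N): unique changed from X to 0 -- deleted
DELETED = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"dir vnode (?P<dir>\d+): (?P<path>.+) "
    r"\(vnode (?P<vnode>\d+)\): unique changed from (?P<old>\d+) to 0 -- deleted$"
)


def deleted_entries(lines):
    """Yield one dict per log line that records a deleted directory entry."""
    for line in lines:
        match = DELETED.match(line.strip())
        if match:
            yield match.groupdict()


# The two log lines from the message, joined back into single lines.
sample = [
    "07/29/2005 06:19:26 dir vnode 87953: invalid entry: "
    "./cmsprod/cern/setup.sh (vnode 2258102, unique 14499243)",
    "07/29/2005 06:19:26 dir vnode 87953: ./cmsprod/cern/setup.sh "
    "(vnode 2258102): unique changed from 14499243 to 0 -- deleted",
]

entries = list(deleted_entries(sample))
per_dir = Counter(entry["dir"] for entry in entries)

for entry in entries:
    print(entry["path"], "removed from dir vnode", entry["dir"])
```

To run it against a real log, pass an open file instead of `sample`, for example `deleted_entries(open("/usr/afs/logs/SalvageLog"))`. The resulting path list could help check a tape restore for completeness.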