[OpenAFS] Help needed: Recovering AFS volume after filesystem crash
Berthold Cogel
cogel@rrz.uni-koeln.de
Wed, 23 Jun 2004 10:49:44 +0200
Hello!
After a crash (driver problem) some days ago, we had a problem with a
filesystem on a Linux file server. Several volumes were damaged. I
was able to recover some of them, but one volume is really
troublesome.
After fsck, all of the volume's data ended up in lost+found. I moved
the data back to /vicepag/AFSIDat/l=/lYJ=U/ (VolumeID 537221425) and
tried to salvage the volume. This is what I got in the logs:
SalvageLog:
@(#) OpenAFS 1.2.8 built 2003-02-11
06/21/2004 16:52:28 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepag 537221425)
06/21/2004 16:52:28 No applicable vice inodes on vicepag; not salvaged
Temporary file /vicepag/salvage.inodes.vicepag.25757 is missing...
I've written a small program that dumps parts of the volume special
files. For the Volume Info file I get this:
Filename: zzzz52HK3+0
stamp.magic: 0x78a1b2c5
Typ: Volume Info
id: 537221425 (0x20055931)
name: Hv.www.p.ceec.301
inUse: 1 (0x1)
inService: 1 (0x1)
blessed: 1 (0x1)
needsSalvaged: 0 (0x0)
uniquifier: 1994 (0x7ca)
type: 0 (0x0)
parentId: 537221425 (0x20055931)
cloneId: 0 (0x0)
backupId: 0 (0x0)
restoredFromId: 0 (0x0)
needsCallback: 0 (0x0)
destroyMe: 0 (0x0)
dontSalvage: 0 (0x0)
maxquota: 8000000 (0x7a1200)
owner: 0 (0x0)
filecount: 1 (0x1)
diskused: 3157940 (0x302fb4)
creationDate: 1077804455 (0x403dfda7)
updateDate: 1085586052 (0x40b4ba84)
backupDate: 0 (0x0)
copyDate: 1077804455 (0x403dfda7)
stat_initialized: 1 (0x1)
'diskused' seems to be OK. 'filecount' is wrong.
'vos listvol' gives 'Could not attach volume 537221425'.
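A first check like the one above can be sketched as follows: read the start of the volume info special file and confirm that the magic stamp and volume ID match what is expected. This is a minimal sketch, not the program used here, and it rests on assumptions: that the header begins with a 32-bit magic, a 32-bit version, the 32-bit volume ID, and a 32-byte NUL-padded name, stored little-endian as on an x86 Linux server. The names `parse_volume_info_prefix`, `check_volume_info`, and `VNAMESIZE` are illustrative.

```python
import struct

# Expected stamp.magic for a volume info file; matches the value dumped above.
VOLUMEINFOMAGIC = 0x78A1B2C5
VNAMESIZE = 32  # assumed width of the on-disk volume name field

# Assumed leading layout (little-endian): magic, version, volume id, name.
HEADER = struct.Struct("<III%ds" % VNAMESIZE)


def parse_volume_info_prefix(data: bytes) -> dict:
    """Decode only the first fields of a volume info header (sketch)."""
    if len(data) < HEADER.size:
        raise ValueError("file too short for a volume info header")
    magic, version, vol_id, raw_name = HEADER.unpack_from(data)
    if magic != VOLUMEINFOMAGIC:
        raise ValueError("bad stamp.magic 0x%08x" % magic)
    # The name is NUL-padded; keep everything before the first NUL.
    name = raw_name.split(b"\0", 1)[0].decode("ascii", "replace")
    return {"version": version, "id": vol_id, "name": name}


def check_volume_info(path: str, expected_id: int) -> dict:
    """Read a volume info file and verify it belongs to expected_id."""
    with open(path, "rb") as f:
        info = parse_volume_info_prefix(f.read(HEADER.size))
    if info["id"] != expected_id:
        raise ValueError("header id %d != expected %d" % (info["id"], expected_id))
    return info
```

Run from inside the volume's directory, `check_volume_info("zzzz52HK3+0", 537221425)` would either return the decoded prefix or raise with the first mismatch found. It deliberately stops before fields such as `filecount`, whose offsets depend on the exact header definition of the OpenAFS release in use.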
Both the Large and the Small Vnode List are OK, and every file listed in
them exists in the data. My program translates each vnode number and
uniquifier into the flipbase-encoded filename that the namei fileserver
uses on an ordinary Linux filesystem.
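The "flipbase" names are base-64 numbers written least significant digit first. A minimal sketch of that encoding follows, assuming the 64-character alphabet below matches the one used by the namei fileserver. As a consistency check, encoding volume ID 537221425 with it yields `lYJ=U`, the last component of the directory path given earlier. The function names are illustrative, and how vnode, uniquifier, and tag bits are packed into a single number before encoding is not shown here.

```python
# Assumed alphabet of the namei fileserver's base-64 encoding.
FLIPBASE64_CHARS = "+=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def to_flipbase64(n: int) -> str:
    """Encode n in base 64, emitting the least significant 6-bit digit first."""
    if n < 0:
        raise ValueError("negative values are not encoded")
    if n == 0:
        return FLIPBASE64_CHARS[0]
    digits = []
    while n:
        digits.append(FLIPBASE64_CHARS[n & 0x3F])
        n >>= 6
    return "".join(digits)


def from_flipbase64(s: str) -> int:
    """Inverse of to_flipbase64; raises ValueError on characters outside the alphabet."""
    value = 0
    for position, ch in enumerate(s):
        value |= FLIPBASE64_CHARS.index(ch) << (6 * position)
    return value
```

Because the first character is the lowest digit, names that share low-order bits share a prefix, which is why reversing the string is needed when reading one as an ordinary base-64 number.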
Two other (empty) volumes appear to be structurally intact, but the
salvager reports things like this:
SalvageLog:
@(#) OpenAFS 1.2.8 built 2003-02-11
06/22/2004 15:43:19 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepae 537221329)
06/22/2004 15:43:19 SALVAGING VOLUME 537221329.
06/22/2004 15:43:19 Hv.www.p.ceec.117 (537221329) updated 03/24/2004 14:16
06/22/2004 15:43:19 Salvaged Hv.www.p.ceec.117 (537221329): 0 files, 0 blocks
'vos listvol' gives:
Hv.www.p.ceec.117 537221329 RW 0 K On-line
'ls -l' for the mount point gives: '117: No such file or directory'
I was able to recover some of the other volumes hit by the crash simply by
copying the data back from lost+found and salvaging the volume. In some
cases I had to recreate parts of the path. Some of the volumes were empty,
but I restored them anyway as practice. We currently have no backup of
this data (an image database at several resolutions, generated
automatically from raw data) because of its size. Given a little time
(within a month) the data can be regenerated, so recovery is not strictly
necessary. But I'm curious.
Any suggestions on how to solve this puzzle?
Regards,
Berthold Cogel