[OpenAFS] Help needed: Recovering AFS volume after filesystem crash

Berthold Cogel cogel@rrz.uni-koeln.de
Wed, 23 Jun 2004 10:49:44 +0200


Hello!

After a crash (driver problem) some days ago, we had a problem with a 
filesystem on a Linux file server. Several volumes were damaged. I was 
able to recover some of them, but there is one volume that is really 
troublesome.

After fsck, all of the volume's data ended up in lost+found. I moved 
the data back to /vicepag/AFSIDat/l=/lYJ=U/ (VolumeID 537221425) and 
tried to salvage the volume. This is what I got in the logs:

SalvageLog:
@(#) OpenAFS 1.2.8 built  2003-02-11
06/21/2004 16:52:28 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepag 537221425)
06/21/2004 16:52:28 No applicable vice inodes on vicepag; not salvaged
Temporary file /vicepag/salvage.inodes.vicepag.25757 is missing...

I've written a small program that dumps parts of the contents of the 
volume special files. For the Volume Info file I get this:

Filename:         zzzz52HK3+0
stamp.magic:      0x78a1b2c5
Typ:              Volume Info

id:               537221425 (0x20055931)
name:             Hv.www.p.ceec.301
inUse:            1 (0x1)
inService:        1 (0x1)
blessed:          1 (0x1)
needsSalvaged:    0 (0x0)
uniquifier:       1994 (0x7ca)
type:             0 (0x0)
parentId:         537221425 (0x20055931)
cloneId:          0 (0x0)
backupId:         0 (0x0)
restoredFromId:   0 (0x0)
needsCallback:    0 (0x0)
destroyMe:        0 (0x0)
dontSalvage:      0 (0x0)
maxquota:         8000000 (0x7a1200)
owner:            0 (0x0)
filecount:        1 (0x1)
diskused:         3157940 (0x302fb4)
creationDate:     1077804455 (0x403dfda7)
updateDate:       1085586052 (0x40b4ba84)
backupDate:       0 (0x0)
copyDate:         1077804455 (0x403dfda7)
stat_initialized: 1 (0x1)
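A dump program like the one described needs to check which special file it is looking at before interpreting any fields. A minimal sketch of that first step, assuming each special file begins with a version stamp of two unsigned 32-bit integers (magic, version) in the file server's native byte order (little-endian on x86). Only the Volume Info magic 0x78a1b2c5 shown in the dump above is used; the function names are hypothetical and this is not the author's program.

```python
import struct

# stamp.magic of the Volume Info file, as printed in the dump above.
VOLUME_INFO_MAGIC = 0x78A1B2C5


def parse_stamp(data, byteorder="<"):
    """Return (magic, version) from the first 8 bytes of a special file.

    Assumption: the stamp is two unsigned 32-bit integers in the server's
    native byte order; "<" means little-endian (e.g. an x86 file server).
    """
    if len(data) < 8:
        raise ValueError("too short to contain a version stamp")
    return struct.unpack(byteorder + "II", data[:8])


def is_volume_info_stamp(data, byteorder="<"):
    """True if the stamp's magic matches the Volume Info magic."""
    magic, _version = parse_stamp(data, byteorder)
    return magic == VOLUME_INFO_MAGIC


def read_stamp(path, byteorder="<"):
    """Read and parse the stamp of a special file such as zzzz52HK3+0."""
    with open(path, "rb") as f:
        return parse_stamp(f.read(8), byteorder)
```

A file whose magic does not match is either a different kind of special file or was damaged, so comparing the magic is a cheap filter before trusting fields such as filecount or diskused.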

'diskused' seems to be OK, but 'filecount' is wrong.
'vos listvol' reports 'Could not attach volume 537221425'.

The Large Vnode List and the Small Vnode List are OK, and every file 
listed in them exists in the data. My program translates the vnode and 
uniquifier information into the flipbase64 file names that the AFS 
(namei) file server uses on the underlying Linux filesystem.
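The file-name translation described above can be sketched as follows. This is a minimal sketch, not the author's program: the 64-character alphabet (lowest 6-bit digit written first) and the inode layout (vnode in bits 0-25, tag in bits 26-28, uniquifier in bits 32-63) are assumptions reconstructed to agree with the names in this message. Under them, volume ID 537221425 encodes to lYJ=U, the second-level directory named above, and the Volume Info file name zzzz52HK3+0 decodes to an all-ones vnode field, tag 1 and uniquifier 537221425, i.e. the volume ID itself. All helper names are hypothetical.

```python
# Assumed flipbase64 alphabet: index 0 is '+', 1 is '=', then 0-9, A-Z, a-z.
ALPHABET = "+=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

# Assumed namei inode layout (hypothetical constants, see lead-in).
VNODE_MASK = 0x03FFFFFF
TAG_SHIFT = 26
TAG_MASK = 0x7
UNIQ_SHIFT = 32


def encode_flipbase64(value):
    """Encode a non-negative int, least significant 6-bit digit first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return ALPHABET[0]
    digits = []
    while value:
        digits.append(ALPHABET[value & 0x3F])
        value >>= 6
    return "".join(digits)


def decode_flipbase64(name):
    """Inverse of encode_flipbase64: the first character is the lowest digit.

    Raises ValueError (from str.index) on characters outside the alphabet.
    """
    value = 0
    for i, ch in enumerate(name):
        value |= ALPHABET.index(ch) << (6 * i)
    return value


def split_inode(ino):
    """Split a decoded file name into (vnode, tag, uniquifier)."""
    return ino & VNODE_MASK, (ino >> TAG_SHIFT) & TAG_MASK, ino >> UNIQ_SHIFT


def make_inode(vnode, tag, uniq):
    """Pack (vnode, tag, uniquifier) back into one integer."""
    return (vnode & VNODE_MASK) | ((tag & TAG_MASK) << TAG_SHIFT) | (uniq << UNIQ_SHIFT)


if __name__ == "__main__":
    print(encode_flipbase64(537221425))
    print(split_inode(decode_flipbase64("zzzz52HK3+0")))
```

Run as a script, this prints lYJ=U and (67108863, 1, 537221425); 67108863 is 0x3ffffff, the all-ones vnode field that marks a special file in this assumed layout.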

There are two other volumes (both empty) whose structure seems to be OK, 
but the salvager returns messages like this:

SalvageLog:
@(#) OpenAFS 1.2.8 built  2003-02-11
06/22/2004 15:43:19 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepae 537221329)
06/22/2004 15:43:19 SALVAGING VOLUME 537221329.
06/22/2004 15:43:19 Hv.www.p.ceec.117 (537221329) updated 03/24/2004 14:16
06/22/2004 15:43:19 Salvaged Hv.www.p.ceec.117 (537221329): 0 files, 0 blocks

'vos listvol' gives:
Hv.www.p.ceec.117                 537221329 RW          0 K On-line

'ls -l' on the mount point gives: '117: No such file or directory'

I was able to recover some of the other volumes hit by the crash just by 
copying the data back from lost+found and salvaging the volume. In some 
cases I had to restore parts of the path. Some of the volumes were empty, 
but I restored them anyway, just for practice. At the moment we don't 
have a backup of this data (an image database in different resolutions, 
created automatically from raw data) because of its size. Given a little 
time (within a month), the data can be reprocessed, so recovery is not 
strictly necessary. But I'm curious.
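Finding a volume's data in lost+found before copying it back can be partly automated. A minimal sketch, assuming the namei file names survived inside lost+found and using the same assumed flipbase64 alphabet and inode layout (vnode bits 0-25, tag bits 26-28, uniquifier bits 32-63) under which zzzz52HK3+0 decodes to volume 537221425. It walks a directory tree and reports special files whose all-ones vnode field and uniquifier point at a given volume ID; names assigned by fsck (such as "#12345") are simply skipped. The helper names are hypothetical.

```python
import os

# Assumed flipbase64 alphabet (lowest 6-bit digit written first).
ALPHABET = "+=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
VNODE_SPECIAL = 0x03FFFFFF  # assumed: all-ones vnode field marks a special file


def decode_flipbase64(name):
    """Decode a flipbase64 name; return None if it contains other characters."""
    value = 0
    for i, ch in enumerate(name):
        digit = ALPHABET.find(ch)
        if digit < 0:
            return None
        value |= digit << (6 * i)
    return value


def special_file_volume(name):
    """Return (tag, volume_id) if name looks like a special file, else None."""
    ino = decode_flipbase64(name)
    if ino is None or (ino & 0x03FFFFFF) != VNODE_SPECIAL:
        return None
    return (ino >> 26) & 0x7, ino >> 32


def find_special_files(root, volume_id):
    """Walk root (e.g. lost+found) and yield paths of special files that
    decode to volume_id."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            info = special_file_volume(fname)
            if info is not None and info[1] == volume_id:
                yield os.path.join(dirpath, fname)
```

For example, iterating over find_special_files("/vicepag/lost+found", 537221425) would list the candidate special files of the troublesome volume together with the fsck-named directory that contains them.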

Any suggestions on how to solve this puzzle?


Regards,

Berthold Cogel