[OpenAFS] can't get volumes online

Stephan Wiesand Stephan.Wiesand@desy.de
Sun, 20 Feb 2005 13:08:12 +0100 (CET)


Thanks for your answer. Lots of additional information inline:

On Wed, 16 Feb 2005, Jeffrey Hutzelman wrote:

>
>
> On Friday, February 11, 2005 07:45:12 PM +0100 Stephan Wiesand 
> <Stephan.Wiesand@desy.de> wrote:
>
>> For some 25 volumes, the salvager complained about problems with the
>> header structure and renamed them to "bogus.<numeric ID>" and left them
>> offline:
>> 
>>   ...
>>   Salvaged bogus.536883946 (536883946): 449 files, 1000045 blocks
>> 
>> We tried dumping and restoring those to different volumes: They're still
>> offline. We tried running the salvager on the new volumes again, but
>> 
>>   STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager -part /vicepd -volumeid
>>   536883946 -showlog)
>>   SALVAGING VOLUME 536883946.
>>   xxx.yyy.zzz (536883946) not updated (created 02/11/2005 18:35)
>>   Salvaged xxx.yyy.zzz (536883946): 449 files, 1000045 blocks
>> 
>> and the volume's still offline.
>> 
>> Any ideas? Or do we have to assume that these volumes were corrupted to
>> the point where recovery is completely impossible?
>
>
> It would help if you identified the platform and AFS version you're using.
> Note that quoting "STARTING AFS SALVAGER 2.4" does not help -- that version 
> string has said 2.4 at least since AFS 3.1, and still says the same thing on 
> the OpenAFS CVS head today.

Right, sorry. This is Linux, SuSE 8.2, with a vanilla 2.4.29 kernel, and 
openafs 1.2.13 configured with --enable-fast-restart --enable-bitmap-later.
Everything is compiled with gcc 2.95.3. We run the LWP fileserver.
BosConfig looks like this:

  parm /usr/afs/bin/fileserver -syslog
  parm /usr/afs/bin/volserver -syslog
  parm /usr/afs/bin/salvager -DontSalvage
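
(Those are just the parm lines of the fs instance; the surrounding
BosConfig boilerplate is the usual bnode/end stanza. Reconstructed from
memory it looks roughly like the following, where the restarttime and
checkbintime values are my guess at the stock defaults, not copied from
the server:)

  restarttime 11 0 4 0 0
  checkbintime 3 0 5 0 0
  bnode fs fs 1
  parm /usr/afs/bin/fileserver -syslog
  parm /usr/afs/bin/volserver -syslog
  parm /usr/afs/bin/salvager -DontSalvage
  end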

The server has four vice partitions, each 1.1 TB or 1.3 TB in size, with
ext3 filesystems mounted noatime,data=writeback. After the crash, ext3
replayed the journals and was happy with the filesystems, as was a full
fsck of the affected partitions.
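
In fstab terms that is simply something like this (the device names here
are made up, not the real ones):

  /dev/sdb1  /vicepa  ext3  noatime,data=writeback  0 2
  /dev/sdc1  /vicepb  ext3  noatime,data=writeback  0 2
  /dev/sdd1  /vicepc  ext3  noatime,data=writeback  0 2
  /dev/sde1  /vicepd  ext3  noatime,data=writeback  0 2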

One fact that may be important is that the server was frozen by a failed
hardware probe (in an attempt to collect asset data) about one week 
before the SCSI problem. All volumes on it were attached cleanly after
a reset, but the Salvager hadn't run.

> When you say the volume is offline, I assume you are basing this on the 
> output you see in 'vos listvol' or 'vos examine'.  One of the ways this can

Yes.

> happen is if there is another copy of the same volume (by ID) on a 
> lower-numbered partition on the same server.  Have you checked that this 
> volume does not appear on /vicepa, /vicepb, or /vicepc?  Is the volume

I hadn't, but now I have, and there are no such volumes on the other 
partitions.
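
In case anyone wants an exact incantation for that check, something like
this does it (server name omitted; the second command assumes I have the
on-disk header file naming right):

  for p in a b c; do vos listvol <server> $p -fast | grep 536883946; done
  ls -l /vicepa/V0536883946.vol /vicepb/V0536883946.vol \
        /vicepc/V0536883946.vol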

> offline even when you restore it to a different server?

Yes.

> Just as an additional check, does that volume (by number) actually appear in 
> the VLDB?  What output do you get from 'vos listvldb 536883946' ?

It's all fine: vos listvldb, vos listvol and vos examine all list
the affected volumes properly, except that the latter two commands
return the "new name" bogus.536883946.

> If the offline-ness survives a dump and restore to a different server, then 
> it is likely based on some persistent state which is recorded in a volume 
> dump.  If this is the case, you may be able to get some useful information by 
> looking at a volume dump of one of these volumes.
>
> Grab a copy of my volume dump tools from 
> /afs/cs.cmu.edu/project/systems-jhutz/dumpscan.
>
> Do a dump of one of the offline volumes, and then run
>
> afsdump_scan -PV <dump_file>

Your binary won't run on my test system, so I grabbed the one from a
recent 1.3.78 build (most of these volumes are larger than 2 GB anyway):
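
For reference, the dump being scanned here was a plain full dump written
to a local file, i.e. something along these lines (the file name is just
a placeholder):

  vos dump -id 536883943 -time 0 -file /tmp/536883943.dump
  afsdump_scan -PV /tmp/536883943.dump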

* VOLUME HEADER [42 = 0x000000000000002a]
  Volume ID:   536883943
  Version:     1
  Volume name: bogus.536883943
  In service?  true
  Blessed?     false
  Uniquifier:  2580
  Type:        0
  Parent ID:   536883943
  Clone ID:    0
  Max quota:   0
  Min quota:   0
  Disk used:   2535835
  File count:  1093
  Account:     0
  Owner:       0
  Created:     Fri Feb 11 17:35:53 2005
  Accessed:    Thu Jan  1 01:00:00 1970
  Updated:     Thu Jan  1 01:00:00 1970
  Expires:     Thu Jan  1 01:00:00 1970
  Backed up:   Thu Jan  1 01:00:00 1970
  Offine Msg:
  MOTD:
  Weekuse:              0          0          0          0
  Weekuse:              0          0          0
  Dayuse Date: Thu Jan  1 01:00:00 1970
  Daily usage: 0

> The output contains all of the volume-level information that is recorded in 
> the volume dump, none of which should be particularly sensitive.  Send a copy 
> of that output (it's not very long), and perhaps someone can comment on 
> what's wrong.

Looks good to me, except for the quota. What all these volumes have in
common is that "InService" is true, but "Blessed" is false (see the
volinfo note after the list below). I spent the better part of last week
digging in the vice partitions, and found the following in addition:

o One volume has a completely corrupted info file (the size is 552 bytes,
   as it should be, but the magic number is wrong and none of the fields
   contain sensible data) and only the single vnode #1 (decoding the
   directory reveals that the only entries are "." and ".."). The Salvager
   reports a failed assertion whenever it touches this one. The large vnode
   index looks ok (#1 is the only vnode with a good magic). The small vnode
   index has a size of 8 bytes... Modification times of the small/large
   vnode index and the root directory +/+/\=++++2 are December 10.
   Modification time of the info file is one minute before the SCSI
   failure. I think this volume never had any content at all.

o All other affected volumes seem essentially intact, except that they
   have "DontSalvage" set to 229 in their headers. One has a large
   number of vnode files for which I don't find any directory entry,
   but otherwise looks ok. The others just look ok, where by "ok" I mean:

   - For each and every vnode file in the volume's directory on the vice
   partition, there is exactly one entry making perfect sense in a
   directory vnode file.

   - For each and every vnode file in the volume's directory there's a
   matching entry making perfect sense (good ACLs, owner, group, ...) in
   the small/large vnode index.

   - There are no directory entries or entries in the vnode indexes without
   a matching vnode file.
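
(For anyone digging into similar headers: the on-disk flags can also be
read with volinfo instead of decoding the special files by hand. If I
have the options right, something like

  volinfo -part /vicepd -volumeid 536883946

prints the volume header straight off the partition, including, as far
as I remember, the inService, blessed, dontSalvage and needsSalvaged
fields. That should be a quicker cross-check than picking the headers
apart manually.)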

In some cases, modification times of files/directories in the vice
partition prove that these volumes had not been written to for a month or
longer. Weekly restarts are in effect (though the general belief seems
to be that they're redundant nowadays).

I haven't checked the link tables yet. Since these volumes have no backup 
or readonly clones, I don't expect any surprises there.

I was able to restore the full contents and metadata from all volumes - 
except the two that are actually damaged - by copying files off the vice 
partition into newly created volumes.

So if I was able to verify the volumes' integrity and make the data
available to my users by hand, why do the salvager and file/volserver
refuse to do the same?

Cheers,

-- 

  ----------------------------------------------------
| Stephan Wiesand  |                                |
|                  |                                |
| DESY     - DV -  | phone  +49 33762 7 7370        |
| Platanenallee 6  | fax    +49 33762 7 7216        |
| 15738 Zeuthen    |                                |
| Germany          |                                |
  ----------------------------------------------------