[OpenAFS] can't get volumes online

Jeffrey Hutzelman jhutz@cmu.edu
Wed, 16 Feb 2005 19:19:59 -0500


On Friday, February 11, 2005 07:45:12 PM +0100 Stephan Wiesand 
<Stephan.Wiesand@desy.de> wrote:

> For some 25 volumes, the salvager complained about problems with the
> header structure and renamed them to "bogus.<numeric ID>" and left them
> offline:
>
>   ...
>   Salvaged bogus.536883946 (536883946): 449 files, 1000045 blocks
>
> We tried dumping and restoring those to different volumes: They're still
> offline. We tried running the salvager on the new volumes again, but
>
>   STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager -part /vicepd -volumeid
>   536883946 -showlog)
>   SALVAGING VOLUME 536883946.
>   xxx.yyy.zzz (536883946) not updated (created 02/11/2005 18:35)
>   Salvaged xxx.yyy.zzz (536883946): 449 files, 1000045 blocks
>
> and the volume's still offline.
>
> Any ideas? Or do we have to assume that these volumes were corrupted to
> the point where recovery is completely impossible?


It would help if you identified the platform and AFS version you're using.
Note that quoting "STARTING AFS SALVAGER 2.4" does not help -- that version 
string has said 2.4 at least since AFS 3.1, and still says the same thing 
on the OpenAFS CVS head today.

When you say the volume is offline, I assume you are basing this on the 
output you see in 'vos listvol' or 'vos examine'.  One of the ways this can 
happen is if there is another copy of the same volume (by ID) on a 
lower-numbered partition on the same server.  Have you checked that this 
volume does not appear on /vicepa, /vicepb, or /vicepc?  Is the volume 
offline even when you restore it to a different server?

Just as an additional check, does that volume (by number) actually appear 
in the VLDB?  What output do you get from 'vos listvldb 536883946'?
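
The partition check above could be scripted along these lines.  This is 
only a rough sketch: the function name and the server name 
fs1.example.org are made up for illustration, and it simply greps the 
'vos listvol' output for the volume ID, so read the full listing yourself 
before drawing conclusions.

```shell
# Sketch: look for another copy of a volume ID on the given partitions.
# Usage: find_volume_copies SERVER VOLID PARTITION...
find_volume_copies() {
    server=$1
    volid=$2
    shift 2
    for part in "$@"; do
        # 'vos listvol' lists the volumes on one partition of a server;
        # grep -w matches the ID only as a whole word.
        if vos listvol -server "$server" -partition "$part" 2>/dev/null |
                grep -qw -- "$volid"; then
            echo "volume $volid appears on $server $part"
        fi
    done
}

# Example (hypothetical server name):
#   find_volume_copies fs1.example.org 536883946 /vicepa /vicepb /vicepc
#   vos listvldb 536883946
```

Any line it prints for /vicepa, /vicepb, or /vicepc would point to a 
second copy of the volume on a lower-numbered partition.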


If the offline-ness survives a dump and restore to a different server, then 
it is likely based on some persistent state which is recorded in a volume 
dump.  If this is the case, you may be able to get some useful information 
by looking at a volume dump of one of these volumes.

Grab a copy of my volume dump tools from 
/afs/cs.cmu.edu/project/systems-jhutz/dumpscan.

Do a dump of one of the offline volumes, and then run

afsdump_scan -PV <dump_file>

The output contains all of the volume-level information that is recorded in 
the volume dump, none of which should be particularly sensitive.  Send a 
copy of that output (it's not very long), and perhaps someone can comment 
on what's wrong.
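
Put together, the dump-and-scan step might look something like the 
following.  Again, a sketch only: the function name and the dump file 
path are placeholders, and it assumes both 'vos' and afsdump_scan are on 
your PATH and that you have tokens sufficient to dump the volume.

```shell
# Sketch: dump one volume to a file, then print its volume-level info.
# Usage: dump_and_scan VOLID DUMPFILE
dump_and_scan() {
    volid=$1
    dumpfile=$2
    # Write a dump of the volume to the named file; stop if that fails.
    vos dump -id "$volid" -file "$dumpfile" || return 1
    # -PV is the option combination given above for afsdump_scan.
    afsdump_scan -PV "$dumpfile"
}

# Example (placeholder path):
#   dump_and_scan 536883946 /tmp/536883946.dump > scan-output.txt
```

The redirected output file is what you would send to the list.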

-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
   Sr. Research Systems Programmer
   School of Computer Science - Research Computing Facility
   Carnegie Mellon University - Pittsburgh, PA