[OpenAFS] Re: Reoccuring salvager errors/fixes

Andrew Deason adeason@sinenomine.net
Thu, 10 Feb 2011 00:18:09 -0600

On Wed, 9 Feb 2011 13:49:24 +0100
Matthias Gerstner <matthias.gerstner@esolutions.de> wrote:

> 02/06/2011 05:17:44 Vnode 40636: version < inode version; fixed (old status)

This indicates an inconsistency between the metadata of the file in
/vicepX itself and the data in the file containing information about
vnodes. /vicepX/foo/bar says vnode 40636 has some version, but vnode
40636 corresponds to file /vicepX/foo/baz/quux, and the metadata in that
file says it has some other version. It doesn't indicate any actual
corruption, but it could mean that a volume was not taken offline
properly, since files on disk getting updated at different times could
cause that.

> And some of these:
> 02/06/2011 05:54:25 The volume header file /vicepa/V0536874151.vol is not associated with any actual data (deleted)

This is usually caused by temporary volumes that just didn't get cleaned
up, from an interrupted 'vos' operation, for instance. The fileserver
can be operating perfectly fine and get shut down properly, etc., and
you can still get these if someone ctrl-C'd a "vos release" at the right
time.

> 02/06/2011 05:23:45 Directory bad, vnode 19595; salvaging...
> 02/06/2011 05:23:45 Salvaging directory 19595...
> 02/06/2011 05:23:45 Checking the results of the directory salvage...
> 02/06/2011 05:25:37 Vnode 19593: link count incorrect (was 2, now 3)

This is actual data that was wrong and was fixed. It's pretty much just
what it says; the link count was wrong.

> 02/06/2011 05:25:37 Found 12918 orphaned files and directories (approx. 512678 KB)

This is just data that is in the volume, but is not referenced anywhere
by any directory (interrupted renames, unlinks, etc., can do that). iirc,
this will keep coming up in subsequent salvages unless you run with
'-orphans remove' or '-orphans attach'. As you might expect, 'remove'
will delete the data, and 'attach' will create new files and dirs in the
volume root. Since the filenames were lost, you get files like
__ORPHANFILE__.X.Y. From there, you can examine the files, and if you
recognize what they are, you can move them where they should go.
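
If you do go the 'attach' route, something like the following rough
sketch could help you get an overview of what ended up in the volume
root. It only assumes the __ORPHANFILE__. name prefix mentioned above;
the function name and the idea of passing the volume root path on the
command line are made up for illustration, not anything OpenAFS ships.

```python
import os
import sys

# Prefix of attached orphan files, as described above (__ORPHANFILE__.X.Y).
ORPHAN_PREFIX = "__ORPHANFILE__."


def find_orphans(volume_root):
    """Return sorted (name, size_in_bytes) pairs for orphan files in volume_root."""
    found = []
    for entry in os.scandir(volume_root):
        # Only look at regular files directly in the volume root.
        if entry.name.startswith(ORPHAN_PREFIX) and entry.is_file(follow_symlinks=False):
            found.append((entry.name, entry.stat(follow_symlinks=False).st_size))
    return sorted(found)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: pass the path where the volume root is mounted.
    for name, size in find_orphans(sys.argv[1]):
        print(f"{size:>12}  {name}")
```

The sizes are only there as a hint for guessing what a file might have
been; you still have to look at the contents to decide where it goes.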

> The question I have is whether it can be considered normal that I get
> such inconsistencies on a regular basis. I would think that getting
> inconsistencies should be the exceptional case.

The first one isn't quite so bad, but the others are more surprising.
What filesystem(s) are you using for /vicep* ? Is it possible you have
ever rebooted the machine without shutting down the OpenAFS daemons, or
manually killed the fileserver or anything?

Any possibility you have something messing with the contents of /vicep*?
(As the included README indicates, changing file metadata in there can
screw things up; yes, even the ownership/permissions)
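
To get a feel for how regularly each kind of message shows up, you
could tally them per day from the salvager log. Below is a minimal
sketch, assuming log lines look like the ones you quoted (date, time,
then the message). The patterns and labels are just my own grouping of
the messages in this thread, not an exhaustive list of salvager output,
and the log file path is whatever you pass in.

```python
import re
import sys
from collections import Counter

# Patterns for the messages quoted in this thread; lines matching none are skipped.
PATTERNS = [
    ("version mismatch", re.compile(r"Vnode \d+: version < inode version")),
    ("stale volume header",
     re.compile(r"The volume header file \S+ is not associated with any actual data")),
    ("bad directory", re.compile(r"Directory bad, vnode \d+")),
    ("link count", re.compile(r"Vnode \d+: link count incorrect")),
    ("orphans", re.compile(r"Found \d+ orphaned files and directories")),
]


def tally(lines):
    """Count recognized salvager messages, keyed by (date, label)."""
    counts = Counter()
    for line in lines:
        parts = line.split(None, 2)  # date, time, rest of the message
        if len(parts) < 3:
            continue
        date, _time, message = parts
        for label, pattern in PATTERNS:
            if pattern.search(message):
                counts[(date, label)] += 1
                break
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: pass the path to a saved salvager log.
    with open(sys.argv[1], errors="replace") as log:
        for (date, label), n in sorted(tally(log).items()):
            print(f"{date}  {label:<20} {n}")
```

If the same categories show up after every restart, that would point
more towards unclean shutdowns than towards a one-off event.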

Andrew Deason