[OpenAFS] Re: Re: Reoccuring salvager errors/fixes

Thu, 10 Feb 2011 17:44:47 +0100

> This represents an inconsistency between the metadata of the file in
> /vicepX itself, and the data in the file containing information about
> vnodes. /vicepX/foo/bar says vnode 40636 has some version, but vnode
> 40636 corresponds to file /vicepX/foo/baz/quux, and the metadata in that
> file says it has some other version. It doesn't represent any actual
> corruption or anything, but could be indicative of a volume not getting
> taken offline properly, since files on disk getting updated at different
> times could cause that.

I generally don't take volumes offline except implicitly when doing
backups. I have one backup job that creates incremental backup volumes
for each volume and another one that dumps all volumes into files.

Apart from that, no special actions should be performed on either the
volumes or the data on /vicepX.

> This is usually caused by temporary volumes that just didn't get cleaned
> up, from an interrupted 'vos' operation, for instance. The fileserver
> can be operating perfectly fine and get shutdown properly etc etc, and
> you can still get these if someone ctrl-C'd a "vos release" at the right
> time.

Yes, I have already noticed the problems when cancelling vos commands.
But I haven't done anything like that for at least a month, and these
fixes still occur. I have the impression that it is related to the
creation of incremental backup volumes via "bos backupsys", because that
is the only operation that is performed on a regular basis and that
creates/removes volumes.

> This is just data that is in the volume, but is not referenced anywhere
> by any dir (interrupted renames, unlinks, etc, can do that). iirc, this
> will keep coming up in subsequent salvages unless you run with '-orphans
> remove' or '-orphans attach'. As you might expect, 'remove' will delete
> the data, and 'attach' will create new files and dirs in the volume
> root. Since the filenames were lost, you get files like
> __ORPHANFILE__.X.Y. From there, you can examine the files, and if you
> recognize what they are, you can move them where they should go.

Thanks for pointing me to the '-orphans' option. I hadn't even realized
that the salvager doesn't deal with the orphans right away.
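
For the archives, here is a minimal sketch of how I would assemble such a
salvage call for a single volume. It only builds the argument list for
"bos salvage" with the '-orphans' option discussed above and does not run
anything. The helper name build_salvage_command and the server, partition
and volume names are placeholders I made up for illustration; please check
the option names against the bos_salvage man page of your OpenAFS version
before using anything like this.

```python
import shlex

# Orphan handling modes as described above: 'remove' deletes the
# unreferenced data, 'attach' links it into the volume root, and
# 'ignore' leaves it alone so it shows up again in later salvages.
VALID_ORPHAN_MODES = ("ignore", "remove", "attach")


def build_salvage_command(server, partition, volume, orphans="attach"):
    """Return the argument list for a single-volume 'bos salvage' call.

    Hypothetical helper: it only assembles the command line and never
    executes it.
    """
    if orphans not in VALID_ORPHAN_MODES:
        raise ValueError("orphans must be one of %s" % (VALID_ORPHAN_MODES,))
    return [
        "bos", "salvage",
        "-server", server,
        "-partition", partition,
        "-volume", volume,
        "-orphans", orphans,
    ]


if __name__ == "__main__":
    # Placeholder names, not taken from my actual setup.
    cmd = build_salvage_command("fs1.example.com", "/vicepa", "user.jdoe")
    print(shlex.join(cmd))
```

I would start with 'attach' rather than 'remove', since the attached
__ORPHANFILE__.X.Y files can still be inspected and deleted by hand
afterwards.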

> The first one isn't quite so bad, but the others are more surprising.
> What filesystem(s) are you using for /vicep* ? Is it possible you have
> ever rebooted the machine without shutting down the OpenAFS daemons, or
> manually killed the fileserver or anything?
>
> Any possibility you have something messing with the contents of /vicep*?
> (As the included README indicates, changing file metadata in there can
> screw things up; yes, even the ownership/permissions)

I had to forcibly shut down the file server once or twice, but that was
months ago. And the file server immediately started a salvage when it
was started up again.

Then one time I foolishly ran the 'salvager' command directly on a
volume without taking it offline. That pretty much messed things up for
that volume.

But as I said, all of that was quite a while ago. Since then I'm not
aware of anything that happened to the volumes or to /vicepX. And still
the salvager comes up with a differing number and type of these events
every week. I haven't noticed any actual problems when working with AFS,
however. It just makes me uncomfortable that these fixes need to be made
time and again by the salvager.

The only thing I can say is that the errors come up only for volumes
that get a lot of activity during the week, mostly home directories of
users who put a lot of data there. Of course it's to be expected that
errors increase with increased usage. Still, I would expect errors not
to turn up this often as long as there's no misuse.

Best regards,

Matthias

--
Matthias Gerstner, Dipl.-Wirtsch.-Inf. (FH), Senior Software Engineer
e.solutions GmbH

Am Wolfsmantel 46, 91058 Erlangen, Germany

Registered Office:
Pascalstr. 5, 85057 Ingolstadt, Germany

Phone +49-8458-3332-672, mailto:Matthias.Gerstner@esolutions.de
Fax +49-8458-3332-20672

e.solutions GmbH
Managing Directors Uwe Reder, Dr. Riclef Schmidt-Clausen
Register Court Ingolstadt HRB 5221