[OpenAFS] Re: New volumes get strange IDs and are unusable

Andrew Deason adeason@sinenomine.net
Tue, 11 Oct 2011 11:32:10 -0500


On Mon, 19 Sep 2011 20:22:17 +0200
Torbjörn Moa <moa@fysik.su.se> wrote:

> >> : No such device
> >> Volume does not exist on server sysafs2.physto.se as indicated by the VLDB
> > 
> > What version? Some things used to have problems with volume IDs over
> > 2147483648 but I thought we'd fixed them all by now.
> 
> On this particular node we run 1.4.6, but it varies between servers.

I lied, this bug still exists, at least for me on a 32-bit x86 host.
What platform was this? Through a quirk of atol/atoi it doesn't seem to
be a problem on amd64 for me, which is probably why I thought it had
been fixed. (gerrit 5594, bug 130266)
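
To illustrate why the platform matters, here is a minimal sketch; it is
not the actual OpenAFS code, and parse_volume_id and fits_in_signed_32
are hypothetical names. atol/atoi return a signed long/int, which on
32-bit x86 cannot represent anything above 2147483647, while on amd64 a
long is 64 bits and the value happens to fit. Parsing as unsigned and
range-checking explicitly avoids depending on the width of long:

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not an OpenAFS function: parse a decimal volume ID
 * into a 32-bit unsigned value without going through a signed long. */
static int
parse_volume_id(const char *str, uint32_t *id_out)
{
    char *end;
    unsigned long long val;

    /* Reject empty strings, signs, and leading whitespace up front,
     * since strtoull() would otherwise accept "-1" and wrap it. */
    if (str[0] < '0' || str[0] > '9')
        return -1;

    errno = 0;
    val = strtoull(str, &end, 10);
    if (errno != 0 || *end != '\0' || val > UINT32_MAX)
        return -1;

    *id_out = (uint32_t)val;
    return 0;
}

/* Returns nonzero if the ID is representable in a signed 32-bit integer,
 * i.e. if it would survive a round trip through a 32-bit long or int. */
static int
fits_in_signed_32(uint32_t id)
{
    return id <= (uint32_t)INT32_MAX;
}
```

With this, an ID such as 2147483700 parses correctly everywhere, and
fits_in_signed_32() reports that it would not fit in a signed 32-bit
value, which is the range where the 32-bit and 64-bit hosts diverge.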

> > Something bumped the "max volume id" counter in the vldb by a large
> > number. This could happen in many different ways... unfortunately, if
> > you don't have the logging level turned up in the vlserver or have
> > audit logs turned on, it's going to be difficult to determine what
> > did it. Do you run any kind of periodic checking for consistency of
> > volumes vs vldb or anything like that?
> 
> Hmmm, yes we do. We have a nagios check running on all servers that
> does a "vos syncserv "$server" -d" and "vos syncvldb "$server"
> -dryrun" periodically. I guess you are implying I shouldn't do that...

No, I don't mean to say that, but it's a possible cause. The -dryrun
option to these commands does not currently prevent "vos" from raising
the max volume id in the database. That's a bug, but it's the current
behavior. vos doesn't even print anything when it does this, so you
wouldn't know when it happened. (bug 130267)

-- 
Andrew Deason
adeason@sinenomine.net