[OpenAFS] Re: New volumes get strange IDs and are unusable

Andrew Deason adeason@sinenomine.net
Thu, 13 Oct 2011 11:44:08 -0500

On Thu, 13 Oct 2011 12:04:47 +0200
Torbjörn Moa <moa@fysik.su.se> wrote:

> OK, the nagios checks are still running, and again the problem is back.
> The max volume id is now 2267649774. Stupidly, I didn't keep a constant
> watch on it after we reset it manually. So, mainly as a test, I will
> disable the nagios checks, manually reset the maxvolid again, and then
> keep watching it. If it doesn't move then, in a couple of days or so, I
> may run the syncvldb and syncserv checks manually, one by one, server by
> server, and see what happens. Unless you have some other suggestion.

That sounds fine, but you can also determine why it's happening by
turning up the debug level in the vlserver, and possibly saving the
output from the 'vos' runs.

If your vlserver is running 1.6 or newer, you can pass '-d 1' to it, and
it'll log operations that modified the vldb to VLLog. If your vlserver
is older, you'll need to pass '-d 5' to get the relevant messages
logged, along with a bunch of other stuff. Or you can log all operations
with '-auditlog /path/to/file'. If you do that and look for a message
mentioning "GetNewVolumeId", it will tell you when the max vol id was
bumped, and what machine issued the command.
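
A rough sketch of that search, in case it helps. It only does a plain
substring match and makes no assumption about the log line layout; the
function name find_id_bumps is made up for illustration, and the file
you feed it is whatever VLLog or audit log path your site uses.

```python
def find_id_bumps(lines, needle="GetNewVolumeId"):
    """Return (line_number, text) for every log line that mentions needle.

    Plain substring match only; nothing is assumed about the log format.
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line
    ]
```

You would call it with an open file object for the log and print each
returned pair; the matching lines should then show the timestamp and
the issuing host, as described above.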

If that refers to a run of "vos syncvldb", then saving the output of
"vos syncvldb -verbose" would probably indicate why the max id was
bumped.

> For me the top prio is to find out what causes this. The problem is not
> really that nobody's _telling_ that they're bumping maxvolid, but rather
> that it _gets_ bumped in the first place.

Well, if something said when it was doing it, it'd be easier to find out
why :)

> Running "vos listvol" on all file servers and sorting the output, I find
> the largest volume ID existing on any server is actually 536936451 (a RW
> volume), which is consistent with what's in VLDB. So there wouldn't be a
> reason for syncvldb (or anyone else for that matter) to bump maxvolid at
> all, would there?

Correct, unless a larger-ID'd volume existed when the vldb was synced,
but for some reason does not exist now.
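
If you want to repeat that "largest ID on any server" check regularly,
something like the following might do. It is only a sketch: it assumes
the short "vos listvol" output, where each volume line looks like
"name  id  RW|RO|BK  size K  status", and it skips anything else
(headers, totals, busy-volume notices). The function name
max_volume_id is hypothetical.

```python
def max_volume_id(listvol_lines):
    """Return (volume_id, name, type) for the largest ID seen, or None.

    Assumes volume lines have the name in field 1, a numeric ID in
    field 2 and the type (RW, RO or BK) in field 3. Other lines are
    ignored.
    """
    best = None
    for line in listvol_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        name, vol_id, vol_type = fields[0], fields[1], fields[2]
        if not vol_id.isdigit() or vol_type not in ("RW", "RO", "BK"):
            continue
        entry = (int(vol_id), name, vol_type)
        if best is None or entry[0] > best[0]:
            best = entry
    return best
```

Feed it the concatenated "vos listvol" output from every file server
and compare the result with the max volume id the vlserver reports; a
big gap between the two is what you are seeing now.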

Andrew Deason