[OpenAFS] Re: volume offline due to too low uniquifier (and salvage cannot fix it)

Andrew Deason adeason@sinenomine.net
Tue, 16 Apr 2013 13:07:24 -0500

On Tue, 16 Apr 2013 13:34:18 -0400
Derrick Brashear <shadow@gmail.com> wrote:

> The problem he's having (apparently I replied without replying all) is
> he's wrapping uniquifiers, and currently the volume package deals
> poorly, since ~1300 from maxuint plus 2000 plus 1 results in a number
> "less than the max uniquifier"

Okay; yeah, makes sense.
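
To spell out the arithmetic, here is a rough sketch in C. The helper
names are made up for illustration and are not the actual volume package
code; the 2000 margin and the ~1300 distance from the 32-bit maximum are
just the numbers from the description above.

```c
#include <stdint.h>

/* Hypothetical helper: take the highest uniquifier in use and bump it
 * by a safety margin plus one. Unsigned 32-bit arithmetic wraps
 * modulo 2^32. */
static uint32_t bumped_uniq(uint32_t max_uniq_in_use, uint32_t margin)
{
    return (uint32_t)(max_uniq_in_use + margin + 1u);
}

/* Hypothetical stand-in for the "less than the max uniquifier" check. */
static int uniq_looks_too_low(uint32_t next_uniq, uint32_t max_uniq_in_use)
{
    return next_uniq < max_uniq_in_use;
}

/* With max_uniq_in_use = UINT32_MAX - 1300 and margin = 2000:
 *   bumped_uniq(...) == 700
 * which is far below the max in use, so the check fires. */
```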

> We need to decide whether OpenAFS should
> 1) compact the uniquifier space via the salvager (re-uniquifying all
> outstanding vnodes save 1.1, presumably).
> or
> 2) simply allow the uniquifier to wrap, removing the check for "less
> than the max", but ensuring we skip 0 and 1. there will be no direct
> collisions as no vnode can exist twice
> either way, there is a slight chance a vnode,unique tuple which
> previously existed may exist again.

Yes, but that is inevitable unless we keep track of uniqs per-vnode or
something. If we do option (1), I feel like that makes the possible
collisions more infrequent in a way, since the event triggering the
collisions is a salvage, which has 'undefined' new contents for caching
purposes anyway. In option (2) you can have a collision by just removing
a file and creating one. Maybe those aren't _so_ different, but that's
my impression.
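
For reference, the counter side of option (2) could look roughly like
the sketch below. The function name is hypothetical, and this only shows
the "wrap but skip 0 and 1" part; it does nothing about a previously
used vnode,unique tuple showing up again.

```c
#include <stdint.h>

/* Hypothetical sketch of option (2): let the uniquifier wrap, but
 * never hand out 0 or 1. */
static uint32_t next_uniq_wrapping(uint32_t cur)
{
    uint32_t next = (uint32_t)(cur + 1u);  /* wraps to 0 after UINT32_MAX */

    if (next < 2u)
        next = 2u;                         /* skip the values 0 and 1 */
    return next;
}
```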

I feel like the fileserver could also maybe not increment the uniq
counter so much, if we issue a lot of creates/mkdirs with no other
interleaving operations. That is, if we create 3 files in a row, it's
fine if they were given fids 1.2.9, 1.4.9, and 1.6.9, right? We would
guarantee that we wouldn't collide on the whole fid (at least, no more
so than now), just on the uniq, which is okay, right? That might help
avoid this in some scenarios.
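
A very rough sketch of that idea, with made-up names and a deliberately
simplified vnode allocator (it just steps by 2 to match the example fids
above, and ignores vnode reuse and uniq wraparound): the uniq is bumped
only if some other operation happened since the last create.

```c
#include <stdint.h>

struct fid {
    uint32_t volume;
    uint32_t vnode;
    uint32_t unique;
};

/* Hypothetical per-volume allocation state, for illustration only. */
struct vol_state {
    uint32_t volume;
    uint32_t next_vnode;
    uint32_t uniq;
    int other_op_seen;  /* nonzero if a non-create op ran since the last create */
};

/* Record that something other than a create happened (e.g. a remove). */
static void note_other_op(struct vol_state *vs)
{
    vs->other_op_seen = 1;
}

/* Allocate a fid; consecutive creates share the same uniq. */
static struct fid alloc_fid(struct vol_state *vs)
{
    struct fid f;

    if (vs->other_op_seen) {
        vs->uniq++;             /* bump only after an interleaving op */
        vs->other_op_seen = 0;
    }
    f.volume = vs->volume;
    f.vnode = vs->next_vnode;
    f.unique = vs->uniq;
    vs->next_vnode += 2;        /* simplified: always hand out a fresh vnode */
    return f;
}
```

Starting from vnode 2 and uniq 9 in volume 1, three creates in a row
give 1.2.9, 1.4.9 and 1.6.9; a create after a call to note_other_op()
gets uniq 10.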

And for kuba's sake, I guess the immediate workaround to get the volume
online could be to just remove that check for the uniq. I would use that
to just get the data online long enough to copy the data to another
volume, effectively re-uniq-ifying them. I think I'd be uneasy with just
not having the check in general, but I'd need to think about it...

Andrew Deason