[OpenAFS] Re: volume offline due to too low uniquifier (and salvage cannot fix it)

Jakub Moscicki Jakub.Moscicki@cern.ch
Tue, 16 Apr 2013 19:27:33 +0000


Hello,

Sorry for the late reply - on this side of the pond we are off at home by now ;-)

Yes, the issue seems to be the overflowing 32-bit uniquifier counter which we suspected and which was later confirmed by Derrick.

In any case, for now I took out the "check for less than the max" to restore access to the data and everything seemed to work okay. Users are happy ;-)
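
For the record, the wrap itself is easy to reproduce with the numbers Derrick quotes below; this little snippet only illustrates the invariant that breaks, it is not the actual OpenAFS check:

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        /* counter stored in the volume header, ~1300 below the 32-bit max */
        uint32_t header_uniq = UINT32_MAX - 1300;
        /* some existing vnode already carries a uniquifier this large */
        uint32_t largest_vnode_uniq = header_uniq;

        header_uniq += 2000 + 1;    /* unsigned overflow: wraps around to 700 */

        /* the volume package expects every vnode uniquifier to stay below
         * the header counter; after the wrap that no longer holds */
        if (largest_vnode_uniq >= header_uniq)
            printf("vnode uniq %u is not below header uniq %u\n",
                   (unsigned)largest_vnode_uniq, (unsigned)header_uniq);
        return 0;
    }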

I am not sure, but if the uniquifier is just there to uniquify a potentially reused vnode id, then the risk of collisions is really low even without this check, right? One would have to have the uniquifier wrap all the way around 32 bits between two uses of the same vnode id and land on exactly the same uniquifier number. It depends on the details of the vnode id reuse algorithm, of course, but it looks like a very low probability.
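
To make that concrete for myself, "just let it wrap" (Derrick's option 2 below) would be something like this - illustrative only, the names are made up and this is not the actual volume package code:

    #include <stdint.h>

    /* Advance the per-volume uniquifier, wrapping modulo 2^32 and skipping
     * the reserved values 0 and 1 (vnode 1, unique 1 is the root directory). */
    static uint32_t
    next_uniquifier(uint32_t current)
    {
        uint32_t next = current + 1;    /* wraps naturally past 2^32 - 1 */
        if (next < 2)
            next = 2;
        return next;
    }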

Many thanks to all for the help!

kuba

--


On Apr 16, 2013, at 8:07 PM, Andrew Deason <adeason@sinenomine.net> wrote:

> On Tue, 16 Apr 2013 13:34:18 -0400
> Derrick Brashear <shadow@gmail.com> wrote:
>
>> The problem he's having (apparently I replied without replying all) is
>> that he's wrapping uniquifiers, and currently the volume package deals
>> poorly, since ~1300 from maxuint plus 2000 plus 1 results in a number
>> "less than the max uniquifier".
>
> Okay; yeah, makes sense.
>
>> We need to decide whether OpenAFS should
>> 1) compact the uniquifier space via the salvager (re-uniquifying all
>> outstanding vnodes save 1.1, presumably).
>> or
>> 2) simply allow the uniquifier to wrap, removing the check for "less
>> than the max", but ensuring we skip 0 and 1. There will be no direct
>> collisions as no vnode can exist twice.
>>
>> Either way, there is a slight chance a vnode,unique tuple which
>> previously existed may exist again.
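
(Just so I follow option (1): the salvager would walk every vnode, hand out fresh compact uniquifiers, leave 1.1 alone and reset the header counter - roughly like the sketch below, with made-up structures, not the real salvager code:)

    #include <stddef.h>
    #include <stdint.h>

    struct toy_vnode {
        uint32_t vnode;      /* vnode number within the volume */
        uint32_t unique;     /* uniquifier to be rewritten */
    };

    static void
    compact_uniquifiers(struct toy_vnode *v, size_t n, uint32_t *header_uniq)
    {
        uint32_t next = 2;                  /* 0 and 1 stay reserved */
        for (size_t i = 0; i < n; i++) {
            if (v[i].vnode == 1 && v[i].unique == 1)
                continue;                   /* keep 1.1, the root directory */
            v[i].unique = next++;
        }
        *header_uniq = next;                /* counter restarts far below 2^32 */
    }
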
>
> Yes, but that is inevitable unless we keep track of uniqs per-vnode or
> something. If we do option (1), I feel like that makes the possible
> collisions more infrequent in a way, since the event triggering the
> collisions is a salvage, which has 'undefined' new contents for caching
> purposes anyway. In option (2) you can have a collision by just removing
> a file and creating one. Maybe those aren't _so_ different, but that's
> my impression.
>
> I feel like the fileserver could also maybe not increment the uniq
> counter so much, if we issue a lot of creates/mkdirs with no other
> interleaving operations. That is, if we create 3 files in a row, it's
> fine if they were given fids 1.2.9, 1.4.9, and 1.6.9, right? We would
> guarantee that we wouldn't collide on the whole fid (at least, no more
> so than now), just on the uniq, which is okay, right? That might help
> avoid this in some scenarios.
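
(If I read this right, it would amount to handing every create the current counter value and only bumping it after a vnode number has been freed, so a reused vnode number can never get one of its old uniquifiers back - a rough sketch with made-up names, not the real fileserver code:)

    #include <stdint.h>

    static uint32_t cur_uniq = 2;       /* current per-volume uniquifier */
    static int freed_since_bump = 0;    /* a vnode number was released */

    /* Called on create/mkdir: consecutive creates may share the same uniq. */
    static uint32_t
    uniq_for_create(void)
    {
        if (freed_since_bump) {
            cur_uniq++;
            if (cur_uniq < 2)           /* wrapped: skip 0 and 1 */
                cur_uniq = 2;
            freed_since_bump = 0;
        }
        return cur_uniq;
    }

    /* Called when a vnode is removed, making its number reusable later. */
    static void
    note_vnode_freed(void)
    {
        freed_since_bump = 1;
    }
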
>
> And for kuba's sake, I guess the immediate workaround to get the volume
> online could be to just remove that check for the uniq. I would use that
> to just get the data online long enough to copy the data to another
> volume, effectively re-uniq-ifying them. I think I'd be uneasy with just
> not having the check in general, but I'd need to think about it...
>
> -- 
> Andrew Deason
> adeason@sinenomine.net
>
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info