[OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

Giovanni Bracco bracco@frascati.enea.it
Mon, 23 Mar 2009 09:16:41 +0100


On Sunday 22 March 2009 20:39, Jeffrey Altman wrote:
> Giovanni Bracco wrote:
> > As I wrote in my posting, at that time (2002) my institution was using
> > the Transarc version of AFS and the reaction from the Transarc team was
> > ...to provide us with a patched version of AFS, not to correct the issue.
> > That version of course was not compatible with OpenAFS due to the large
> > value of the VolIDs existing at that point in our cell.
>
> The patched version of AFS fixed the issue.  The issue is that in some
> locations in the source a Volume Id is an unsigned 32-bit value and in
> others (most notably clone ids) the value is a signed 32-bit value.
> If a signed value is increased beyond 2^31-1 it will wrap and become a
> negative value.  There is no condition under which a negative value will
> be greater than Max Volume Id.
>
> I'm sure that the fix that IBM implemented for you in 2002 was to change
> all of the Volume Id fields so that they are unsigned 32-bit values.
> IBM does not provide their internal bug reports and patches to OpenAFS
> so we never knew about the issue.
>
> > To perform the migration to OpenAFS 3 years later we had to go through a
> > volume renumbering campaign (more than 1000 volumes) plus an ad-hoc
> > modification of the vl database to reset the MaxVolID to a value
> > supported by OpenAFS. At that point do you think we should have submitted
> > a bug on a mysterious event that had happened three years before on the
> > Transarc AFS version?
>
> You had to do this because OpenAFS did not have the patch that IBM
> created and we didn't know that we needed to implement it ourselves.
>
> > From the follow-up of the thread (postings by Hartmut Reuter and Rainer
> > Toebbicke) I see that the "strange" big jump in the VolID still happens
> > and surely the issue should be solved.
>
> There are several locations where unsigned and signed 32-bit variables
> containing volume ids are mixed either for comparison or computation.
> The computation of the new maxvolid value is one such place where this
> takes place.  It is quite likely that the mixture of signed and unsigned
> values resulted in signed 32-bit overflow which in turn resulted in an
> incorrect comparison and then assignment.  This in turn would result in
> the big jump.
>
> I have a patch attached to ticket 124510 which will (I hope) make all
> references to volume ids unsigned (except in the cache manager) and
> avoid the problems with unsigned overflow conditions.
>
> I suspect this patch is similar to what IBM applied to their source
> tree in 2002.
>
> Jeffrey Altman

OK, it is good to know that the problem will hopefully be solved in the next
OpenAFS release!
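
For anyone following the thread, the wrap-around described above can be
reproduced in a few lines. This is only a sketch in Python, using ctypes to
emulate C 32-bit integers; it is not OpenAFS code, and the names
(max_volume_id, clone_id_signed, as_signed32, as_unsigned32) are made up for
illustration.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit field can hold


def as_signed32(value):
    """Reinterpret an integer as a C signed 32-bit value (two's complement)."""
    return ctypes.c_int32(value).value


def as_unsigned32(value):
    """Reinterpret an integer as a C unsigned 32-bit value."""
    return ctypes.c_uint32(value).value


# Hypothetical volume id, one past the signed 32-bit maximum.
# It fits without trouble in an unsigned 32-bit field.
max_volume_id = INT32_MAX + 1

# The same bits stored in a signed 32-bit field (as for a clone id)
# wrap around and become negative.
clone_id_signed = as_signed32(max_volume_id)
print(clone_id_signed)  # -2147483648

# Compared as a signed value, the id can never exceed a positive maximum.
print(clone_id_signed > INT32_MAX)  # False

# Compared as an unsigned value, the same bits give the expected answer.
print(as_unsigned32(clone_id_signed) > INT32_MAX)  # True
```

The two comparisons disagree on the same 32 bits, which is the kind of
mismatch that appears when signed and unsigned volume id variables are mixed.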

Giovanni


-- 
Giovanni Bracco
ENEA FIM
(Servizio Informatica e Reti)
Via E. Fermi 45
I-00044 Frascati (Roma) Italy
phone 00-39-06-9400-5597
FAX   00-39-06-9400-5735
E-mail  bracco@frascati.enea.it
WWW http://www.afs.enea.it/bracco