[OpenAFS-devel] diagnosing my problem with ubik elections... bug in ubik

Neulinger, Nathan nneul@umr.edu
Tue, 3 Apr 2001 13:10:32 -0500


Yep, my guess is, this would affect anyone whose database servers all had:

	high last component on little-endian
	high first component on big-endian

since people's memory addresses for the ubik_host array are not likely to be
in the high end of the address range.
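
To illustrate why the "high" component differs by platform, here is a minimal sketch. It assumes the IPv4 address is kept in network byte order and then compared as a plain 32-bit integer. The helper names are hypothetical and are not taken from the ubik sources; they only model what each kind of host would see when it loads the four address bytes a.b.c.d.

```c
#include <stdint.h>

/* Hypothetical helper: the integer a little-endian host sees when it loads
 * the bytes a, b, c, d (network order, a first in memory) as one 32-bit word.
 * The LAST component lands in the most significant byte. */
static uint32_t as_little_endian(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return (uint32_t)a | ((uint32_t)b << 8) |
           ((uint32_t)c << 16) | ((uint32_t)d << 24);
}

/* Hypothetical helper: the same bytes as seen by a big-endian host.
 * The FIRST component lands in the most significant byte. */
static uint32_t as_big_endian(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}
```

Under these assumptions, 10.0.0.200 is a very large number on a little-endian host (0xC800000A) but a small one on a big-endian host (0x0A0000C8), which matches the two cases listed above.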

-- Nathan

> -----Original Message-----
> From: Derek Atkins [mailto:warlord@MIT.EDU]
> Sent: Tuesday, April 03, 2001 12:42 PM
> To: Neulinger, Nathan
> Cc: 'Ken Hornstein'; 'openafs-devel@openafs.org'
> Subject: Re: [OpenAFS-devel] diagnosing my problem with ubik
> elections... bug in ubik
> 
> 
> Other people might have IP Addresses that are lower than their memory
> addresses :)
> 
> -derek
> 
> "Neulinger, Nathan" <nneul@umr.edu> writes:
> 
> > Well, turns out, that change for lowestHost DID fix my entire problem...
> > 
> > Unfortunately, when I added debugging I changed a 
> > 
> > if (x)
> > 	return 0;
> > 
> > to a 
> > 
> > if (x)
> > 	debug
> > 	return 0;
> > 
> > (I hate the fact that openafs code is unmaintainable with tabstop=4.
> > That's what got me messed up there.)
> > 
> > as soon as I fixed that, it started working after the 75 (BIGTIME)
> > second delay.
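
The slip described in the quoted message (adding a statement under a brace-less `if`) can be sketched as follows. This is an illustration only: the function names, the `debug_lines` counter standing in for the debug statement, and the return values are all hypothetical, since the thread does not show the real condition or the surrounding ubik code.

```c
/* Hypothetical stand-in for the debug statement: counts how often it runs. */
static int debug_lines = 0;

/* Original shape: the early return is guarded by the condition. */
static int check_original(int x)
{
    if (x)
        return 0;
    return 1;
}

/* After the edit, the source looked like this:
 *
 *     if (x)
 *         debug
 *         return 0;
 *
 * Without braces only the first statement belongs to the if, so the
 * compiler treats it as the code below. Any code that used to follow
 * the if becomes unreachable. */
static int check_broken(int x)
{
    if (x) {
        debug_lines++;
    }
    return 0;               /* runs on every call, whatever x is */
}

/* Intended edit: braces keep both statements under the condition. */
static int check_fixed(int x)
{
    if (x) {
        debug_lines++;
        return 0;
    }
    return 1;
}
```

Here `check_broken(0)` returns 0 where the original returned 1, which is the kind of silent behavior change the message describes. Always bracing conditional bodies avoids this, and GCC's `-Wmisleading-indentation` warning is designed to flag the unbraced form.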
> > 
> > What I'd like to know is - how come no one else has been impacted by that
> > lowestHost thing? What is different about my setup that it's affecting them
> > - is no one else running 3 database servers on Linux boxes perhaps?
> > 
> > I'm going to back out all my debugging changes and come up with a minimal
> > set of changes to verify that this indeed corrects the problem.
> > 
> > -- Nathan
> > 
> > > -----Original Message-----
> > > From: Neulinger, Nathan 
> > > Sent: Tuesday, April 03, 2001 12:08 PM
> > > To: 'Ken Hornstein'
> > > Cc: 'openafs-devel@openafs.org'
> > > Subject: RE: [OpenAFS-devel] diagnosing my problem with ubik
> > > elections... bug in ubik 
> > > 
> > > 
> > > Yeah, I've been waiting long enough... learned that much 
> > > about the protocol already, head about to explode from it too...
> > > 
> > > I've let it sit overnight in a couple cases, it's just 
> > > looping forever. I've about got it tracked down, has taken me 
> > > a while to get enough debugging added to ubik stuff to where 
> > > I can understand exactly how it works.
> > > 
> > > -- Nathan
> > > 
> > > > -----Original Message-----
> > > > From: Ken Hornstein [mailto:kenh@cmf.nrl.navy.mil]
> > > > Sent: Tuesday, April 03, 2001 12:03 PM
> > > > To: Neulinger, Nathan
> > > > Cc: 'openafs-devel@openafs.org'
> > > > Subject: Re: [OpenAFS-devel] diagnosing my problem with ubik
> > > > elections... bug in ubik 
> > > > 
> > > > 
> > > > >Once I changed that, the lowestHost calculation is looking much better.
> > > > >Still not syncing up cause no one is ever sending a yes vote, but I'm still
> > > > >looking at that.
> > > > 
> > > > Just FYI: as part of the protocol, no one can send a "yes" vote for BIG
> > > > seconds after startup (I think "BIG" is something like 90, but I don't
> > > > remember).  If you're restarting it before that timer elapses, then
> > > > that might be part of the problem.
> > > > 
> > > > I have a document which describes the basic Ubik protocol which IMHO
> > > > is essential for debugging these sorts of things; Derrick, maybe it should
> > > > be added to the base distribution?  (If it isn't already).
> > > > 
> > > > --Ken
> > > > 
> > > 
> 
> -- 
>        Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
>        Member, MIT Student Information Processing Board  (SIPB)
>        URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
>        warlord@MIT.EDU                        PGP key available
>