[OpenAFS] HELP! We've lost our sync site

Mosley, Mike jmmosley@uncc.edu
Sat, 10 Jan 2004 15:01:50 -0500


I found the notes and I understand now.  Thanks to everybody who responded
for your help and for responding so quickly.  I'll be looking for the new
binaries.

Thanks again.

Mike

-----Original Message-----
From: Douglas E. Engert [mailto:deengert@anl.gov] 
Sent: Saturday, January 10, 2004 2:40 PM
To: Mosley, Mike
Cc: 'openafs-info@openafs.org'
Subject: Re: [OpenAFS] HELP! We've lost our sync site



"Mosley, Mike" wrote:
> 
> I have not seen the notes about the Ubik time overflow problem. 

Sorry, that was on openafs-dev.

> I'm
> currently running 1.2.10 under Solaris 9.  I can drop back to 1.2.8.

As Russ indicated, this looks like a problem in the original Transarc
code too. So dropping back won't help. 
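Why the version does not matter: the trigger named in the notes is the server clock passing 0x40000000 seconds since the Unix epoch, which any release carrying the same code will hit. A small Python 3 sketch (standard library only) shows when that happened, and illustrates one hypothetical way a timestamp of 2**30 or more can go negative in a signed 32-bit integer. The doubling below is an assumption chosen for illustration, not the actual Ubik code; the last lines decode the udebug "Sync host 0.0.0.0 was set 1073761176 secs ago" line from the original report.

```python
from datetime import datetime, timezone

# Threshold named in the "Ubik time overflow at 0x40000000" notes.
UBIK_OVERFLOW = 0x40000000  # 2**30 = 1073741824 seconds since the epoch

def utc(seconds):
    """Convert a Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

def as_int32(x):
    """Wrap an arbitrary Python int to a signed 32-bit value (two's complement)."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x

# When the clock crossed the threshold (08:37:04 EST on Jan 10, 2004).
print(hex(UBIK_OVERFLOW), utc(UBIK_OVERFLOW).isoformat())

# Hypothetical mechanism (assumption, not taken from the Ubik source):
# doubling a time value in a signed 32-bit int overflows exactly at 2**30.
print(as_int32(2 * (UBIK_OVERFLOW - 1)))  # still positive
print(as_int32(2 * UBIK_OVERFLOW))        # wraps negative

# udebug said the sync host was "set 1073761176 secs ago" at 13:59:36 EST.
# That age equals the whole current Unix time, which suggests the stored
# sync-site timestamp was 0, i.e. never set.
print(utc(1073761176).isoformat())  # 18:59:36 UTC == 13:59:36 EST
```

Under this reading, a stored time that "wraps" would also explain the sync site reporting a negative number of seconds after the restart.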

 
> Is
> there any way I can correct the problem in the short term while I
> prepare to back down to the earlier version?

Install new server binaries in /usr/afs/bin. You will need to wait
for the OpenAFS release of 1.2.11 (which might be any time now),
build it yourself, or find someone with a sun4x_59 build.

> 
> Thanks,
> 
> Mike
> 
> -----Original Message-----
> From: Douglas E. Engert [mailto:deengert@anl.gov]
> Sent: Saturday, January 10, 2004 2:16 PM
> To: James M Mosley
> Cc: openafs-info@openafs.org
> Subject: Re: [OpenAFS] HELP! We've lost our sync site
> 
> Did you see the four notes on "Ubik time overflow at 0x40000000"?
> That is the problem.
> 
> What version of AFS are you running, and on what OS?
> If you can't build it yourself, or can't wait for the OpenAFS people
> to build a release, maybe someone else has built it by now.
> (I have had OpenAFS-1.2.10 running on sun4x_58 for the last 15 minutes.)
> 
> James M Mosley wrote:
> >
> > All,
> >         We need immediate help!  We have been unable to establish a sync
> > site for about 6 hours.  All 3 of our database servers are up and appear
> > to be performing the election as expected.  However, the server that
> > should become the sync site doesn't.  Here is some output from udebug
> > on that server:
> >
> > as-sm1# udebug as-sm1 7002 -long
> > Host's addresses are: 152.15.10.70
> > Host's 152.15.10.70 time is Sat Jan 10 13:59:36 2004
> > Local time is Sat Jan 10 13:59:39 2004 (time differential 3 secs)
> > Last yes vote for 152.15.10.70 was 3 secs ago (not sync site);
> > Last vote started 3 secs ago (at Sat Jan 10 13:59:36 2004)
> > Local db version is 1073480540.254
> > I am not sync site
> > Lowest host 152.15.10.70 was set 3 secs ago
> > Sync host 0.0.0.0 was set 1073761176 secs ago
> > Sync site's db version is 1073480540.254
> > 0 locked pages, 0 of them for write
> >
> > Server (152.15.13.7): (db 0.0)
> >     last vote rcvd 5 secs ago (at Sat Jan 10 13:59:34 2004),
> >     last beacon sent 3 secs ago (at Sat Jan 10 13:59:36 2004), last vote was yes
> >     dbcurrent=0, up=1 beaconSince=1
> >
> > Server (152.15.30.27): (db 0.0)
> >     last vote rcvd 4 secs ago (at Sat Jan 10 13:59:35 2004),
> >     last beacon sent 3 secs ago (at Sat Jan 10 13:59:36 2004), last vote was yes
> >     dbcurrent=0, up=1 beaconSince=1
> > as-sm1#
> >
> > The only strange thing we have noticed is that when we attempted to
> > stop/restart the database servers to see if the condition would clear
> > itself up, we saw as-sm1 become the sync site (as it should), but it
> > claimed it was the sync site for a negative number of seconds.  The
> > amount of time seemed to refer back to about the time we started seeing
> > the problem, as evidenced by the last time the local database files
> > were updated.
> >
> > All three database servers are running Solaris 9 and OpenAFS 1.2.10.
> >
> > We need help soon.  Thanks.
> >
> > Mike
> >
> > -------------------------------------
> > Mike Mosley                             Email: jmmosley@uncc.edu
> > Systems Software Developer              Phone: (704) 687-3522
> > College of Engineering, UNC-Charlotte   Fax: (704) 687-2352
> > _______________________________________________
> > OpenAFS-info mailing list
> > OpenAFS-info@openafs.org
> > https://lists.openafs.org/mailman/listinfo/openafs-info
> 
> --
> 
>  Douglas E. Engert  <DEEngert@anl.gov>
>  Argonne National Laboratory
>  9700 South Cass Avenue
>  Argonne, Illinois  60439
>  (630) 252-5444

-- 

 Douglas E. Engert  <DEEngert@anl.gov>
 Argonne National Laboratory
 9700 South Cass Avenue
 Argonne, Illinois  60439 
 (630) 252-5444