[OpenAFS] HELP! We've lost our sync site

Mosley, Mike jmmosley@uncc.edu
Sat, 10 Jan 2004 14:23:37 -0500


I have not seen the notes about the Ubik time overflow problem.  I'm
currently running 1.2.10 under Solaris 9.  I can drop back to 1.2.8.  Is
there any way I can correct the problem in the short term while I
prepare to back down to the earlier version?

Thanks,

Mike

-----Original Message-----
From: Douglas E. Engert [mailto:deengert@anl.gov] 
Sent: Saturday, January 10, 2004 2:16 PM
To: James M Mosley
Cc: openafs-info@openafs.org
Subject: Re: [OpenAFS] HELP! We've lost our sync site

Did you see the four notes on "Ubik time overflow at 0x40000000"?
That is the problem. 
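
For anyone wondering why this hit today: read as a count of seconds
since the Unix epoch, 0x40000000 falls on this very morning. The sketch
below only does that conversion. The names UBIK_OVERFLOW_THRESHOLD and
epoch_to_utc are made up for illustration and are not taken from the
OpenAFS source, and nothing here models what Ubik does with the value.

```python
from datetime import datetime, timezone

# Threshold quoted in the subject of the notes; the constant name is
# hypothetical and does not come from the OpenAFS source.
UBIK_OVERFLOW_THRESHOLD = 0x40000000


def epoch_to_utc(seconds):
    """Convert seconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(UBIK_OVERFLOW_THRESHOLD)                            # 1073741824
print(epoch_to_utc(UBIK_OVERFLOW_THRESHOLD).isoformat())  # 2004-01-10T13:37:04+00:00
```

That is 08:37:04 at a -0500 offset, a few hours before the problem was
reported.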

What version of AFS are you running and on what OS?
If you can't build it yourself, or can't wait for the OpenAFS people
to build a release, maybe someone else might have built it by now.
(I have had OpenAFS-1.2.10 running on sun4x_58 for the last 15 minutes.)

James M Mosley wrote:
> 
> All,
>         We need immediate help!  We have been unable to establish a sync
> site for about 6 hours.  All 3 of our database servers are up and appear
> to be performing the election as expected.  However, the server that
> should become the sync site doesn't.  Here is some output from udebug
> on that server:
> 
> as-sm1# udebug as-sm1 7002 -long
> Host's addresses are: 152.15.10.70
> Host's 152.15.10.70 time is Sat Jan 10 13:59:36 2004
> Local time is Sat Jan 10 13:59:39 2004 (time differential 3 secs)
> Last yes vote for 152.15.10.70 was 3 secs ago (not sync site);
> Last vote started 3 secs ago (at Sat Jan 10 13:59:36 2004)
> Local db version is 1073480540.254
> I am not sync site
> Lowest host 152.15.10.70 was set 3 secs ago
> Sync host 0.0.0.0 was set 1073761176 secs ago
> Sync site's db version is 1073480540.254
> 0 locked pages, 0 of them for write
> 
> Server (152.15.13.7): (db 0.0)
>     last vote rcvd 5 secs ago (at Sat Jan 10 13:59:34 2004),
>     last beacon sent 3 secs ago (at Sat Jan 10 13:59:36 2004), last vote was yes
>     dbcurrent=0, up=1 beaconSince=1
> 
> Server (152.15.30.27): (db 0.0)
>     last vote rcvd 4 secs ago (at Sat Jan 10 13:59:35 2004),
>     last beacon sent 3 secs ago (at Sat Jan 10 13:59:36 2004), last vote was yes
>     dbcurrent=0, up=1 beaconSince=1
> as-sm1#
> 
> The only strange thing we have noticed is that when we attempted to
> stop/restart the database servers to see if the condition would clear itself
> up we saw as-sm1 become the sync site (as it should) but it claimed it was
> a sync site for a negative number of seconds.  The amount of time seemed
> to refer back to about the time we started seeing the problem as evidenced
> by the last time the local database files were updated.
> 
> All three database servers are running Solaris 9 and OpenAFS 1.2.10.
> 
> We need help soon.  Thanks.
> 
> Mike
> 
> -------------------------------------
> Mike Mosley                             Email: jmmosley@uncc.edu
> Systems Software Developer              Phone: (704) 687-3522
> College of Engineering, UNC-Charlotte   Fax: (704) 687-2352
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
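
One way to read the "Sync host 0.0.0.0 was set 1073761176 secs ago" line
in the udebug output above: subtract that age from the host time printed
in the same output. This is a rough sketch under two assumptions: that
the server clock is at the -0500 offset shown in the mail headers, and
that the age is simply "now minus the time it was set". The helper
seconds_ago is hypothetical, not part of any AFS tool.

```python
import re
from datetime import datetime, timedelta, timezone

# Line copied from the udebug output quoted in this thread.
UDEBUG_LINE = "Sync host 0.0.0.0 was set 1073761176 secs ago"

# Host time from the same output; the -0500 offset is an assumption
# taken from the Date header of the mail, not from udebug itself.
HOST_TIME = datetime(2004, 1, 10, 13, 59, 36,
                     tzinfo=timezone(timedelta(hours=-5)))


def seconds_ago(line):
    """Extract N from a '... was set N secs ago' udebug line (hypothetical helper)."""
    match = re.search(r"was set (\d+) secs ago", line)
    if match is None:
        raise ValueError("no 'was set N secs ago' field in line")
    return int(match.group(1))


age = seconds_ago(UDEBUG_LINE)
set_time = HOST_TIME - timedelta(seconds=age)
print(set_time.astimezone(timezone.utc).isoformat())  # 1970-01-01T00:00:00+00:00
```

Under those assumptions the "set" time works out to the epoch itself,
which is consistent with no sync host ever having been recorded on that
server, rather than with a real event that long ago.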

-- 

 Douglas E. Engert  <DEEngert@anl.gov>
 Argonne National Laboratory
 9700 South Cass Avenue
 Argonne, Illinois  60439 
 (630) 252-5444