[OpenAFS] kasserver problems: no quorum elected...

Andreas Donath Andreas.Donath@aei.mpg.de
Mon, 12 Jan 2004 16:22:45 +0100


Hi all,

I've got a problem with our kas service.
At the moment I'm unable to set passwords with "kas setpasswd".
The error message I get is:

kas:setpasswd: no quorum elected so can't set password for user

Authentication still works fine.

As far as I understand from the documentation and from reading the mailing list,
the problem can be caused by incorrect CellServDB files or by clock
differences between the servers.

I already checked this and could not find any anomalies.
The config files were not changed; it just stopped working all of a sudden.

Here is the udebug output for port 7004 from each server.
I should also mention that the servers do not run OpenAFS but
IBM AFS for Alpha Tru64 (I hope I'm allowed to ask questions
here at all).
-----------
Server1:
Host's addresses are: 194.94.224.128
Host's 194.94.224.128 time is Mon Jan 12 15:39:52 2004
Local time is Mon Jan 12 15:39:52 2004 (time differential 0 secs)
Last yes vote for 194.94.224.103 was 12 secs ago (not sync site);
Last vote started 11 secs ago (at Mon Jan 12 15:39:41 2004)
Local db version is 1073483981.8
I am not sync site
Lowest host 194.94.224.103 was set 12 secs ago
Sync host 0.0.0.0 was set 1073918392 secs ago
Sync site's db version is 1073483981.8
0 locked pages, 0 of them for write
--------
Server2:
Host's addresses are: 194.94.224.103 194.94.224.107
Host's 194.94.224.103 time is Mon Jan 12 15:45:38 2004
Local time is Mon Jan 12 15:45:39 2004 (time differential 1 secs)
Last yes vote for 194.94.224.103 was 4 secs ago (not sync site);
Last vote started 4 secs ago (at Mon Jan 12 15:45:35 2004)
Local db version is 1073483981.8
I am not sync site
Lowest host 194.94.224.103 was set 4 secs ago
Sync host 0.0.0.0 was set 1073918738 secs ago
Sync site's db version is 1073483981.8
0 locked pages, 0 of them for write
----------- 
Server3:
Host's addresses are: 194.94.224.104 194.94.224.106
Host's 194.94.224.104 time is Mon Jan 12 15:46:18 2004
Local time is Mon Jan 12 15:46:19 2004 (time differential 1 secs)
Last yes vote for 194.94.224.103 was 11 secs ago (not sync site);
Last vote started 11 secs ago (at Mon Jan 12 15:46:08 2004)
Local db version is 1073483981.8
I am not sync site
Lowest host 194.94.224.103 was set 11 secs ago
Sync host 0.0.0.0 was set 1073918778 secs ago
Sync site's db version is 1073483981.8
0 locked pages, 0 of them for write

As you can see, no server announces itself as sync site, and the
time differentials seem reasonable.
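
The same check can be sketched as a small Python script that pulls the
interesting fields out of one udebug report. This is only a rough sketch:
the function name summarize_udebug and the regular expressions are my own,
written against the output pasted above, and I am assuming that a server
which is sync site prints a line starting with "I am sync site".

```python
import re

# One udebug report, copied from the Server3 output above.
SAMPLE = """\
Host's addresses are: 194.94.224.104 194.94.224.106
Host's 194.94.224.104 time is Mon Jan 12 15:46:18 2004
Local time is Mon Jan 12 15:46:19 2004 (time differential 1 secs)
Last yes vote for 194.94.224.103 was 11 secs ago (not sync site);
Last vote started 11 secs ago (at Mon Jan 12 15:46:08 2004)
Local db version is 1073483981.8
I am not sync site
Lowest host 194.94.224.103 was set 11 secs ago
Sync host 0.0.0.0 was set 1073918778 secs ago
Sync site's db version is 1073483981.8
0 locked pages, 0 of them for write
"""


def _first_group(pattern, text):
    """Return the first capture group of pattern in text, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


def summarize_udebug(text):
    """Extract a few fields from a single udebug report (hypothetical helper)."""
    differential = _first_group(r"time differential (-?\d+) secs", text)
    return {
        "time_differential": int(differential) if differential is not None else None,
        "voted_for": _first_group(r"Last yes vote for (\S+) was", text),
        "db_version": _first_group(r"Local db version is (\S+)", text),
        "sync_host": _first_group(r"Sync host (\S+) was set", text),
        # Assumption: a sync site reports "I am sync site ..." instead of
        # "I am not sync site".
        "is_sync_site": "I am sync site" in text,
    }


if __name__ == "__main__":
    for key, value in summarize_udebug(SAMPLE).items():
        print(f"{key}: {value}")
```

Run against all three reports, this shows every server voting for
194.94.224.103, identical db versions, and no server claiming to be
sync site.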

What puzzles me is:
->Sync host 0.0.0.0 was set 1073918778 secs ago

What is that supposed to mean?
How can that be changed?
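
A quick arithmetic check on that number, as a Python sketch. I am assuming
(this is not confirmed by anything I have read) that udebug computes
"secs ago" as the current time minus a stored Unix timestamp. If so,
a value equal to the current Unix time would mean the stored timestamp
is 0, i.e. the sync host was never set. The helper name as_local_time is
my own, and the +0100 offset is taken from the date of this mail.

```python
from datetime import datetime, timedelta, timezone

# Local timezone of the servers, assumed from the mail's +0100 offset.
CET = timezone(timedelta(hours=1))


def as_local_time(epoch_seconds):
    """Interpret a number as Unix time and convert it to local (+0100) time."""
    return datetime.fromtimestamp(epoch_seconds, tz=CET)


if __name__ == "__main__":
    # "Sync host 0.0.0.0 was set N secs ago" values from the three servers.
    for name, secs_ago in [("Server1", 1073918392),
                           ("Server2", 1073918738),
                           ("Server3", 1073918778)]:
        print(name, as_local_time(secs_ago))

    # The part of the db version 1073483981.8 before the dot, read the same way.
    print("db version", as_local_time(1073483981))
```

Read as Unix times, the three values come out as 15:39:52, 15:45:38 and
15:46:18 on 12 Jan 2004, which are exactly the host times in the udebug
output above.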

AuthLog says:
-----------------
Server1:
Waiting for quorum election.
Waiting for quorum election.
Waiting for qu
-------------------
Server2:
Waiting for quorum election.
Waiting for quorum election.
Waiting for qu
-------------------
Server3:
Waiting for quorum election.
Waiting for quorum election.
Waiting for qu

No kidding, the line stops after "qu".

I have had these quorum troubles in the past (very rarely), but either
they vanished after a little while (10 minutes or so; time sync, I guess)
or a restart of the kaserver instance did the job. Neither has
helped so far this time.

Any help is very much appreciated.

Andreas