[OpenAFS] please help: no sync site problem
Otto-Michael BRAUN
omb@computer.org
Sun, 11 Jan 2004 19:30:04 +0100
Hi,
I am running an OpenAFS 1.2.10 test site with two database servers under Red Hat
8.0; the site has been up since November 2002.
Since this morning, no administration has been possible. When I try to create a
volume, add a replica, or release a volume, the Windows client reports
"error, no quorum elected (0x00001500)".
I searched the archives and checked the following: time synchronization seems to
be OK, all volumes are mounted, and the servers are in the hosts file and in DNS
(no changes made). The FileLog says:
Sun Jan 11 16:51:35 2004 File server starting
Sun Jan 11 16:51:35 2004 afs_krb_get_lrealm failed, using ombbln.de.
Sun Jan 11 16:52:34 2004 VL_RegisterAddrs rpc failed; will retry periodically (code=5376, err=0)
Sun Jan 11 16:52:34 2004 Set thread id 14 for FSYNC_sync
Sun Jan 11 16:52:34 2004 Partition /vicepe: attached 1 volumes; 0 volumes not attached
Sun Jan 11 16:52:52 2004 Partition /vicepa: attached 401 volumes; 0 volumes not attached
Sun Jan 11 16:53:00 2004 Partition /vicepb: attached 133 volumes; 0 volumes not attached
Sun Jan 11 16:53:04 2004 Partition /vicepc: attached 49 volumes; 0 volumes not attached
Sun Jan 11 16:53:07 2004 Partition /vicepd: attached 33 volumes; 0 volumes not attached
Sun Jan 11 16:53:07 2004 Set thread id 15 for 'FiveMinuteCheckLWP'
Sun Jan 11 16:53:07 2004 Set thread id 16 for 'HostCheckLWP'
Sun Jan 11 16:53:07 2004 Getting FileServer name...
Sun Jan 11 16:53:07 2004 FileServer host name is 'afs1'
Sun Jan 11 16:53:07 2004 Getting FileServer address...
Sun Jan 11 16:53:07 2004 FileServer afs1 has address 192.168.9.7 (0x709a8c0 or 0xc0a80907 in host byte order)
Sun Jan 11 16:53:07 2004 File Server started Sun Jan 11 16:53:07 2004
Sun Jan 11 16:58:07 2004 VL_RegisterAddrs rpc failed; will retry periodically (code=5376, err=0)
(the last message is repeated continuously ...)
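The two numbers in the FileLog can be decoded with a little arithmetic. This is a minimal sketch that only checks values already quoted above: the FileLog's code=5376 is 0x1500 in hexadecimal, i.e. the same code the Windows client shows as 0x00001500 (presumably Ubik's UNOQUORUM, "no quorum elected"), and the two hex forms of 192.168.9.7 are the same four address bytes read in big-endian and little-endian order.

```python
import socket

# Error code as shown by the Windows client vs. the code in the FileLog.
client_code = 0x00001500
filelog_code = 5376
print(client_code == filelog_code, hex(filelog_code))  # same error code

# FileServer address from the FileLog.
packed = socket.inet_aton("192.168.9.7")                 # 4 bytes in network (big-endian) order
host_order = int.from_bytes(packed, "big")              # value in host byte order
little_endian_view = int.from_bytes(packed, "little")   # same bytes read raw on a little-endian (e.g. x86) host
print(hex(host_order), hex(little_endian_view))
```

Running it prints `True 0x1500` and then `0xc0a80907 0x709a8c0`, matching the FileLog line.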
I ran udebug against both servers and got:
[root@afs1 root]# udebug afs1 7003 -long
Host's addresses are: 192.168.9.7
Host's 192.168.9.7 time is Sun Jan 11 19:01:35 2004
Local time is Sun Jan 11 19:01:36 2004 (time differential 1 secs)
Last yes vote for 192.168.9.7 was 0 secs ago (not sync site);
Last vote started 0 secs ago (at Sun Jan 11 19:01:36 2004)
Local db version is 1073185535.95
I am not sync site
Lowest host 192.168.9.7 was set 0 secs ago
Sync host 0.0.0.0 was set 1073844095 secs ago
Sync site's db version is 1073185535.95
0 locked pages, 0 of them for write
Server (192.168.9.8): (db 0.0)
last vote rcvd 1 secs ago (at Sun Jan 11 19:01:35 2004),
last beacon sent 0 secs ago (at Sun Jan 11 19:01:36 2004), last vote was yes
dbcurrent=0, up=1 beaconSince=1
[root@afs1 root]# udebug afs2 7003 -long
Host's addresses are: 192.168.9.8
Host's 192.168.9.8 time is Sun Jan 11 19:02:42 2004
Local time is Sun Jan 11 19:02:45 2004 (time differential 3 secs)
Last yes vote for 192.168.9.7 was 8 secs ago (not sync site);
Last vote started 7 secs ago (at Sun Jan 11 19:02:38 2004)
Local db version is 1073185535.95
I am not sync site
Lowest host 192.168.9.7 was set 8 secs ago
Sync host 0.0.0.0 was set 1073844162 secs ago
Sync site's db version is 1073185535.95
0 locked pages, 0 of them for write
Server (192.168.9.7): (db 0.0)
last vote rcvd 6263 secs ago (at Sun Jan 11 17:18:22 2004),
last beacon sent 6263 secs ago (at Sun Jan 11 17:18:22 2004), last vote was no
dbcurrent=0, up=1 beaconSince=1
The problem seems to be that neither of the two servers is the sync site;
instead, the address 0.0.0.0 (which is really the lowest possible IP address) is
being treated as the sync site.
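The "Sync host 0.0.0.0 was set N secs ago" lines can be checked the same way. A minimal sketch, assuming both servers run on CET (UTC+1) as the "+0100" in this message's Date header suggests: converting each host's clock reading from udebug to Unix time gives exactly the "secs ago" value, so the recorded set-time works out to 0 (the Unix epoch). That would be consistent with the sync-host field never having been filled in, rather than pointing at a real host.

```python
from datetime import datetime, timedelta, timezone

# Assumption: both database servers run on CET (UTC+1), matching the
# "+0100" offset in this message's Date header.
CET = timezone(timedelta(hours=1))


def unix_time(local_clock: str) -> int:
    """Convert a udebug clock reading like 'Sun Jan 11 19:01:35 2004' to Unix time."""
    dt = datetime.strptime(local_clock, "%a %b %d %H:%M:%S %Y").replace(tzinfo=CET)
    return int(dt.timestamp())


# (host's clock reading, "Sync host 0.0.0.0 was set N secs ago") from udebug
readings = {
    "afs1": ("Sun Jan 11 19:01:35 2004", 1073844095),
    "afs2": ("Sun Jan 11 19:02:42 2004", 1073844162),
}

for host, (clock, secs_ago) in readings.items():
    now = unix_time(clock)
    set_at = now - secs_ago  # 0 means the field was "set" at the Unix epoch
    print(f"{host}: clock={now} secs_ago={secs_ago} set_at={set_at}")
```

For both servers this prints `set_at=0`.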
Any help appreciated!
Michael Braun