[OpenAFS] servers not establishing a quorum

lists@drewstud.com lists@drewstud.com
Tue, 6 Apr 2010 09:11:35 -0400 (EDT)

We had two AFS servers and things were running great; we had a nice quorum
and all was happy.
We added an additional AFS server over the weekend, and now none of the
machines will establish a quorum. All FileLogs show the 5376 error code.

Tue Apr  6 08:59:59 2010 File server starting
Tue Apr  6 08:59:59 2010 /var/openafs/sysid: doesn't exist
Tue Apr  6 08:59:59 2010 Creating new SysID file
Tue Apr  6 08:59:59 2010 VL_RegisterAddrs rpc failed; will retry periodically (code=5376, err=0)
Tue Apr  6 09:00:00 2010 Set thread id 133 for FSYNC_sync
Tue Apr  6 09:00:00 2010 FSYNC_sync: bind failed with (98), removed bogus /var/openafs/fssync.sock

udebug output for port 7002 on all three servers:
http://pastebin.com/SZyM4BC7

They all show the sync host as (which is what it gets set to when a quorum
cannot be established, right?)

vos listaddrs shows the two original AFS servers, but not the new one.

I upped the debug level on the vlserver and get:

Tue Apr  6 09:09:44 2010 beacon: amSyncSite is 0
Tue Apr  6 09:09:44 2010 Received beacon type 0 from host
Apr  6 09:09:46 2010 Received beacon from unknown host
Apr  6 09:09:48 2010 recovery running in state 0
Tue Apr  6 09:09:48 2010 beacon: amSyncSite is 0
Tue Apr  6 09:09:52 2010 recovery running in state 0
Tue Apr  6 09:09:52 2010 beacon: amSyncSite is 0

repeatedly.

We added the server to the CellServDB file on all AFS servers and restarted
them, and this is what we got. Also, the sysid file is not being created on
the new server (which, IIRC, is because no quorum can be established).
I checked the time, and the servers are all synced within ~1 second of each
other.

What else could I be missing or need to check? I am sure it is something
very simple.

Thank you.
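
When comparing FileLogs across several servers, it can help to pull out just the failed RPC lines and their codes. The following is a minimal Python sketch, assuming the log lines have the same shape as the FileLog excerpt above; the function name `find_rpc_failures` and the regular expression are illustrative and not part of OpenAFS.

```python
import re

# Hypothetical pattern, modelled only on the FileLog line quoted above:
#   "<timestamp> <RPC name> rpc failed; ... (code=<n>, err=<n>)"
FAILURE_RE = re.compile(
    r"^(?P<when>\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) "
    r"(?P<rpc>\w+) rpc failed.*"
    r"\(code=(?P<code>-?\d+), err=(?P<err>-?\d+)\)"
)


def find_rpc_failures(log_text):
    """Return (timestamp, rpc name, code, err) for each failed-RPC line."""
    failures = []
    for line in log_text.splitlines():
        match = FAILURE_RE.match(line.strip())
        if match:
            failures.append(
                (
                    match["when"],
                    match["rpc"],
                    int(match["code"]),
                    int(match["err"]),
                )
            )
    return failures


if __name__ == "__main__":
    sample = (
        "Tue Apr  6 08:59:59 2010 Creating new SysID file\n"
        "Tue Apr  6 08:59:59 2010 VL_RegisterAddrs rpc failed; "
        "will retry periodically (code=5376, err=0)\n"
    )
    for when, rpc, code, err in find_rpc_failures(sample):
        print(when, rpc, code, err)
```

Running the same function over the FileLog from each server would show whether every machine reports the same code at roughly the same time.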