[OpenAFS] Weird Quorum Issues

Hartmut Reuter reuter@rzg.mpg.de
Thu, 06 Nov 2003 16:31:31 +0100


Aaron Stanley wrote:
> Some additional information for your consideration now that I'm back at the
> office:
> 
> Output of udebug <server> 7000
> Return code -1 from VOTE_Debug
> 
> Errors in FileLog:
> VL_RegisterAddrs rpc failed; will retry periodically (code=5376, err=4)
> 
> The above error showed up on all my servers but has now stopped (last
> reported error that I can see was ~3am this morning).  I still, however, get
> the on/off quorum.  I was able to unlock a volume this morning, but can't
> backup or release because it times out during the operation.
> 
> What does the FileLog entry mean?


Fileservers register their UUID and IP addresses with the VLDB server at
startup. A client then gets the current IP address of the fileserver it
wants to contact from the VLDB.

The registration requires a write to the database, which can be performed
only on the sync site. If your sync site has problems for any reason,
this error message appears in the FileLog.
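Whether a sync site exists at all depends on it holding a majority of votes from the database servers. The following is a simplified model of that majority check, written for illustration only and not taken from the OpenAFS sources. It assumes that every database server has one vote and that the server with the lowest IP address carries an extra half vote as a tie-breaker when the number of servers is even. The function name `has_quorum` is hypothetical.

```python
import ipaddress


def has_quorum(db_servers: list[str], voting_for_sync: set[str]) -> bool:
    """Simplified, hypothetical model of a Ubik-style majority check.

    db_servers:      IP addresses of all database servers in the host list.
    voting_for_sync: addresses whose votes the candidate sync site holds,
                     including its own vote.
    Assumption: the lowest-addressed server's vote is worth an extra 0.5,
    which breaks ties when the number of servers is even.
    """
    # Compare addresses numerically, not as strings ("10.0.0.10" > "10.0.0.2").
    lowest = min(db_servers, key=ipaddress.ip_address)
    votes = 0.0
    for addr in voting_for_sync:
        if addr in db_servers:  # ignore votes from hosts not in the list
            votes += 1.0
            if addr == lowest:
                votes += 0.5
    total = len(db_servers) + 0.5  # all votes, including the tie-breaker
    return votes > total / 2
```

Under this model, with four database servers, two servers that do not include the lowest address cannot form a quorum on their own, which is one reason the list of IP addresses matters when a quorum appears and disappears.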

As for your primary problem:

How many database servers are you running, and with which IP addresses?
If you run "bos listhosts" against each of them, do you get the same
information from all of them? Have the database processes been restarted
since the last change to the host list?
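One way to compare the host lists is to capture the output of `bos listhosts -server <host>` from each database server and diff the results. The sketch below does the comparison on already-captured text. It assumes the output consists of a "Cell name is ..." line followed by lines of the form "Host N is <name>"; the helper names `parse_listhosts` and `inconsistent_servers` are hypothetical.

```python
import re
from collections import Counter

# Assumed output shape of `bos listhosts`:
#   Cell name is example.com
#       Host 1 is db1.example.com
#       Host 2 is db2.example.com
HOST_LINE = re.compile(r"^\s*Host\s+\d+\s+is\s+(\S+)\s*$")


def parse_listhosts(output: str) -> set[str]:
    """Extract the set of host names from one server's listhosts output."""
    hosts = set()
    for line in output.splitlines():
        m = HOST_LINE.match(line)
        if m:
            hosts.add(m.group(1).lower())
    return hosts


def inconsistent_servers(outputs: dict[str, str]) -> dict[str, set[str]]:
    """Return the servers whose host list differs from the most common one.

    outputs maps each database server to the text it returned.
    """
    parsed = {srv: frozenset(parse_listhosts(out)) for srv, out in outputs.items()}
    # Treat the host list reported by the most servers as the reference.
    reference, _ = Counter(parsed.values()).most_common(1)[0]
    return {srv: set(h) for srv, h in parsed.items() if h != reference}
```

An empty result means every server reports the same host list; any server that shows up in the result has a different view of the cell's database servers.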

-Hartmut


> 
>  - AB
> 
> 


-- 
-----------------------------------------------------------------
Hartmut Reuter                           e-mail reuter@rzg.mpg.de
					   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)               fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------