[OpenAFS] Weird Quorum Issues

Aaron Stanley astanley@strozllc.com
Thu, 06 Nov 2003 10:00:43 -0500


Some additional information for your consideration now that I'm back at the
office:

Output of udebug <server> 7000
Return code -1 from VOTE_Debug

Errors in FileLog:
VL_RegisterAddrs rpc failed; will retry periodically (code=5376, err=4)

The above error showed up on all my servers but has now stopped (the last
reported occurrence I can see was ~3am this morning).  However, I still see
the quorum flapping on and off.  I was able to unlock a volume this morning,
but I can't back up or release it because the operation times out.

What does the FileLog entry mean?

 - AB


-- 
Aaron Stanley
Director, Information Technology
Stroz Friedberg, LLC
15 Maiden Lane, 12th Floor
New York, NY  10038
212/981.6534[o] | 917/859.1503[c] | 815/642.0223[f]


***********************************************************************

This message is for the named person's use only.  It may contain
confidential, proprietary or legally privileged information. No right to
confidential or privileged treatment of this message is waived or lost
by any error in transmission.  If you have received this message in
error, please immediately notify the sender by e-mail or by telephone at
212 981 6540, delete the message and all copies from your system and
destroy any hard copies.  You must not, directly or indirectly, use,
disclose, distribute, print or copy any part of this message if you are
not the intended recipient.

************************************************************************

> From: Aaron Stanley <astanley@strozllc.com>
> Date: Wed, 05 Nov 2003 21:50:09 -0500
> To: <openafs-info@openafs.org>
> Subject: [OpenAFS] Weird Quorum Issues
> 
> 
> I was away from my cluster for two days, and when I got back I noticed some
> very odd behavior: volumes were being locked for replication and not getting
> unlocked as usual.  When I issued a vos unlock for the locked volumes, I got
> strange errors like:
> 
> u: No Quorum Elected
> Error in vos unlock command
> 
> If I continued to re-run the command, eventually the volume would unlock.  I
> noticed this behavior with vos create, vos release, and vos backup as well.
> 
> I ran a udebug on all three of my vl servers for both ports 7002 and 7004.
> On the primary (largest) fileserver, the output would waffle between the
> normal "I am sync site" and the not normal "I am not sync site".  I thought
> at first that it might be a network issue, but pings between servers was
> great and bandwith was not an issue.
> 
> At this point, I don't know what could be causing the primary server to
> lose and regain its quorum every second or so, so I'm hoping someone on the
> list can point me in the right direction.  I didn't notice anything strange
> in the logs, but perhaps I'm looking in the wrong place, so any hints there
> would be appreciated.
> 
> I searched the list archives for a similar problem report, but I couldn't
> find anybody describing a situation where the quorum was present one moment
> and gone a second or two later.
> 
> I truly appreciate any ideas and/or help.  Thanks!
> 
> - AB
> 
> 
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
>