[OpenAFS] Weird Quorum Issues

Aaron Stanley astanley@strozllc.com
Wed, 05 Nov 2003 21:50:09 -0500


I was away from my cluster for two days, and when I got back I noticed some
very odd behavior: volumes were being locked for replication and not getting
unlocked as they normally are.  When I issued a vos unlock for the locked
volumes, I got strange errors like:

u: No Quorum Elected
Error in vos unlock command

If I continued to re-run the command, eventually the volume would unlock.  I
noticed this behavior with vos create, vos release, and vos backup as well.

I ran a udebug on all three of my vl servers for both ports 7002 and 7004.
On the primary (largest) fileserver, the output would waffle between the
normal "I am sync site" and the abnormal "I am not sync site".  I thought
at first that it might be a network issue, but pings between the servers were
fine and bandwidth was not an issue.
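
One rough way to put numbers on the flapping described above is to poll
udebug at a fixed interval and count how often the reported state changes.
The sketch below is only an illustration under stated assumptions: it matches
on the two phrases quoted above ("I am sync site" / "I am not sync site"), it
assumes udebug is on the PATH, and the function names and the hostname in the
usage comment are hypothetical.

```python
import subprocess
import time


def classify(udebug_output):
    """Label one udebug report as 'sync', 'not-sync', or 'unknown'.

    Matching is done on the two phrases quoted in this post; real udebug
    output carries more detail, which is ignored here.
    """
    text = udebug_output.lower()
    if "i am not sync site" in text:
        return "not-sync"
    if "i am sync site" in text:
        return "sync"
    return "unknown"


def count_flips(states):
    """Count how many times consecutive samples differ."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


def poll(server, port, samples=30, interval=1.0):
    """Run udebug repeatedly and return the list of classified states."""
    states = []
    for _ in range(samples):
        result = subprocess.run(
            ["udebug", "-server", server, "-port", str(port)],
            capture_output=True,
            text=True,
        )
        states.append(classify(result.stdout))
        time.sleep(interval)
    return states


# Hypothetical usage (hostname is a placeholder, port is one from this post):
#   states = poll("afsdb1.example.com", 7002)
#   print(count_flips(states), "state changes in", len(states), "samples")
```

A healthy sync site should report the same state on every sample, so any
nonzero flip count over a short window would confirm the behavior and give a
concrete figure to include in a follow-up report.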

At this point, I don't know what could be causing the primary server to
lose its quorum every second or so, so I'm hoping someone on the list might
be able to point me in the right direction.  I didn't notice anything
strange in the logs, but perhaps I'm looking in the wrong place, so any hints
there would be appreciated.

I tried combing the list archives for a similar problem report, but I
couldn't find anybody describing a situation where the quorum was present
one moment and gone a second or two later.

I truly appreciate any ideas and/or help.  Thanks!

 - AB


-- 
Aaron Stanley
Director, Information Technology
Stroz Friedberg, LLC
15 Maiden Lane, 12th Floor
New York, NY  10038
212/981.6534[o] | 917/859.1503[c] | 815/642.0223[f]

