[OpenAFS] Consistency problems with VLDB?

keegan ice@subloop.respond2.com
Sun, 7 Apr 2002 19:19:28 -0700


Hello everyone,

I have been using OpenAFS here for several months to manage a couple of
very large data archives and the local user home directory tree.  There
are two servers and only a handful of clients, all running Linux 2.4.18
and OpenAFS 1.2.3.  I'm seeing some very strange problems, which seem to
imply that the command-line tools are not working at all and that the
servers cannot keep their internal metadata consistent.  I would be
thrilled if someone here could tell me otherwise: I've put a lot of work
into building this system and am not looking forward to throwing it away.

I'm no longer certain of the chain of events that led to the current
state, but it didn't consist of much more than updating volumes and
adding (or attempting to add) replication sites.

'vos release home' at this point returns the following:

Failed to start a transaction on the RO volume.
VOLSER: volume is busy
Failed to start a transaction on the RO volume.
VOLSER: volume is busy
Failed to start a transaction on the RO volume.
VOLSER: volume is busy
Failed to start a transaction on the RO volume.
VOLSER: volume is busy
Failed to start a transaction on the RO volume.
VOLSER: volume is busy
The volume 536870924 could not be released to the following 6 sites:
                   rune.lan.thebasement.org /vicepa
                   rune.lan.thebasement.org /vicepa
                   rune.lan.thebasement.org /vicepa
                   rune.lan.thebasement.org /vicepa
                   rune.lan.thebasement.org /vicepa
                   rune.lan.thebasement.org /vicepa
VOLSER: release could not be completed
Error in vos release command.
VOLSER: release could not be completed

'vos examine home' returns:

home                              536870924 RW       8832 K  On-line
    rune.lan.thebasement.org /vicepa
    RWrite  536870924 ROnly          0 Backup          0
    MaxQuota     100000 K
    Creation    Sun Apr  7 18:33:33 2002
    Last Update Sun Apr  7 18:33:33 2002
    0 accesses in the past day (i.e., vnode references)

    RWrite: 536870924     ROnly: 536870925     RClone: 536870925
    number of sites -> 8
       server oxygen.thebasement.org partition /vicepa RO Site  -- New release
       server rune.lan.thebasement.org partition /vicepa RO Site  -- New release
       server rune.lan.thebasement.org partition /vicepa RO Site  -- Old release
       server rune.lan.thebasement.org partition /vicepa RO Site  -- Old release
       server rune.lan.thebasement.org partition /vicepa RO Site  -- Old release
       server rune.lan.thebasement.org partition /vicepa RO Site  -- Old release
       server rune.lan.thebasement.org partition /vicepa RO Site  -- Old release
       server rune.lan.thebasement.org partition /vicepa RW Site  -- Old release

'vos remsite rune a home' returns:

This site is not a replication site
Error in vos remsite command.
VOLSER: illegal operation

Any attempt to read the volume, either RO or RW, now fails.  It seems very
strange to me that an RO volume has somehow been marked as newer than the
RW volume, and that there are six RO volumes listed on the same partition.
Is there any situation where more than one RO volume per partition makes
sense?  How can an RO volume possibly be newer than the RW it was cloned
from?  Why does 'vos remsite' not recognize any of the six RO volumes?
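
To make the duplication concrete, the site list in the 'vos examine'
output above can be tallied with a short script.  This is only a sketch
in Python, not an OpenAFS tool: the names tally_sites and
duplicated_sites are hypothetical, and the regular expression is an
assumption based solely on the line format shown in this message.

```python
import re
from collections import Counter

# Assumed line shape, copied from the 'vos examine' output in this message:
#   server <host> partition <partition> <RO|RW> Site  -- <status>
SITE_RE = re.compile(
    r"^\s*server\s+(\S+)\s+partition\s+(\S+)\s+(RO|RW)\s+Site"
    r"(?:\s+--\s+(.*\S))?\s*$"
)


def tally_sites(examine_output):
    """Count how often each (server, partition, type) site entry appears."""
    counts = Counter()
    for line in examine_output.splitlines():
        match = SITE_RE.match(line)
        if match:
            server, partition, site_type = match.group(1, 2, 3)
            counts[(server, partition, site_type)] += 1
    return counts


def duplicated_sites(counts):
    """Return only the site entries that are listed more than once."""
    return {site: n for site, n in counts.items() if n > 1}


# The site list from this message, with the soft line breaks rejoined.
SAMPLE = """\
    number of sites -> 8
       server oxygen.thebasement.org partition /vicepa RO Site  -- New release
       server rune.lan.thebasement.org partition /vicepa RO Site  -- New release
       server rune.lan.thebasement.org partition /vicepa RO Site  -- Old release
       server rune.lan.thebasement.org partition /vicepa RO Site  -- Old release
       server rune.lan.thebasement.org partition /vicepa RO Site  -- Old release
       server rune.lan.thebasement.org partition /vicepa RO Site  -- Old release
       server rune.lan.thebasement.org partition /vicepa RO Site  -- Old release
       server rune.lan.thebasement.org partition /vicepa RW Site  -- Old release
"""

if __name__ == "__main__":
    for site, n in duplicated_sites(tally_sites(SAMPLE)).items():
        print(site, "listed", n, "times")
```

Run against the output above, the only entry it reports as duplicated
is the RO site on rune.lan.thebasement.org /vicepa, which is listed six
times.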

And, most importantly, how can I get my volume working again?  I have tried
everything I have ever heard of to alter the status of a volume, with no
success at all.  All I ever get are more broken RO volumes.

I really don't want to start a flamewar or anything of the sort, but all
of this behavior seems very broken to me.  I just want a working filesystem.
If anyone can help me, even a minor suggestion, PLEASE HELP!

Thank you all,
  Keegan

-- 
Keegan Quinn  <ice@[subloop.respond2.com|thebasement.org]>
503.250.1882  http://www.thebasement.org/~ice/

PGP: 0x91F0657A  1EDE 2467 74A9 6FF0 687C  2194 44C5 2BDF 91F0 657A
