[OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

McKee, Shawn smckee@umich.edu
Thu, 19 Mar 2009 08:40:40 -0400

Hi Everyone,

I am having a problem trying to 'vos move' volumes after losing and
restoring an AFS file server.  The server that was lost has been
restored on new hardware.  The old RW volumes were "moved" to other
servers (convertROtoRW) and now I want to use the 'vos move' command to
move them back.

Here is what happens (I have tokens as 'admin'.  Linat07 is the current
RW home for OSGWN and Linat08 is the new server):

vos move OSGWN linat07 /vicepf linat08 /vicepg -verbose
Starting transaction on source volume 536874901 ... done
Allocating new volume id for clone of volume 536874901 ... done
Cloning source volume 536874901 ... done
Ending the transaction on the source volume 536874901 ... done
Starting transaction on the cloned volume 2681864210 ...
Failed to start a transaction on the cloned volume2681864210
   Volume not attached, does not exist, or not on line
vos move: operation interrupted, cleanup in progress...
clear transaction contexts
Recovery: Releasing VLDB lock on volume 536874901 ... done
Recovery: Accessing VLDB.
move incomplete - attempt cleanup of target partition - no guarantee
Recovery: Creating transaction for destination volume 536874901 ...
Recovery: Unable to start transaction on destination volume 536874901.
Recovery: Creating transaction on source volume 536874901 ... done
Recovery: Setting flags on source volume 536874901 ... done
Recovery: Ending transaction on source volume 536874901 ... done
Recovery: Creating transaction on clone volume 2681864210 ...
Recovery: Unable to start transaction on source volume 536874901.
Recovery: Releasing lock on VLDB entry for volume 536874901 ... done
cleanup complete - user verify desired result
[linat08:local]# vos examine  2681864210
Could not fetch the entry for volume number 18446744072096448530 from VLDB

I am assuming the "large" cloned volume ID is causing the problem, as
opposed to an inability to create a cloned volume.  I can make replicas
on linat08 for existing volumes without a problem.

NOTE: Here are the hex representations of the "cloned" volume ID from
the move attempt above and of the ID reported by 'vos examine':

[linat08:local]# 2681864210 = 0x 9FDA0012
[linat08:local]# 18446744072096448530 = 0x FFFFFFFF9FDA0012

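Any suggestions?  This looks like a 64-bit vs. 32-bit issue: the upper
32 bits of the ID that 'vos examine' reports are all set, as if the
clone ID had been sign-extended.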
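
To check the arithmetic behind that guess, here is a minimal sketch in
Python.  It is only an illustration, not OpenAFS code, and the helper
name sign_extend_32_to_64 is made up for this example.  It assumes the
clone ID is read as a signed 32-bit integer somewhere and then widened
to an unsigned 64-bit value:

```python
def sign_extend_32_to_64(value):
    """Read value as a signed 32-bit int, then view it as unsigned 64-bit."""
    low32 = value & 0xFFFFFFFF           # keep only the low 32 bits
    if low32 & 0x80000000:               # top bit set: negative as signed 32-bit
        low32 -= 1 << 32
    return low32 & 0xFFFFFFFFFFFFFFFF    # two's-complement view in 64 bits


clone_id = 2681864210                    # clone ID from the 'vos move' output
widened = sign_extend_32_to_64(clone_id)

print(hex(clone_id))                     # low 32 bits of the ID
print(widened, hex(widened))             # value after sign extension
```

Under that assumption the widened value matches the ID that 'vos
examine' prints, while an ID below 2^31 such as 536874901 comes through
unchanged.  That would fit with only the newly allocated clone ID being
affected.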

Here is the information on servers and versions:

We have 3 AFS DB servers:
  Linat02 - RHEL5/x86_64  -  OpenAFS 1.4.7
  Linat03 - RHEL4/i686    -  OpenAFS 1.4.6
  Linat04 - RHEL5/x86_64  -  OpenAFS 1.4.7

We have 3 AFS file servers:
  Linat06 - RHEL4/x86_64  -  OpenAFS 1.4.6
  Linat07 - RHEL4/x86_64  -  OpenAFS 1.4.6
  Linat08 - RHEL5/x86_64  -  OpenAFS 1.4.8

Info on OSGWN volume:

[linat08:~]# vos examine OSGWN
OSGWN                             536874901 RW     505153 K  On-line
    linat07.grid.umich.edu /vicepf
    RWrite  536874901 ROnly 18446744072096448530 Backup          0
    MaxQuota    2000000 K
    Creation    Tue Mar  3 03:43:06 2009
    Copy        Mon Dec  3 16:39:21 2007
    Backup      Never
    Last Update Sat Feb 21 15:18:05 2009
    0 accesses in the past day (i.e., vnode references)

    RWrite: 536874901     ROnly: 536874902
    number of sites -> 2
       server linat07.grid.umich.edu partition /vicepf RW Site
       server linat06.grid.umich.edu partition /vicepe RO Site

Let me know if there is other info required to help resolve this.


Shawn McKee
University of Michigan/ATLAS Group