[OpenAFS] vos convertROtoRW requires salvage ?

John Tang Boyland boyland@cs.uwm.edu
Wed, 02 Apr 2008 22:26:16 -0500


As people on the list may know, I am in the process of recovering from a
complete fileserver failure (lesson: don't use inode servers with Solaris
10 x86).  In what follows, "filip" is an inode Solaris 10 x86
fileserver that cannot attach any of its volumes.  "eastside" is a namei
Solaris 10 fileserver (a PC dragged in to fill the gap when the main
fileservers failed) and "solomons" is an ancient Solaris 8
(sparc) fileserver brought out of retirement for the same reason.
"filip" was the second fileserver to fail -- after the first failed, I
brought up eastside and released several important volumes (such as
root.cell) to eastside, and thus when filip failed a week later, at
least a RO copy was available.  (I wasn't actually expecting filip to
fail since it had been working fine for two years without incident.)

I have been using the helpful new command "vos convertROtoRW" to
convert volumes.  (BTW: thanks for the man page on openafs.org --
though I should point out that its "PRIVILEGE REQUIRED" section looks
like it was copied from "vos move".)

The problem is that the conversion takes the volume offline, and it
then requires a salvage.  I have been nervous about salvaging (see other
messages), but fortunately the salvage worked uneventfully.  I had used
vos convertROtoRW earlier on a less important volume.  In the end
everything's OK, but I'd still like to ask: is this salvage requirement
a known feature?

(In this transcript: /usr/afsws/bin is in AFS but /usr/afs/bin is on the
local disk -- thankfully! -- and the former is on my path.  And yes,
I'm running with admin tokens in a user account ON the new fileserver -- I
said this was a temporary stopgap arrangement.) 

eastside.cs 71 % vos listvldb root.cell

root.cell 
    RWrite: 536870915     ROnly: 536870916 
    number of sites -> 4
       server filip.cs.uwm.edu partition /vicepa RW Site 
       server filip.cs.uwm.edu partition /vicepa RO Site 
       server eastside.cs.uwm.edu partition /vicepa RO Site 
       server solomons.cs.uwm.edu partition /vicepa RO Site  -- Not released
eastside.cs 72 % vos convertROtoRW eastside a root.cell
VLDB indicates that a RW volume exists already on filip.cs.uwm.edu in partition /vicepa.
Overwrite this VLDB entry? [y|n] (n)
y
eastside.cs 73 % vos listvldb root.cell

root.cell 
    RWrite: 536870915     ROnly: 536870916 
    number of sites -> 3
       server solomons.cs.uwm.edu partition /vicepa RO Site  -- Not released
       server filip.cs.uwm.edu partition /vicepa RO Site 
       server eastside.cs.uwm.edu partition /vicepa RW Site 
eastside.cs 74 % vos remsite filip a root.cell
/usr/afsws/etc/vos: No such device
eastside.cs 75 % vos listvldb root.cell
/usr/afsws/etc/vos: Connection timed out
eastside.cs 76 % /usr/afs/bin/vos listvldb root.cell

root.cell 
    RWrite: 536870915     ROnly: 536870916 
    number of sites -> 3
       server solomons.cs.uwm.edu partition /vicepa RO Site  -- Not released
       server filip.cs.uwm.edu partition /vicepa RO Site 
       server eastside.cs.uwm.edu partition /vicepa RW Site 
eastside.cs 77 % /usr/afs/bin/vos remsite filip a root.cell
Deleting the replication site for volume 536870915 ...Removed replication site filip /vicepa for volume root.cell
eastside.cs 78 % /usr/afs/bin/vos listvldb root.cell

root.cell 
    RWrite: 536870915     ROnly: 536870916 
    number of sites -> 2
       server solomons.cs.uwm.edu partition /vicepa RO Site  -- Not released
       server eastside.cs.uwm.edu partition /vicepa RW Site 
eastside.cs 79 % vos release root.cell
/usr/afsws/etc/vos: Connection timed out
eastside.cs 80 % /usr/afs/bin/vos release root.cell
Failed to start transaction on volume 536870915
Volume needs to be salvaged
Error in vos release command.
Volume needs to be salvaged
eastside.cs 81 % /usr/afs/bin/vos listvldb root.cell

root.cell 
    RWrite: 536870915     ROnly: 536870916 
    number of sites -> 2
       server solomons.cs.uwm.edu partition /vicepa RO Site  -- Not released
       server eastside.cs.uwm.edu partition /vicepa RW Site 
eastside.cs 82 % /usr/afs/bin/bos salvage eastside a root.cell
Starting salvage.
bos: salvage completed
eastside.cs 83 % vos listvldb root.cell
/usr/afsws/etc/vos: Connection timed out
eastside.cs 84 % /usr/afs/bin/vos listvldb root.cell

root.cell 
    RWrite: 536870915     ROnly: 536870916 
    number of sites -> 2
       server solomons.cs.uwm.edu partition /vicepa RO Site  -- Not released
       server eastside.cs.uwm.edu partition /vicepa RW Site 
eastside.cs 85 % /usr/afs/bin/vos addsite eastside a root.cell
Added replication site eastside /vicepa for volume root.cell
eastside.cs 86 % /usr/afs/bin/vos release root.cell
Released volume root.cell successfully
eastside.cs 87 % fs checkv  
usage: /usr/openwin/bin/xfs [-config config_file] [-port tcp_port]
eastside.cs 88 % /usr/afsws/bin/fs checkv
/usr/afsws/bin/fs: Connection timed out
eastside.cs 89 % /usr/afs/bin/fs checkv
All volumeID/name mappings checked.
eastside.cs 90 % /usr/afsws/bin/fs checks
All servers are running.
eastside.cs 91 % vos listvldb root.cell

root.cell 
    RWrite: 536870915     ROnly: 536870916 
    number of sites -> 3
       server solomons.cs.uwm.edu partition /vicepa RO Site 
       server eastside.cs.uwm.edu partition /vicepa RW Site 
       server eastside.cs.uwm.edu partition /vicepa RO Site
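
For anyone who wants the short version, the steps that worked in the
transcript above can be condensed into the sketch below.  It assumes the
salvage really is needed after the conversion (which is the open
question here), and it uses the same positional arguments and the same
local-disk binaries as the transcript.  The function names, the DRY_RUN
switch, and the VOS/BOS variables are my own inventions for
illustration, not part of OpenAFS.  By default it only prints the
commands.

```shell
#!/bin/sh
# Sketch only: recap of the recovery steps from the transcript.
# DRY_RUN, VOS, BOS, run and recover_volume are hypothetical names.
DRY_RUN="${DRY_RUN:-1}"           # 1 = print commands, 0 = execute them
VOS="${VOS:-/usr/afs/bin/vos}"    # local-disk binaries, as in the transcript
BOS="${BOS:-/usr/afs/bin/bos}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = server holding the surviving RO copy
# $2 = partition on that server
# $3 = volume name
# $4 = failed server whose stale RO site should be dropped from the VLDB
recover_volume() {
    server=$1; part=$2; vol=$3; dead=$4
    # When run for real, convertROtoRW may ask for confirmation before
    # overwriting the VLDB entry, as it did in the transcript.
    run "$VOS" convertROtoRW "$server" "$part" "$vol" &&
    run "$VOS" remsite "$dead" "$part" "$vol" &&
    # The release failed with "Volume needs to be salvaged" until this ran.
    run "$BOS" salvage "$server" "$part" "$vol" &&
    run "$VOS" addsite "$server" "$part" "$vol" &&
    run "$VOS" release "$vol"
}
```

With the defaults, "recover_volume eastside a root.cell filip" prints
the five commands in order without touching anything; setting DRY_RUN=0
would actually run them, stopping at the first failure.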