[OpenAFS] AFS Fileserver Won't Start --> Can't Release root.cell or root.afs
Karl M. Davis
karl@ridgetop-group.com
Thu, 4 Oct 2007 17:14:53 -0700
> Try vos release -id root.afs -verbose -local as root to get more info
> and use your KeyFile instead of user tokens.
Running that command gives me:
<<
karl@picacho:~$ sudo vos release -id root.afs -verbose -localauth
root.afs
RWrite: 536870915 ROnly: 536870916 RClone: 536870916
number of sites -> 4
server picacho.ridgetop-group.local partition /vicepa RW Site -- New release
server picacho.ridgetop-group.local partition /vicepa RO Site -- New release
server picacho.ridgetop-group.local partition /vicepa RO Site -- Old release
server coronado.ridgetop-group.local partition /vicepa RO Site -- New release
This is a completion of a previous release
Starting transaction on cloned volume 536870916... done
Failed to start a transaction on the RO volume.
VOLSER: volume is busy
The volume 536870915 could not be released to the following 1 sites:
picacho.ridgetop-group.local /vicepa
VOLSER: release could not be completed
Error in vos release command.
VOLSER: release could not be completed
karl@picacho:~$
>>
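The listing above shows picacho holding two RO sites on the same partition (one "New release", one "Old release"), which looks like the kind of duplicate that blocks the release. As a rough sketch only, here is one way to spot such duplicates in saved "vos listvldb" or "vos release -verbose" output. The function name duplicate_sites and the regular expression are my own and assume the line layout shown in this message.

```python
import re
from collections import Counter

# Matches site lines such as:
#   server picacho.ridgetop-group.local partition /vicepa RO Site -- New release
# The layout is assumed from the output pasted in this message.
SITE_RE = re.compile(r"server\s+(\S+)\s+partition\s+(\S+)\s+(RW|RO)\s+Site")


def duplicate_sites(entry_text):
    """Return (server, partition, type) tuples that are listed more than once."""
    counts = Counter(SITE_RE.findall(entry_text))
    return sorted(site for site, n in counts.items() if n > 1)


sample = """\
root.afs
RWrite: 536870915 ROnly: 536870916 RClone: 536870916
number of sites -> 4
server picacho.ridgetop-group.local partition /vicepa RW Site -- New release
server picacho.ridgetop-group.local partition /vicepa RO Site -- New release
server picacho.ridgetop-group.local partition /vicepa RO Site -- Old release
server coronado.ridgetop-group.local partition /vicepa RO Site -- New release
"""

print(duplicate_sites(sample))
# prints [('picacho.ridgetop-group.local', '/vicepa', 'RO')]
```

The release-status flags after "--" are ignored on purpose, so the check still works when a mail client wraps them onto the next line.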
> Does vos listaddrs -noresolve print anything out?
Nope, nothing from that, either.
> You might need to vos remsite replicas attached to those IPs first.
Interesting... I tried this, but instead of specifying picacho I passed
127.0.0.1 as the machine name, and it worked. It actually worked twice for
each volume before failing on the third try. Here's the output:
<<
karl@picacho:~$ sudo vos remsite -server 127.0.0.1 -partition a -id root.afs -localauth -verbose
Deleting the replication site for volume 536870915 ... done
Removed replication site 127.0.0.1 /vicepa for volume root.afs
karl@picacho:~$ sudo vos remsite -server 127.0.0.1 -partition a -id root.afs -localauth -verbose
Deleting the replication site for volume 536870915 ... done
Removed replication site 127.0.0.1 /vicepa for volume root.afs
karl@picacho:~$ sudo vos remsite -server 127.0.0.1 -partition a -id root.afs -localauth -verbose
This site is not a replication site
Error in vos remsite command.
VOLSER: illegal operation
karl@picacho:~$ vos listvldb
vsu_ClientInit: Could not get afs tokens, running unauthenticated.
VLDB entries for all servers
lib
RWrite: 536870933 ROnly: 536870934
number of sites -> 2
server picacho.ridgetop-group.local partition /vicepa RW Site
server coronado.ridgetop-group.local partition /vicepa RO Site
lib.pdks
RWrite: 536870936 ROnly: 536870937
number of sites -> 2
server picacho.ridgetop-group.local partition /vicepa RW Site
server coronado.ridgetop-group.local partition /vicepa RO Site
root.afs
RWrite: 536870915 ROnly: 536870916 RClone: 536870916
number of sites -> 2
server picacho.ridgetop-group.local partition /vicepa RW Site -- New release
server coronado.ridgetop-group.local partition /vicepa RO Site -- New release
root.cell
RWrite: 536870918 ROnly: 536870919 RClone: 536870919
number of sites -> 4
server picacho.ridgetop-group.local partition /vicepa RW Site -- New release
server picacho.ridgetop-group.local partition /vicepa RO Site -- New release
server picacho.ridgetop-group.local partition /vicepa RO Site -- Old release
server coronado.ridgetop-group.local partition /vicepa RO Site -- New release
Total entries: 4
karl@picacho:~$ sudo vos remsite -server 127.0.0.1 -partition a -id root.cell -localauth -verbose
Deleting the replication site for volume 536870918 ... done
Removed replication site 127.0.0.1 /vicepa for volume root.cell
karl@picacho:~$ sudo vos remsite -server 127.0.0.1 -partition a -id root.cell -localauth -verbose
Deleting the replication site for volume 536870918 ... done
Removed replication site 127.0.0.1 /vicepa for volume root.cell
karl@picacho:~$ sudo vos remsite -server 127.0.0.1 -partition a -id root.cell -localauth -verbose
This site is not a replication site
Error in vos remsite command.
VOLSER: illegal operation
karl@picacho:~$ vos listvldb
vsu_ClientInit: Could not get afs tokens, running unauthenticated.
VLDB entries for all servers
lib
RWrite: 536870933 ROnly: 536870934
number of sites -> 2
server picacho.ridgetop-group.local partition /vicepa RW Site
server coronado.ridgetop-group.local partition /vicepa RO Site
lib.pdks
RWrite: 536870936 ROnly: 536870937
number of sites -> 2
server picacho.ridgetop-group.local partition /vicepa RW Site
server coronado.ridgetop-group.local partition /vicepa RO Site
root.afs
RWrite: 536870915 ROnly: 536870916 RClone: 536870916
number of sites -> 2
server picacho.ridgetop-group.local partition /vicepa RW Site -- New release
server coronado.ridgetop-group.local partition /vicepa RO Site -- New release
root.cell
RWrite: 536870918 ROnly: 536870919 RClone: 536870919
number of sites -> 2
server picacho.ridgetop-group.local partition /vicepa RW Site -- New release
server coronado.ridgetop-group.local partition /vicepa RO Site -- New release
Total entries: 4
>>
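For anyone repeating this, the "run remsite until it reports that the site is gone" routine above can be sketched as a small loop. This is only an illustration: the helper names are made up, the flags are the ones used in the transcript, and it assumes that vos exits with a non-zero status when it prints "Error in vos remsite command." It also has to run as root for -localauth to be able to read the KeyFile.

```python
import subprocess


def remsite_argv(server, partition, volume):
    """Build the same 'vos remsite' command line used in the transcript."""
    return ["vos", "remsite", "-server", server, "-partition", partition,
            "-id", volume, "-localauth", "-verbose"]


def run_vos(argv):
    """Run a vos command and return its exit status."""
    return subprocess.run(argv).returncode


def remove_until_gone(server, partition, volume, run=run_vos, max_attempts=5):
    """Repeat 'vos remsite' until it fails; return how many sites were removed.

    Assumption: a non-zero exit status means 'This site is not a replication
    site' (or some other error), so the loop stops there. max_attempts is a
    safety cap so a misbehaving command cannot loop forever.
    """
    removed = 0
    for _ in range(max_attempts):
        if run(remsite_argv(server, partition, volume)) != 0:
            break
        removed += 1
    return removed
```

Called as remove_until_gone("127.0.0.1", "a", "root.afs"), this would issue the same three commands as above and return 2. The run parameter exists so the loop can be tried with a stand-in function before pointing it at a real cell.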
After that, I tried "vos release" again. Here's the output from that:
<<
karl@picacho:~$ sudo vos release -id root.cell -verbose -localauth
root.cell
RWrite: 536870918 ROnly: 536870919 RClone: 536870919
number of sites -> 2
server picacho.ridgetop-group.local partition /vicepa RW Site -- New release
server coronado.ridgetop-group.local partition /vicepa RO Site -- New release
This is a complete release of volume 536870918
Cloning RW volume 536870918 to temporary RO... done
Getting status of RW volume 536870918... done
Ending cloning transaction on RW volume 536870918... done
Starting transaction on cloned volume 536870919... done
Updating existing ro volume 536870919 on coronado.ridgetop-group.local ...
Starting ForwardMulti from 536870919 to 536870919 on coronado.ridgetop-group.local (as of Thu Sep 6 12:58:39 2007).
Deleting the releaseClone 536870919 ... done
updating VLDB ... done
Released volume root.cell successfully
karl@picacho:~$ vos listvldb
vsu_ClientInit: Could not get afs tokens, running unauthenticated.
VLDB entries for all servers
lib
RWrite: 536870933 ROnly: 536870934
number of sites -> 2
server picacho.ridgetop-group.local partition /vicepa RW Site
server coronado.ridgetop-group.local partition /vicepa RO Site
lib.pdks
RWrite: 536870936 ROnly: 536870937
number of sites -> 2
server picacho.ridgetop-group.local partition /vicepa RW Site
server coronado.ridgetop-group.local partition /vicepa RO Site
root.afs
RWrite: 536870915 ROnly: 536870916 RClone: 536870916
number of sites -> 2
server picacho.ridgetop-group.local partition /vicepa RW Site -- New release
server coronado.ridgetop-group.local partition /vicepa RO Site -- New release
root.cell
RWrite: 536870918 ROnly: 536870919
number of sites -> 2
server picacho.ridgetop-group.local partition /vicepa RW Site
server coronado.ridgetop-group.local partition /vicepa RO Site
Total entries: 4
karl@picacho:~$ sudo vos release -id root.afs -verbose -localauth
root.afs
RWrite: 536870915 ROnly: 536870916 RClone: 536870916
number of sites -> 2
server picacho.ridgetop-group.local partition /vicepa RW Site -- New release
server coronado.ridgetop-group.local partition /vicepa RO Site -- New release
This is a complete release of volume 536870915
Cloning RW volume 536870915 to temporary RO... done
Getting status of RW volume 536870915... done
Ending cloning transaction on RW volume 536870915... done
Starting transaction on cloned volume 536870916... done
Updating existing ro volume 536870916 on coronado.ridgetop-group.local ...
Starting ForwardMulti from 536870916 to 536870916 on coronado.ridgetop-group.local (as of Thu Aug 9 13:13:33 2007).
Deleting the releaseClone 536870916 ... done
updating VLDB ... done
Released volume root.afs successfully
karl@picacho:~$ vos listvldb
vsu_ClientInit: Could not get afs tokens, running unauthenticated.
VLDB entries for all servers
lib
RWrite: 536870933 ROnly: 536870934
number of sites -> 2
server picacho.ridgetop-group.local partition /vicepa RW Site
server coronado.ridgetop-group.local partition /vicepa RO Site
lib.pdks
RWrite: 536870936 ROnly: 536870937
number of sites -> 2
server picacho.ridgetop-group.local partition /vicepa RW Site
server coronado.ridgetop-group.local partition /vicepa RO Site
root.afs
RWrite: 536870915 ROnly: 536870916
number of sites -> 2
server picacho.ridgetop-group.local partition /vicepa RW Site
server coronado.ridgetop-group.local partition /vicepa RO Site
root.cell
RWrite: 536870918 ROnly: 536870919
number of sites -> 2
server picacho.ridgetop-group.local partition /vicepa RW Site
server coronado.ridgetop-group.local partition /vicepa RO Site
Total entries: 4
>>
I then tried running "vos changeaddr -oldaddr 127.0.0.1 -remove", but it
looks like some of my volumes are still "stuck" on the old IP:
<<
karl@picacho:~$ sudo vos changeaddr -oldaddr 127.0.0.1 -remove -localauth -verbose
Could not remove server 127.0.0.1 from the VLDB
VLDB: volume Id exists in the vldb
>>
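The "volume Id exists in the vldb" message suggests that at least one VLDB entry still refers to that address. One way to check, as a sketch: "vos listvldb" accepts a -server argument, so something like "vos listvldb -server 127.0.0.1 -localauth" may show which entries the VLDB still ties to it (I have not confirmed that it does in this case). The helper below just pulls the volume names out of such a listing; the name volume_names is my own and it assumes the layout seen in the listings in this message, where each volume name sits on the line directly before its "RWrite:" line.

```python
def volume_names(listing):
    """Return volume names from 'vos listvldb' output.

    Assumes each entry's name is the non-empty line immediately before
    the line that starts with 'RWrite:' (leading whitespace is ignored).
    """
    lines = [ln.strip() for ln in listing.splitlines()]
    return [prev for prev, cur in zip(lines, lines[1:])
            if cur.startswith("RWrite:") and prev]


listing = """\
VLDB entries for all servers
lib
RWrite: 536870933 ROnly: 536870934
number of sites -> 2
server picacho.ridgetop-group.local partition /vicepa RW Site
server coronado.ridgetop-group.local partition /vicepa RO Site
root.afs
RWrite: 536870915 ROnly: 536870916
number of sites -> 2
server picacho.ridgetop-group.local partition /vicepa RW Site
server coronado.ridgetop-group.local partition /vicepa RO Site
Total entries: 2
"""

print(volume_names(listing))
# prints ['lib', 'root.afs']
```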
How would I go about resolving this? By the way, thanks very much for all
of your help so far; you've really saved my ass on this.
Thanks,
Karl
-----Original Message-----
From: openafs-info-admin@openafs.org [mailto:openafs-info-admin@openafs.org]
On Behalf Of Christopher D. Clausen
Sent: Thursday, October 04, 2007 6:46 AM
To: Karl M. Davis
Cc: openafs-info@openafs.org
Subject: Re: [OpenAFS] AFS Fileserver Won't Start --> Can't Release
root.cell or root.afs
Karl M. Davis <karl@ridgetop-group.com> wrote:
> Well, after rebooting again, things suddenly seem to be working. No
> idea why...
>
> I still have some problems with making RO copies of root.cell and
> root.afs, though. Running "vos release" gives me:
> <<
> karl@picacho:~$ vos release -id root.cell
> Failed to start a transaction on the RO volume.
> VOLSER: volume is busy
> The volume 536870918 could not be released to the following 1 sites:
> picacho.ridgetop-group.local /vicepa
> VOLSER: release could not be completed
> Error in vos release command.
> VOLSER: release could not be completed
> karl@picacho:~$ vos release -id root.afs
> Failed to start a transaction on the RO volume.
> VOLSER: volume is busy
> The volume 536870915 could not be released to the following 1 sites:
> picacho.ridgetop-group.local /vicepa
> VOLSER: release could not be completed
> Error in vos release command.
> VOLSER: release could not be completed
Try vos release -id root.afs -verbose -local as root to get more info
and use your KeyFile instead of user tokens.
Does vos listaddrs -noresolve print anything out?
And can you vos changeaddr -remove any incorrect IP addresses? (You
might need to vos remsite replicas attached to those IPs first.)
You might still be having problems related to having your 127.*
/etc/hosts line match the actual IP of your AFS server. In theory you
can shut down both AFS servers, delete your VLDB and have it regenerated
via vos syncserv and vos syncvldb commands. Of course, this could also
make things worse.
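As a rough sketch of the regeneration step only (not the shutdown or the deletion of the database files, whose location depends on how OpenAFS was installed), the sync commands could be planned like this. The function name sync_plan is made up, and the ordering, syncvldb on every file server first and then syncserv, is an assumption based on the usual advice rather than anything tested against this cell. The script only builds the command lines; it does not run them.

```python
def sync_plan(servers):
    """Return the vos command lines to rebuild VLDB entries, in order.

    Assumed order: first 'vos syncvldb' on each file server (create or fix
    VLDB entries from the volumes actually on disk), then 'vos syncserv' on
    each (check the VLDB entries that mention that server).
    """
    plan = []
    for command in ("syncvldb", "syncserv"):
        for server in servers:
            plan.append(["vos", command, "-server", server,
                         "-localauth", "-verbose"])
    return plan


for argv in sync_plan(["picacho.ridgetop-group.local",
                       "coronado.ridgetop-group.local"]):
    print(" ".join(argv))
```

Printing the plan first and running each line by hand leaves room to stop as soon as one of the commands reports something unexpected.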
<<CDC