[OpenAFS] (minor) VLDB corruption - how to clean up/repair?
Jan Iven
jan.iven@cern.ch
Fri, 7 Nov 2014 16:40:51 +0100
While trying to clean up leftover server IPs, I've come across a
corrupted entry in our VLDB. We've apparently had it for at least
several months (potentially much longer). It does not seem to affect
normal operation, and it was happily synced over to the new servers
recently.
$ vos changeaddr -oldaddr 137.138.144.30 -remove
Could not remove server 137.138.144.30 from the VLDB
VLDB: volume Id exists in the vldb
$ vos listvldb -server 137.138.144.30
VLDB entries for server 137.138.144.30
Total entries: 0
#### hm?
$ vos delentry -server 137.138.144.30
Deleting VLDB entries for server 137.138.144.30
Could not delete VLDB entry for user.nau
VLDB: no such entry
----------------------
Total VLDB entries deleted: 0; failed to delete: 1
#### OK, we got a volume name
$ vos exa user.nau
VLDB: no such entry
#### ?!
I then ran "vldb_check" on an offline copy of the DB. It reported several
unhappinesses, not all shown here (guess we should run this more often):
Check Volume Name Hash
address 919236 (offset 0xe0704): Name Hash 922: volume name 'user.nau': Incorrect name hash chain (should be in 1461)
[..]
Verify each volume entry
address 919236 (offset 0xe0704): Volume 'user.nau' id 537418007 also found on other chains (0x8f0f1)
Corresponding entry is
address 919236 (offset 0xe0704): vlentry user.nau
rw id = 537418007 ; ro id = 537418008 ; bk id = 537418009
flags = rw bk
LockAfsId = 0
LockTimestamp = 0
cloneId = 0
next hash for rw = 2222228 ; ro = 2222228 ; bk = 2222228 ; name = 1159736
server 0 ; partition 0 ; flags =
server 0 ; partition 0 ; flags =
server 0 ; partition 0 ; flags =
server 0 ; partition 0 ; flags =
server 0 ; partition 0 ; flags =
server 0 ; partition 0 ; flags =
server 0 ; partition 0 ; flags =
server 0 ; partition 0 ; flags =
server 0 ; partition 0 ; flags =
server 0 ; partition 0 ; flags =
server 0 ; partition 0 ; flags =
server 0 ; partition 0 ; flags =
server 0 ; partition 0 ; flags =
###
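The "should be in 1461" complaint can be cross-checked independently of vldb_check. Below is a minimal sketch, under the assumption that the VLDB name hash walks the volume name from the last character to the first, multiplies the running value by 63, adds the character code minus 63 in unsigned 32-bit arithmetic, and reduces the result modulo 8191 buckets. That recipe is recalled from the vlserver sources and should be verified against the installed version. The names `name_hash`, `misplaced_names`, `HASH_BUCKETS` and the regular expression are illustrative only; the regex targets just the message format quoted above.

```python
import re

# Assumed number of buckets in the VLDB hash tables (not confirmed by the output above).
HASH_BUCKETS = 8191

def name_hash(volume_name: str) -> int:
    """Re-derive the name hash bucket for a volume name (assumed algorithm)."""
    h = 0
    # Walk the name backwards, emulating unsigned 32-bit wraparound.
    for byte in reversed(volume_name.encode("ascii")):
        h = (h * 63 + (byte - 63)) & 0xFFFFFFFF
    return h % HASH_BUCKETS

# Matches the "Incorrect name hash chain" complaint as printed by vldb_check above.
COMPLAINT = re.compile(
    r"Name Hash (\d+): volume name '([^']+)': "
    r"Incorrect name hash chain \(should be in (\d+)\)"
)

def misplaced_names(report: str):
    """Return (name, chain_found, chain_expected) for each complaint in a report."""
    # Mail clients wrap long lines, so collapse all whitespace before matching.
    flat = " ".join(report.split())
    return [(name, int(found), int(expected))
            for found, name, expected in COMPLAINT.findall(flat)]

SAMPLE = """
address 919236 (offset 0xe0704): Name Hash 922: volume name 'user.nau':
Incorrect name hash chain (should be in 1461)
"""

if __name__ == "__main__":
    for name, found, expected in misplaced_names(SAMPLE):
        print(name, "on chain", found, "reported expected", expected,
              "recomputed", name_hash(name))
```

For 'user.nau' the recomputed bucket is 1461, the same value vldb_check expects, which suggests the name itself is intact and the entry is merely linked into the wrong chain (922).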
So this is a corrupted entry, and I am trying to recover from it. The
account has been gone for a while, as far as I can tell, so there are no
worries about data loss.
$ vos exa 537418007
Volume 537418007 does not exist in VLDB
Dump only information from VLDB
user.nau
RWrite: 537418007 Backup: 537418009
number of sites -> 13
server 137.138.144.30 partition /vicepa RO Site
server 137.138.144.30 partition /vicepa RO Site
server 137.138.144.30 partition /vicepa RO Site
server 137.138.144.30 partition /vicepa RO Site
server 137.138.144.30 partition /vicepa RO Site
server 137.138.144.30 partition /vicepa RO Site
server 137.138.144.30 partition /vicepa RO Site
server 137.138.144.30 partition /vicepa RO Site
server 137.138.144.30 partition /vicepa RO Site
server 137.138.144.30 partition /vicepa RO Site
server 137.138.144.30 partition /vicepa RO Site
server 137.138.144.30 partition /vicepa RO Site
server 137.138.144.30 partition /vicepa RO Site
$ vos remove -id 537418007 -verbose
VLDB: Volume '537418007' no match
$ vos delentry -id 537418007 -verbose
Could not delete entry for volume 537418007
You must specify a RW volume name or ID (the entire VLDB entry will be deleted)
VLDB: no such entry
Deleted 0 VLDB entries
Are there any other "just do it"-level commands or flags that might
allow me to get unstuck via command-line tools, or is this potentially
"hexedit time" (in which case I might be inclined to let sleeping dogs
lie for a while)?
Many thanks in advance,
jan