[OpenAFS] (minor) VLDB corruption - how to clean up/repair?

Jan Iven jan.iven@cern.ch
Fri, 7 Nov 2014 16:40:51 +0100


While trying to clean up leftover server IPs, I came across a 
corrupted entry in our VLDB. We have apparently had it for at least 
several months (potentially much longer); it does not seem to affect 
normal operation, and it was happily synced over to new servers recently.


$ vos changeaddr -oldaddr 137.138.144.30 -remove
Could not remove server 137.138.144.30 from the VLDB
VLDB: volume Id exists in the vldb
$ vos listvldb -server 137.138.144.30
VLDB entries for server 137.138.144.30

Total entries: 0
#### hm?

$ vos delentry -server 137.138.144.30
Deleting VLDB entries for server 137.138.144.30
Could not delete VLDB entry for  user.nau
VLDB: no such entry
----------------------
Total VLDB entries deleted: 0; failed to delete: 1
#### OK, we got a volume name

$ vos exa user.nau
VLDB: no such entry

#### ?!
I then ran "vldb_check" on an offline copy of the DB. It reported 
several problems, not all shown here (I guess we should run this more often):


Check Volume Name Hash
address 919236 (offset 0xe0704): Name Hash 922: volume name 'user.nau': Incorrect name hash chain (should be in 1461)
[..]
Verify each volume entry
address 919236 (offset 0xe0704): Volume 'user.nau' id 537418007 also found on other chains (0x8f0f1)
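
The "should be in 1461" figure can be reproduced offline, which helps when deciding which chain an entry belongs on. This is a minimal sketch, assuming the name hash is the one used in the OpenAFS vlserver sources (base 63 over the name read back to front, in unsigned 32-bit arithmetic, modulo 8191 buckets). The names name_hash and HASHSIZE are illustrative, not taken from the output above.

```python
HASHSIZE = 8191  # assumption: number of hash buckets per chain table in the VLDB


def name_hash(volume_name: str) -> int:
    """Return the name hash bucket for a volume name (assumed algorithm)."""
    h = 0
    # Walk the name from its last character back to its first.
    for ch in reversed(volume_name.encode("ascii")):
        # Mask to 32 bits to mimic C unsigned int wrap-around.
        h = (h * 63 + (ch - 63)) & 0xFFFFFFFF
    return h % HASHSIZE


print(name_hash("user.nau"))  # 1461, the chain vldb_check says it should be in
```

The result agrees with the vldb_check message, so under these assumptions the entry is linked into bucket 922 although its name hashes to 1461. That would be consistent with lookups by name failing while the entry still exists in the file.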

Corresponding entry is

address 919236 (offset 0xe0704): vlentry user.nau
    rw id = 537418007 ; ro id = 537418008 ; bk id = 537418009
    flags         = rw bk
    LockAfsId     = 0
    LockTimestamp = 0
    cloneId       = 0
    next hash for rw = 2222228 ; ro = 2222228 ; bk = 2222228 ; name = 1159736
    server 0 ; partition 0 ; flags =
    server 0 ; partition 0 ; flags =
    server 0 ; partition 0 ; flags =
    server 0 ; partition 0 ; flags =
    server 0 ; partition 0 ; flags =
    server 0 ; partition 0 ; flags =
    server 0 ; partition 0 ; flags =
    server 0 ; partition 0 ; flags =
    server 0 ; partition 0 ; flags =
    server 0 ; partition 0 ; flags =
    server 0 ; partition 0 ; flags =
    server 0 ; partition 0 ; flags =
    server 0 ; partition 0 ; flags =
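
If this does end up needing a hex editor, the addresses in the dump have to be turned into positions in the database file. In the vldb_check lines above, "address 919236" is paired with "offset 0xe0704", a difference of 64 bytes. The sketch below assumes that this fixed 64-byte gap (presumably the ubik header at the start of the file) applies to every address, including the "next hash" pointers. The constant and function names are hypothetical.

```python
import re

HEADER_SIZE = 64  # assumption: inferred from "address 919236 (offset 0xe0704)"


def file_offset(address: int) -> int:
    """Convert a VLDB address, as printed by vldb_check, to a file offset."""
    return address + HEADER_SIZE


def check_line(line: str) -> bool:
    """Check that an 'address N (offset 0x...)' line fits the assumed gap."""
    m = re.search(r"address (\d+) \(offset (0x[0-9a-fA-F]+)\)", line)
    if m is None:
        raise ValueError("no address/offset pair found in line")
    return file_offset(int(m.group(1))) == int(m.group(2), 16)


print(hex(file_offset(919236)))   # 0xe0704, the corrupted user.nau entry
print(hex(file_offset(1159736)))  # 0x11b278, next entry on its name chain
print(check_line("address 919236 (offset 0xe0704): vlentry user.nau"))  # True
```

Only the first pair is confirmed by the output above. The second offset is derived from the same assumption and should be checked against a copy of the database before anything is edited.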


###
So this is a corrupted entry, and I am trying to recover. The account 
has been gone for a while, as far as I can tell, so there are no worries 
about data loss.

$ vos exa 537418007
Volume 537418007 does not exist in VLDB

Dump only information from VLDB

user.nau
     RWrite: 537418007     Backup: 537418009
     number of sites -> 13
        server 137.138.144.30 partition /vicepa RO Site
        server 137.138.144.30 partition /vicepa RO Site
        server 137.138.144.30 partition /vicepa RO Site
        server 137.138.144.30 partition /vicepa RO Site
        server 137.138.144.30 partition /vicepa RO Site
        server 137.138.144.30 partition /vicepa RO Site
        server 137.138.144.30 partition /vicepa RO Site
        server 137.138.144.30 partition /vicepa RO Site
        server 137.138.144.30 partition /vicepa RO Site
        server 137.138.144.30 partition /vicepa RO Site
        server 137.138.144.30 partition /vicepa RO Site
        server 137.138.144.30 partition /vicepa RO Site
        server 137.138.144.30 partition /vicepa RO Site

$ vos remove -id 537418007 -verbose
VLDB: Volume '537418007' no match
$ vos delentry -id 537418007  -verbose
Could not delete entry for volume 537418007
You must specify a RW volume name or ID (the entire VLDB entry will be deleted)
VLDB: no such entry
Deleted 0 VLDB entries


Are there any other "just do it"-level commands or flags that might 
get me unstuck using the command-line tools, or is this potentially 
"hexedit time" (in which case I might be inclined to let sleeping dogs 
lie a while)?

Many thanks in advance,
jan