[OpenAFS] Volume off-line after DB upgrade and VLDB lists wrong location

Staffan Hämälä sh@ltu.se
Sun, 25 Nov 2012 08:10:14 +0100


After upgrading our DB servers to 1.6.1 (I'm not sure whether that is
related to the problem), and after OS updates and reboots of all file
servers yesterday, I noticed that one volume has a strange error that
I have not seen before. The file servers all run dafs 1.6.1 (upgraded
a few months ago).

One volume appears off-line, and vos exa cannot fetch any information
about it from the server:

vos exa staff.xyz
Could not fetch the information about volume 537689471 from the server
: No such device
Volume does not exist on server afsfs1.its.ltu.se as indicated by the VLDB

Dump only information from VLDB

staff.xyz
     RWrite: 537689471     Backup: 537689473
     number of sites -> 1
        server afsfs1.its.ltu.se partition /vicepc RW Site


I happened to have a record of the volume's location, which shows that
the volume used to be on partition a, not c.

On the file server, I've checked:
$ ls -l /vicepc/*537689471*
-rw-r--r-- 1 root root 76 21 okt 19.19 /vicepc/V0537689471.vol
$ ls -l /vicepa/*537689471*
-rw-r--r-- 1 root root 76 21 okt 16.26 /vicepa/V0537689471.vol
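
The manual check above (looking for the same header file on two partitions) could be generalized to all partitions on a file server. The following is only a minimal sketch, assuming the header files follow the V<10-digit id>.vol naming seen in the listing and that partitions are mounted as /vicep* directories; the function name find_duplicate_headers is hypothetical, not part of any OpenAFS tool.

```python
import re
from collections import defaultdict
from pathlib import Path

# Header files look like V0537689471.vol: "V", a zero-padded volume id, ".vol".
HEADER_RE = re.compile(r"^V(\d{10})\.vol$")


def find_duplicate_headers(root="/"):
    """Return {volume id: [partition paths]} for ids found on 2+ partitions.

    `root` is the directory that contains the vicep* mount points
    (an assumption; "/" on a typical file server).
    """
    seen = defaultdict(list)
    for part in sorted(Path(root).glob("vicep*")):
        if not part.is_dir():
            continue
        for entry in part.iterdir():
            match = HEADER_RE.match(entry.name)
            if match:
                seen[int(match.group(1))].append(str(part))
    # Keep only volume ids whose header exists on more than one partition.
    return {vid: parts for vid, parts in seen.items() if len(parts) > 1}
```

Run as root on the file server, find_duplicate_headers() would be expected to report 537689471 on both /vicepa and /vicepc in the situation described here. It only reads directory listings and changes nothing.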


vos listvol afsfs1 a gives:
staff.xyz                      537689471 RW  151580897 K Off-line


vos listvol afsfs1 c gives this error:
**** Could not attach volume 537689471 ****


vos listvldb staff.xyz -s afsfs1 -p a lists partition c:

staff.xyz
     RWrite: 537689471     Backup: 537689473
     number of sites -> 1
        server afsfs1.its.ltu.se partition /vicepc RW Site
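
To compare what the VLDB says against a separately kept record of volume locations, the site lines of the output above can be parsed. This is a rough sketch that assumes the "server ... partition ... RW Site" line layout shown in this message; the helper names vldb_sites and vldb_disagrees are made up for illustration.

```python
import re

# Matches lines such as:
#    server afsfs1.its.ltu.se partition /vicepc RW Site
SITE_RE = re.compile(
    r"^\s*server\s+(\S+)\s+partition\s+(/vicep[a-z]+)\s+(RW|RO|BK)\s+Site"
)


def vldb_sites(text):
    """Extract (server, partition, type) tuples from VLDB entry text."""
    sites = []
    for line in text.splitlines():
        match = SITE_RE.match(line)
        if match:
            sites.append(match.groups())
    return sites


def vldb_disagrees(text, server, partition):
    """True if no VLDB site in `text` matches the recorded server/partition."""
    return all((s, p) != (server, partition) for s, p, _ in vldb_sites(text))
```

With the entry above, vldb_disagrees(text, "afsfs1.its.ltu.se", "/vicepa") would flag the mismatch between the VLDB (partition c) and the recorded location (partition a).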


FileLog on the server has these errors:
Sat Nov 24 16:02:26 2012 Warning: Duplicate volume id 537689471 detected.
Sat Nov 24 16:34:06 2012 Volume 537689471 offline: not in service
(repeated lots of times)
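
To see whether other volumes are affected, the FileLog can be summarized by volume id. The sketch below assumes the two message formats quoted above and nothing more; the function name summarize_filelog is hypothetical, and the location of FileLog depends on the installation, so the caller supplies the lines.

```python
import re
from collections import Counter

# Patterns taken from the two FileLog messages quoted in this message.
DUP_RE = re.compile(r"Duplicate volume id (\d+) detected")
OFFLINE_RE = re.compile(r"Volume (\d+) offline: not in service")


def summarize_filelog(lines):
    """Count duplicate-id warnings and offline errors per volume id."""
    duplicates = Counter()
    offline = Counter()
    for line in lines:
        match = DUP_RE.search(line)
        if match:
            duplicates[int(match.group(1))] += 1
            continue
        match = OFFLINE_RE.search(line)
        if match:
            offline[int(match.group(1))] += 1
    return duplicates, offline
```

For example, passing an open FileLog file object as `lines` returns two counters, so a volume id that appears in both is a likely candidate for the duplicate-header problem described here.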


What should we do about this? Will a vos remove of the volume on 
partition c make it find the correct one, on partition a, instead?

/Staffan