[OpenAFS] Volume corruption or salvage error?

Harald Barth haba@kth.se
Mon, 12 Oct 2015 11:54:07 +0200 (CEST)


I have a very odd thing about at least one volume. The volume is ancient:

# vos exa home.levitte.Mail
home.levitte.Mail                 536904039 RW     530871 K  On-line
    beef.stacken.kth.se /vicepa 
    RWrite  536904039 ROnly          0 Backup  536905115 
    MaxQuota     550000 K 
    Creation    Sun Mar  6 05:07:47 2011
    Copy        Sat Jul 18 16:22:24 2015
    Backup      Mon Oct 12 01:25:24 2015
    Last Access Thu Oct  8 10:10:38 2015
    Last Update Wed Apr 17 14:27:05 2002
    0 accesses in the past day (i.e., vnode references)

    RWrite: 536904039     Backup: 536905115 
    number of sites -> 1
       server beef.stacken.kth.se partition /vicepa RW Site 

There does not seem to be any change going on from $USER. The volume
was salvaged (routine salvage after a vos move) where some errors
were corrected in August:

SalsrvLog.2015-08-17.10:39:09:07/18/2015 16:24:27 dispatching child to salvage volume 536904039...
SalsrvLog.2015-08-17.10:39:09:07/18/2015 16:24:29 SALVAGING VOLUME 536904039.
SalsrvLog.2015-08-17.10:39:09:07/18/2015 16:24:29 home.levitte.Mail (536904039) updated 04/17/2002 14:27
SalsrvLog.2015-08-17.10:39:09:07/18/2015 16:24:30 Salvaged home.levitte.Mail (536904039): 100258 files, 530871 blocks

Since then, the volume has been cloned to a backup volume very often without error:

VolserLog:Mon Aug 24 01:24:39 2015 1 Volser: Clone: Recloning volume 536904039 to volume 536905115
VolserLog:Tue Aug 25 01:23:11 2015 1 Volser: Clone: Recloning volume 536904039 to volume 536905115
VolserLog:Wed Aug 26 01:24:53 2015 1 Volser: Clone: Recloning volume 536904039 to volume 536905115
VolserLog:Thu Aug 27 01:22:00 2015 1 Volser: Clone: Recloning volume 536904039 to volume 536905115
VolserLog:Fri Aug 28 01:24:22 2015 1 Volser: Clone: Recloning volume 536904039 to volume 536905115
...
VolserLog:Sat Oct 10 01:25:23 2015 1 Volser: Clone: Recloning volume 536904039 to volume 536905115
VolserLog:Sun Oct 11 01:25:28 2015 1 Volser: Clone: Recloning volume 536904039 to volume 536905115
VolserLog:Mon Oct 12 01:25:24 2015 1 Volser: Clone: Recloning volume 536904039 to volume 536905115

However, when the backup volume should be dumped for backup:

Wed Sep  9 01:25:05 2015 DumpVnode: volume 536905115 vnode 1 has inconsistent length (index 6144 disk 26624); aborting dump
Sun Oct 11 01:25:28 2015 DumpVnode: volume 536905115 vnode 1 has inconsistent length (index 6144 disk 26624); aborting dump

The rw clone can not be dumped either:

Mon Oct 12 11:20:31 2015 DumpVnode: volume 536904039 vnode 1 has inconsistent length (index 6144 disk 26624); aborting dump

So something is wrong in spite that salvage was "successful" and that
the volume header says that the volume was not changed since the
salvage. Has the salvage "forgotten" to update the size of vnode 1
where it created entries for orphaned files (according to the
salvagelog)?

I have two other volumes that are similar :-(

The data is on zfs which has not reported any problems with data loss.

Version is 1.6.9-2+deb8u2-debian (ok, a bit old) but I have not seen
anything in the release notes that this is known or fixed.

Harald.