[OpenAFS] VLDB corruption

Michael Meffie mmeffie@sinenomine.net
Sat, 8 Nov 2014 13:29:06 -0500


On Sat, 8 Nov 2014 10:52:35 -0500
Michael Meffie <mmeffie@sinenomine.net> wrote:

> On Sat, 8 Nov 2014 10:58:21 +0200
> Kostas Liakakis <kostas@physics.auth.gr> wrote:
> 
> > Hello,
> > 
> > Reading about the recent thread for VLDB corruption I decided to take a 
> > look at ours, again. vldb_check gives me about 3000 entries like this:
> > 
> > address 1477640 (offset 0x168c48): Free vlentry not on free chain
> > 
> > which -fix doesn't seem able to fix.
> > 
> > We had several vldb corruption issues in our cell, all caused by a 
> > misbehaving 1.4.something server which is now fortunately retired. We 
> > were able to repair the mess with vldb_check on every occasion but this 
> > one. We are now running 1.6.10-2 everywhere but we still can't get rid 
> > of this 1.4.x heritage...
> > 
> > Should we be worried about these errors? There doesn't seem to be a 
> > problem so far.
> 
> Hello Kostas,
> 
> I think I see the issue here. The vldb_check -fix does rebuild the
> volume lookup hash tables, but does not rebuild the free list. The
> free list is the list of free slots in the database (holes), which
> the vlserver reuses when allocating new records. If they are not
> in the free list, then the vlserver will just not reuse them, making
> your vldb file larger than needed.
> 
> I'm working on a patch to fix this.

Hello Kostas,

A small fix for vldb_check -fix is in gerrit at http://gerrit.openafs.org/11598
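
To illustrate the idea of free entries that are not on the free chain, here
is a rough toy model in Python. It is only a sketch under my own assumptions:
the Slot class, its field names, and the three helper functions are
hypothetical and do not reflect the real on-disk vldb layout, the vldb_check
source, or the contents of the gerrit change. It just shows how a free slot
can be unreachable from the free-list head, and how relinking every free slot
makes all the holes reusable again.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Slot:
    """Hypothetical record slot: either in use or a free hole."""
    free: bool
    next_free: Optional[int] = None  # index of the next slot on the free chain


def walk_free_chain(slots: List[Slot], head: Optional[int]) -> Set[int]:
    """Return the indices reachable from the free-chain head."""
    seen: Set[int] = set()
    i = head
    while i is not None and i not in seen:  # 'seen' guards against loops
        seen.add(i)
        i = slots[i].next_free
    return seen


def orphaned_free_slots(slots: List[Slot], head: Optional[int]) -> List[int]:
    """Free slots that the chain never reaches, so they are never reused."""
    on_chain = walk_free_chain(slots, head)
    return [i for i, s in enumerate(slots) if s.free and i not in on_chain]


def rebuild_free_chain(slots: List[Slot]) -> Optional[int]:
    """Relink every free slot into one chain and return the new head."""
    head: Optional[int] = None
    for i in range(len(slots) - 1, -1, -1):  # walk backwards to keep order
        if slots[i].free:
            slots[i].next_free = head
            head = i
    return head


# Slots 1, 2 and 4 are free, but the chain only contains slot 1.
slots = [Slot(False), Slot(True), Slot(True), Slot(False), Slot(True)]
head = 1
orphans_before = orphaned_free_slots(slots, head)

head = rebuild_free_chain(slots)
orphans_after = orphaned_free_slots(slots, head)
```

In this model the orphaned slots are harmless to lookups; they are simply
wasted space until the chain is rebuilt, which matches the symptom described
above.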

As Andrew mentioned, would you be willing to share a copy of your vldb.DB0 file
to check?

Thanks,
Mike

-- 
Michael Meffie <mmeffie@sinenomine.net>