[OpenAFS] Re: Unable to delete volume

Andrew Deason adeason@sinenomine.net
Mon, 8 Aug 2011 13:46:59 -0500


On Mon, 1 Aug 2011 11:33:49 -0400
Aaron Knister <aaronk@umbc.edu> wrote:

> sigh...the vldb clearly is unhappy. I'll send you a copy of the
> vldb.DB0

To follow up on this a bit: the issue here was a little different from
the direction Derrick and I were going in, because the vldb was actually
corrupt (that is, I don't think you can get it into that state just by
issuing "vos" commands). We should perhaps log this case in the
vlserver, since it represents database corruption, and that is hard to
tell from the "vos" error messages (all we see is "volume does not
exist").

This is an example of what can "break" because of the kinds of errors
vldb_check reports. I thought people might want to see it, since we
talked about that at the workshop. In this case, running vldb_check
shows (among a bunch of other stuff):

VLDB_CHECK_WARNING: VLDB entry 'g.ilin22.local' has no RW volume
23193: Volume 'g.ilin22.local' not found in name hash 2214 (next 20655 next in chain)
23193: Record 3432572 (type 0x7071) not in a name chain
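To make the "not in a name chain" message more concrete, here is a toy
sketch of one way a checker could detect that condition: walk every
name-hash chain from its bucket head, mark each record reached, and
report any record that was never reached. This is a hypothetical,
simplified model, NOT the real VLDB layout or vldb_check's actual code;
the Record class, orphaned_records() and the small record addresses are
all made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical, simplified record: NOT the real OpenAFS on-disk format.
@dataclass
class Record:
    addr: int                 # record "address" in the database file
    name: str                 # volume name
    next_name: Optional[int]  # next record in the same name chain, or None


def orphaned_records(buckets: List[Optional[int]],
                     records: Dict[int, Record]) -> List[int]:
    """Return addresses of records not reachable from any name-hash bucket."""
    reachable = set()
    for head in buckets:
        addr = head
        # Stop at the end of the chain, at a dangling pointer, or at a record
        # already seen (a cycle, or chains that merge in a corrupt database).
        while addr is not None and addr in records and addr not in reachable:
            reachable.add(addr)
            addr = records[addr].next_name
    return sorted(a for a in records if a not in reachable)


# Two entries share a name, but only record 100 was ever linked into a chain.
records = {
    100: Record(100, "g.ilin22.local", None),
    200: Record(200, "g.ilin22.local", None),
}
buckets: List[Optional[int]] = [None, 100, None, None]
print(orphaned_records(buckets, records))  # [200]
```

The real tool evidently checks more than reachability (the "not found in
name hash 2214" line suggests it also checks which bucket an entry should
hash to); this sketch only covers the orphaned-record part.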

That first one isn't necessarily corruption, just a strange condition
(you _can_ have RO volumes without an RW, but most of the time that's
not what you want). "Record 3432572" is one of the g.ilin22.local
entries. In this particular case there are two different vl entries for
the name g.ilin22.local, but one of them does not appear anywhere in the
name hash table. The vlserver refuses to delete an entry that is not in
the name hash table, so that entry is effectively undeletable. This can
only be fixed by 'vldb_check -fix', or by otherwise altering the
database by hand.
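A toy model may help show why such an entry gets stuck. Assuming, as
described above, that deleting an entry requires first finding and
unlinking it from the chain its name hashes to, a record that was never
linked into that chain can never be found there, so the delete fails
with what looks like "does not exist". Everything below (ToyVLDB,
name_hash, NBUCKETS, the addresses) is hypothetical and only mirrors the
idea, not the vlserver's actual data structures or hash function.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

NBUCKETS = 8  # hypothetical table size


def name_hash(name: str) -> int:
    # Hypothetical hash function, not the one the vlserver uses.
    return sum(name.encode()) % NBUCKETS


@dataclass
class Entry:
    addr: int
    name: str
    next_name: Optional[int] = None  # next entry in the same name chain


class ToyVLDB:
    def __init__(self) -> None:
        self.buckets: List[Optional[int]] = [None] * NBUCKETS
        self.entries: Dict[int, Entry] = {}

    def insert(self, entry: Entry, hashed: bool = True) -> None:
        """Store an entry; hashed=False simulates the corrupt, unlinked copy."""
        self.entries[entry.addr] = entry
        if hashed:
            b = name_hash(entry.name)
            entry.next_name = self.buckets[b]
            self.buckets[b] = entry.addr

    def delete(self, addr: int) -> bool:
        """Unlink the entry from its name chain, then free it.

        Returns False (think "volume does not exist") if the entry cannot
        be found on the chain its name hashes to; the record stays put.
        """
        entry = self.entries.get(addr)
        if entry is None:
            return False
        b = name_hash(entry.name)
        prev: Optional[Entry] = None
        cur = self.buckets[b]
        while cur is not None and cur != addr:
            prev = self.entries[cur]
            cur = prev.next_name
        if cur is None:
            return False  # not on the name chain: refuse to delete
        if prev is None:
            self.buckets[b] = entry.next_name
        else:
            prev.next_name = entry.next_name
        del self.entries[addr]
        return True


db = ToyVLDB()
db.insert(Entry(100, "g.ilin22.local"))
db.insert(Entry(200, "g.ilin22.local"), hashed=False)  # orphaned duplicate
print(db.delete(200))  # False: the orphan cannot be found, so it is stuck
print(db.delete(100))  # True: the properly hashed entry deletes normally
print(db.delete(200))  # False: still stuck after the other one is gone
```

Note that removing the "good" duplicate does not help; in this model the
orphan only becomes deletable if something outside the normal delete
path relinks or removes it, which is the role a repair tool plays.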

It doesn't really have much to do with the fact that the volume is
specified on a server that doesn't exist, or that there's another vl
entry with the same name, though of course both of those made things
more confusing.

-- 
Andrew Deason
adeason@sinenomine.net