[OpenAFS] Re: vos remove / vos zap failure observations

Joe Buehler jbuehler@spirentcom.com
Wed, 07 Jun 2006 11:34:45 -0400

Derrick J Brashear wrote:

> It only removes what it knows about. If it doesn't believe those files
> should be removed (for instance because more than one volume in the
> volume group is believed to exist) then it won't remove it.

I didn't understand that -- what's a volume group in this context?

I was under the impression that there is a one-to-one mapping between
volume ID and top-level directory name under /vicep*, so when I say
"vos remove" or "vos zap" it seems sensible that the tree should be
clobbered.  The same volume ID could of course exist on multiple
partitions, but that's not the case I am talking about.  I mean only
the case where the code knows exactly which server/partition/ID it is
trying to remove or zap.

> It's not always broken, you have crap on disk, and so you have crap on
> disk. Take a fresh partition, create a volume, use it, delete it. I bet
> it goes away completely.

No, it's not broken in normal usage.  The problem is that it is easy enough
to break it, but there is no easy way to fix it.

The on-disk data that trigger this are easy to create under various
failure conditions, so this is not some weird one-in-a-million boundary
condition.  There should be an easy way to clean up such a situation,
and if there's not, then the AFS admin commands are just plain broken
in my opinion.

My question remains: would blowing away the disk tree associated
with the vos remove/vos zap target volume be a solution to this problem?

The only reason I can see for this not being desirable is that some
state in a server process could end up inconsistent.  Presumably that
would only happen if the vos remove/vos zap fails, in which case a fix
is still needed but is more complex than just blowing away "unneeded"
disk data.

Joe Buehler