[OpenAFS] Re: Salvageserver 1.6.1-3+deb7u1 core dump

Andrew Deason adeason@sinenomine.net
Thu, 19 Jun 2014 13:18:08 -0500

On Thu, 19 Jun 2014 19:05:02 +0200 (CEST)
Harald Barth <haba@kth.se> wrote:

> > All directories are supposed to have a .. entry by this point
> I disagree. Not if something has removed "..". For example a buggy
> salvager like 1.6.1.

If the directory on disk is missing a ".." entry for any reason, the
salvager is supposed to detect that and rebuild the directory, adding a
".." entry.  If we are not doing that, something is wrong in the
processing, possibly also leading to any number of additional problems.

I already mentioned this, how DirOK should fail and CopyAndSalvage will
fix the dir. I do not know how to make this more clear. (I could be
wrong, of course, if you think this conclusion is incorrect.)

> Or bitrot. Or whatever. As this is a salvage program, that should not
> core dump even if unexpected things happen, I think it is very very
> very hard to justify such a assert in production code for the
> salvager. Especially because the next layer that does call the
> salvager does not handle that the salvager asserts.

Well, maybe. I'm not necessarily saying to keep the assert; just that
removing it must not be the _only_ thing you do. There is another bug
here (probably), and ignoring it will just make it worse.

Aside from that, I think there is an argument to be made for not just
ignoring an ENOENT error there. If there is any unexpected inconsistency
in the salvager that we ignore, we risk giving corrupted data to users.
Without an option to let the user decide one way or the other, the
danger of serving corrupted data is much more grave than the danger of
keeping a volume offline. I'm not sure if I would argue against ignoring
that error in this particular case, but that side of the argument needs
to not be lost.

Andrew Deason