[OpenAFS-devel] Re: Salvager (from openafs-server-1.6.5-1.el6.x86_64) segmentation fault

Jeffrey Hutzelman jhutz@cmu.edu
Wed, 14 Aug 2013 06:30:27 -0400


On Tue, 2013-08-13 at 10:24 -0500, Andrew Deason wrote:
> On Tue, 13 Aug 2013 11:23:07 +0200 (CEST)
> Harald Barth <haba@kth.se> wrote:
> 
> > I assume the salvager tries to delete the directory entry .. and
> > create it again new.
> > 
> > Looks to me like FindItem() in dir.c:Delete() came up empty handed, we
> > got ENOENT which did Abort().
> > 
> > Do you think it's safe to change row 3986 to something less dramatic
> > than Abort() or do you have a better suggestion?
> 
> This is possibly/probably fixed by gerrit 9104. The issue is that we
> deleted the ".." entry earlier, and we try to delete it again and
> assert. Gerrit 9104 should prevent the salvager from trying to delete it
> twice. (This needs to be submitted to 1.6, as I don't think I see it
> there anywhere.)
> 
> Regardless, I think there is an argument to be made for not asserting on
> Delete()s if the next Create() succeeds anyway, but I'm not sure. The
> reasoning for the existing code is probably that we don't want anything
> to be "weird" while salvaging, since mistakes can lose or corrupt data.
> But I'm not sure; express an opinion here if you have it :)

I agree that not asserting on Delete() failure here is probably the
right thing.  If the directory in question is corrupt, we can try to
reconstruct it, but that should be done with some care -- we don't want
to lose filenames if we can help it.
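
To illustrate the policy only (this is not the salvager's code, and
toy_dir, toy_find, toy_delete, toy_create and toy_replace are all
hypothetical names invented for this sketch), a delete-then-create step
that tolerates a missing entry but still fails on any other error might
look roughly like this:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_ENTRIES 8
#define TOY_NAME_LEN 32

/* Hypothetical stand-in for a directory: a flat name -> vnode table.
 * This is an assumption for illustration, not the OpenAFS dir format. */
struct toy_dir {
    char names[TOY_MAX_ENTRIES][TOY_NAME_LEN];
    long vnodes[TOY_MAX_ENTRIES];
    int used[TOY_MAX_ENTRIES];
};

/* Return the slot index holding `name`, or -1 if it is not present. */
static int toy_find(const struct toy_dir *d, const char *name)
{
    for (int i = 0; i < TOY_MAX_ENTRIES; i++) {
        if (d->used[i] && strcmp(d->names[i], name) == 0)
            return i;
    }
    return -1;
}

/* Remove `name`. Returns 0 on success, ENOENT if it was not there. */
static int toy_delete(struct toy_dir *d, const char *name)
{
    int i = toy_find(d, name);
    if (i < 0)
        return ENOENT;
    d->used[i] = 0;
    return 0;
}

/* Add `name`. Returns 0, or ENAMETOOLONG / EEXIST / ENOSPC on failure. */
static int toy_create(struct toy_dir *d, const char *name, long vnode)
{
    if (strlen(name) >= TOY_NAME_LEN)
        return ENAMETOOLONG;
    if (toy_find(d, name) >= 0)
        return EEXIST;
    for (int i = 0; i < TOY_MAX_ENTRIES; i++) {
        if (!d->used[i]) {
            strcpy(d->names[i], name);
            d->vnodes[i] = vnode;
            d->used[i] = 1;
            return 0;
        }
    }
    return ENOSPC;
}

/* Replace `name` with a fresh entry. A missing entry is logged and
 * tolerated; any other delete error, or a create error, is returned
 * to the caller instead of aborting. */
static int toy_replace(struct toy_dir *d, const char *name, long vnode)
{
    int code = toy_delete(d, name);
    if (code == ENOENT)
        fprintf(stderr, "warning: '%s' already missing; recreating\n", name);
    else if (code != 0)
        return code;
    return toy_create(d, name, vnode);
}
```

The point of the sketch is that only the "already gone" case is
downgraded to a warning; a failing create is still reported, so the
caller can decide whether the directory needs a more careful rebuild.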

-- Jeff