[OpenAFS] Re: 1.6.0-pre2 ptserver/vlserver dumping core

Andrew Deason adeason@sinenomine.net
Wed, 2 Mar 2011 12:19:12 -0600


On Wed, 2 Mar 2011 00:04:50 -0600
Andrew Deason <adeason@sinenomine.net> wrote:

> Looking at it a bit more... one thing that seems odd is that we don't
> ever seem to cancel the GrowMTU event. Shouldn't we be doing that in
> FreeCall/ResetCall/EndCall somewhere? It seems like we could have some
> other event go the CheckCall->FreeCall->DestroyConn route while the
> GrowMTUEvent is still pending, and when the GrowMTUEvent fires, it
> follows the same path and frees the conn again. That wouldn't be a
> problem in the pthreaded case because we check the call refs before
> freeing in CheckCall.
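
To make the suspected sequence concrete, here is a toy model in C. It is only a sketch: every name in it (toy_conn, toy_call, toy_event, toy_free_call, toy_run) is hypothetical and none of it is the real rx code. It counts destroy attempts instead of actually freeing memory, so the double teardown can be observed safely. The assumption is that the fix amounts to cancelling the pending event before the call is torn down.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the rx data structures. */
struct toy_conn {
    int destroy_count;          /* how many times teardown reached the conn */
};

struct toy_event;

struct toy_call {
    struct toy_conn *conn;
    struct toy_event *grow_event;   /* pending "grow MTU"-style event */
};

struct toy_event {
    struct toy_call *call;
    int pending;
};

static void toy_destroy_conn(struct toy_conn *conn)
{
    conn->destroy_count++;      /* a real implementation would free here */
}

static void toy_event_cancel(struct toy_event *ev)
{
    if (ev != NULL)
        ev->pending = 0;
}

/* Tear down a call; optionally cancel its pending event first. */
static void toy_free_call(struct toy_call *call, int cancel_pending)
{
    assert(call != NULL);
    if (cancel_pending)
        toy_event_cancel(call->grow_event);
    toy_destroy_conn(call->conn);
}

/* A pending event that fires follows the same teardown path. */
static void toy_event_fire(struct toy_event *ev)
{
    if (!ev->pending)
        return;                 /* cancelled: nothing to do */
    ev->pending = 0;
    toy_free_call(ev->call, 0);
}

/* Returns the number of times the conn was destroyed. */
static int toy_run(int cancel_pending)
{
    struct toy_conn conn = { 0 };
    struct toy_event ev;
    struct toy_call call = { &conn, &ev };

    ev.call = &call;
    ev.pending = 1;

    toy_free_call(&call, cancel_pending);   /* some other event tears down first */
    toy_event_fire(&ev);                    /* the stale event fires afterwards */
    return conn.destroy_count;
}
```

In this model, toy_run(0) reaches the destroy step twice, which corresponds to the double free described above, while toy_run(1) reaches it once because the stale event has been cancelled. A reference-count check before freeing, as in the pthreaded case mentioned above, would be another way to guard the same path; the sketch does not model that.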

I'm inferring Derrick agrees with this, from gerrit 4108 :). Ryan, if
you would like to try this patch:
<http://git.openafs.org/?p=openafs.git;a=commitdiff_plain;h=f82277b98404bc35a28e4d9ae2d084e37b3f9d7c>
(it will apply with some line offsets), it would be nice to see if that
solves the issue.

> I still wouldn't understand how you can reproduce this so easily,
> though, when I am unable. We can probably give you some gdb
> breakpoints and stuff to run, to see what events are triggering for
> the conn. If it comes to that, anyway, and you're willing to try running
> it again under gdb until the problem recurs (but that's apparently not
> a very long time, heh).

This is still curious, though. If you want to, I'd be interested in you
running vlserver or ptserver under gdb without the above patch, first.
Attach soon after it starts up, and run:

set height 0
break rxi_DestroyConnection
commands
print *conn
print conn
bt
cont
end
break rxi_CleanupConnection
commands
print *conn
print conn
bt
cont
end
break rxi_FreeCall
commands
print *call
print call
bt
cont
end
cont

And when it crashes, just save the output (along with the 'bt' of the
crash), and put it somewhere we can get to it.

-- 
Andrew Deason
adeason@sinenomine.net