[OpenAFS] Re: 1.6.0-pre2 ptserver/vlserver dumping core

Andrew Deason adeason@sinenomine.net
Tue, 1 Mar 2011 10:15:56 -0600


On Tue, 1 Mar 2011 00:22:42 -0600
"Ryan C. Underwood" <nemesis-lists@icequake.net> wrote:

> How does a smart hacker (unlike myself) usually debug OpenAFS server
> binaries?  Would I have to flatten the threading by i.e. disabling LWP
> at compile time?

Debuggers still work (although they won't see non-active threads), and
dbserver processes are the only significant things running LWP. TBH,
dbserver processes tend to be the last place I see in-memory corruption,
double frees, etc., since the other daemon processes get hit so much
harder.

If you can get this to happen under the pthreaded version of the
vlserver/ptserver, that may actually be a lot more helpful. I fear that
the problem will suddenly go away when you do that, though. To get the
pthreaded versions, configure with --enable-pthreaded-ubik. You can
confirm by running 'ldd' on the vlserver/ptserver binaries and checking
that they link with libpthread.
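
A minimal sketch of that 'ldd' check, in Python. The function names are
made up for illustration, and it assumes libpthread shows up as its own
shared library in the 'ldd' output (on systems where the threading code
lives inside libc itself, this test would not tell you anything):

```python
import subprocess


def links_libpthread(ldd_output: str) -> bool:
    """Return True if any line of `ldd` output names libpthread."""
    return any("libpthread" in line for line in ldd_output.splitlines())


def check_binary(path: str) -> bool:
    """Run `ldd` on the binary at `path` and check for libpthread.

    `path` is whatever location your build installed the server to;
    no particular install path is assumed here.
    """
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return links_libpthread(result.stdout)
```

Call check_binary() with the path to your vlserver or ptserver binary;
a False result suggests you are still looking at an LWP build.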

> I will try to wrap these readably.  It doesn't seem very useful
> though.  I think the actual memory corruption has already happened
> somewhere else by this point.

Well, the call and conn structures you give below are valid. So, we're
just trying to free the conn twice; the state of the call may help show
why. I need to look at it in a bit more detail, though.

> > LWP is going to confuse valgrind, so you get a lot of not-useful
> > stuff.  If you can get it to crash while under valgrind it might
> > output something helpful, but as I recall, most of what you get is
> > trash. But if we can look at it, we may be able to find something.
> 
> I did post the logs of a ptserver and a vlserver crashing under
> valgrind but they went to moderation.

Yeah, don't post stuff like that to the list. Put it in AFS, on a
pastebin, on a webserver, etc.

-- 
Andrew Deason
adeason@sinenomine.net