[OpenAFS] Re: 1.6.2 buserver + butc

Andrew Deason adeason@sinenomine.net
Mon, 15 Apr 2013 15:20:33 -0500


--Multipart=_Mon__15_Apr_2013_15_20_33_-0500_4t19TcGX7UyEuJO4
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Thu, 28 Mar 2013 16:38:55 -0500
Andrew Deason <adeason@sinenomine.net> wrote:

> What I was after is the stack trace of all of the LWPs in the buserver
> process. You cannot get at those easily, since LWP is a threading
> system that is not understood by the debugger (dbx or gdb). That's
> kinda why I was treating the 'core file' option as something where you
> give the core file to a developer. Getting that information by
> providing instructions to you makes this a bit more difficult... but
> is probably doable.

So, while I was waiting for some stuff to compile while trying this, I
realized this might be fixed by
<http://git.openafs.org/?p=openafs.git;a=patch;h=dce2d8206ecd35c96e75cc0662432c2a4f9c3d7a>.
I'm not clear on what exactly the principal is for, but that does fix a
bug that was introduced in the 1.6 series. Since there have not been
many substantial changes to budb in general, and that change impacts the
CreateDump function, that seems like a likely culprit. (To devs: the
original change doesn't make a lot of sense to me; the commit messages
suggest there are different structures in play, but the args and function
parameters are all ktc_principal.)

I'm not sure how that would cause it to hang, but if you want to try a
patch, you can try that one. If you want to look at the core you
captured before, then read on. I would still be interested in seeing a
stack trace, even if the above patch appears to fix the issue.

> is probably doable. I've only done that a couple of times before, and
> it involved hex editing the core file; it may be a bit easier with
> Solaris dbx, but I'll need a little time to look into it.

This actually isn't so bad if you rely on mdb to give you the stack
traces. Attached is a dbx script that can be used to get some traces. This
should probably live in a repo or something... somewhere. Do people have
an opinion on where this should go?

Anyway, you can use it like this. If you compiled with LWP debug turned
on, it's more likely to work (this means running ./configure with
--enable-debug-lwp), but it's not required. Run:

$ /opt/SUNWspro/bin/dbx /path/to/buserver /path/to/core
[...]
(dbx) source lwpstacks.ksh
(dbx) lwpstacks

If you don't have LWP debug, this will fail (probably with something
like "dbx: struct "lwp_pcb" is not defined[...]"). You can try running
this without using debug symbols (we'll guess at where some data is), by
running this instead:

(dbx) lwpstacks nodebug

With the script as-is, the 'nodebug' stuff seems to work with OpenAFS
1.6.2 on Solaris 10 SPARC, but it may need fiddling to work anywhere
else.

If either of those works, you'll see something like:

(dbx) lwpstacks nodebug
!# NOT using debug symbols
!# looking for threads in blocked 
::echo stack pointer for thread 14a530: 1562d8
0x001562d8::stack 0 ! sed 's/^/  /'
::echo
::echo stack pointer for thread 180cf8: 18caa0
0x0018caa0::stack 0 ! sed 's/^/  /'
[...]
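
Each thread in that output becomes a short group of mdb commands: a
::echo that labels the thread, an addr::stack that walks the stack from
the saved stack pointer, and a bare ::echo to print a blank line. As a
rough sketch of just that generation step, outside of dbx
(emit_mdb_cmds is a made-up name, and the two addresses are simply the
ones from the sample output above), it looks something like this:

```shell
#!/bin/sh
# emit_mdb_cmds: hypothetical helper for illustration only. It mirrors the
# printf calls in lwpstacks.ksh, but takes the addresses as arguments
# instead of reading them out of the core with dbx.
#   $1 = address of the LWP's PCB, $2 = its saved stack pointer
emit_mdb_cmds() {
	# label the thread in the mdb output
	printf "::echo stack pointer for thread %x: %x\n" "$1" "$2"
	# walk the stack from that address, indenting each frame via sed
	printf "0x%08x::stack 0 ! sed 's/^/  /'\n" "$2"
	# blank line between threads
	echo "::echo"
}

emit_mdb_cmds 0x14a530 0x1562d8
```

The real script does the same thing once per PCB while walking the
blocked, qwaiting and runnable queues.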

To get actual stack traces out of that, pipe the output through mdb:

(dbx) lwpstacks nodebug | mdb /path/to/buserver /path/to/core
stack pointer for thread 14a530: 1562d8
  LWP_WaitProcess+0x38()
  rxi_Sleep+4()
  rx_GetCall+0x320()
  rxi_ServerProc+0x40()
  rx_ServerProc+0x74()
  Create_Process_Part2+0x40()
  0x68388()
  ubik_ServerInitCommon+0x23c()

stack pointer for thread 180cf8: 18caa0
  LWP_WaitProcess+0x38()
[...]

This output is similar enough to mdb ::findstack output that it will
work with David Powell's "munges" script if you have that. But it's also
pretty useful just by itself.
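
If you don't have that script, a much cruder stand-in is easy to put
together. The following is only a sketch and is not the "munges" script:
group_stacks is a made-up name, and it assumes the mdb output looks
exactly like the sample above (a "stack pointer for thread" line, the
frames, then a blank line). It prints each distinct stack once, with a
count of how many threads share it:

```shell
#!/bin/sh
# group_stacks: hypothetical helper, not part of OpenAFS or lwpstacks.ksh.
# Reads blank-line-separated stack records on stdin and prints each
# distinct stack once, prefixed by the number of threads that have it.
group_stacks() {
	awk '
	BEGIN { RS = ""; FS = "\n" }   # paragraph mode: one record per thread
	{
		# field 1 is the "stack pointer for thread" header;
		# the remaining fields are the frames
		key = ""
		for (i = 2; i <= NF; i++)
			key = key $i "\n"
		if (!(key in count))
			order[++n] = key   # remember first-seen order
		count[key]++
	}
	END {
		for (j = 1; j <= n; j++)
			printf "%d thread(s):\n%s\n", count[order[j]], order[j]
	}'
}
```

Assuming the mdb output was saved to a file (stacks.txt here is just a
placeholder name), you would run it from a normal shell as
"group_stacks < stacks.txt". On Solaris the awk in the function may need
to be nawk or /usr/xpg4/bin/awk, since this relies on POSIX awk
behavior.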

Surprisingly, that doesn't require any manual core editing. I think mdb
is the only debugger I've used that lets you get stack trace information
from an arbitrary context (at least, I haven't seen an easy way for gdb
or dbx to do this), but the way state is stored on Solaris on SPARC
probably helps make that easier.

If you want to provide such stack output from the core you captured, it
may show what's going on.

-- 
Andrew Deason
adeason@sinenomine.net

--Multipart=_Mon__15_Apr_2013_15_20_33_-0500_4t19TcGX7UyEuJO4
Content-Type: application/octet-stream;
 name="lwpstacks.ksh"
Content-Disposition: attachment;
 filename="lwpstacks.ksh"
Content-Transfer-Encoding: 7bit

# lwpstacks.ksh
#
# Solaris dbx ksh script for getting stack traces from OpenAFS LWP processes
# usage:
# (dbx) source lwpstacks.ksh
# (dbx) lwpstacks
# (dbx) lwpstacks | mdb /path/to/binary /path/to/core
#
# if you don't have LWP debug symbols, that won't work. instead run:
#
# (dbx) lwpstacks nodebug | mdb /path/to/binary /path/to/core
#
# to guess at some structures with hard-coded address offsets. that won't
# always work, and will probably need to be updated if structures change in
# OpenAFS.
#
# always try running lwpstacks without piping through mdb the first time, to
# make sure it runs without errors. we don't exactly have very robust error
# handling here.


# output: sets shell vars $head and $pcb
gethead() {
	echo "!# looking for threads in $1 $2"
	if $nodebug ; then
		if [ "x$2" = x ] ; then
			head="((void**)&$1)[0]"
		else
			head="((void**)&$1)[$2*2]"
		fi
		pcb="$head"
	else
		if [ "x$2" = x ] ; then
			head=$1.head
		else
			head="$1[$2].head"
		fi
		pcb="((struct lwp_pcb*)$head)"
	fi
}

# output: sets shell var $topstack
gettopstack() {
	if $nodebug ; then
		topstack=$[((void**)$1)[21]]
	else
		topstack=$[(void*)$1->context.topstack]
	fi
}

# output: sets shell var $pcb
getnext() {
	if $nodebug ; then
		pcb=$[((void**)$1)[51]]
	else
		pcb=$[$pcb->next]
		pcb="((struct lwp_pcb*)$pcb)"
	fi
}

printstacks() {
	pcb="$1"

	if [ "x$[$pcb==0]" = x0 ] ; then
		first=1
		while [ "x$[$pcb==$head]" = x0 ] || [ "x$first" = x1 ] ; do
			first=0

			gettopstack "$pcb"

			printf "::echo stack pointer for thread %x: %x\n" "$[$pcb]" "$topstack"
			printf "0x%08x::stack 0 ! sed 's/^/  /'\n" "$topstack"
			echo "::echo"

			getnext "$pcb"
		done
	fi
}

lwpstacks() {
	if [ x"$1" = xnodebug ] ; then
		nodebug=true
		echo "!# NOT using debug symbols"
	else
		nodebug=false
		echo "!# using debug symbols"
	fi

	for head in blocked qwaiting ; do
		gethead "$head"
		printstacks "$pcb"
	done

	for i in 0 1 2 3 4 ; do
		gethead runnable "$i"
		printstacks "$pcb"
	done

	echo
	echo "!# to get stack traces, pipe this output through: mdb /path/to/binary /path/to/core"
	echo
}

--Multipart=_Mon__15_Apr_2013_15_20_33_-0500_4t19TcGX7UyEuJO4--