[OpenAFS] Re: 1.6.2 buserver + butc

Derrick Brashear shadow@gmail.com
Mon, 15 Apr 2013 17:11:04 -0400


On Mon, Apr 15, 2013 at 4:20 PM, Andrew Deason <adeason@sinenomine.net> wrote:

> On Thu, 28 Mar 2013 16:38:55 -0500
> Andrew Deason <adeason@sinenomine.net> wrote:
>
> > What I was after is the stack trace of all of the LWPs in the buserver
> > process. You cannot get at those easily, since LWP is a threading
> > system that is not understood by the debugger (dbx or gdb). That's
> > kinda why I was treating the 'core file' option as something where you
> > give the core file to a developer. Getting that information by
> > providing instructions to you makes this a bit more difficult... but
> > is probably doable.
>
> So, while I was waiting for some stuff to compile while trying this, I
> realized this might be fixed by
> <http://git.openafs.org/?p=openafs.git;a=patch;h=dce2d8206ecd35c96e75cc0662432c2a4f9c3d7a>.
> I'm not clear on what exactly the principal is for, but that does fix a
> bug that was introduced in the 1.6 series. Since there have not been
> many substantial changes to budb in general, and that change impacts the
> CreateDump function, that seems like a likely culprit. (To devs: the
> original change doesn't make a lot of sense to me; the commit messages
> suggest there are different structures in play, but the args and function
> parameters are all ktc_principal.)
>

*If* that's it, it would have to be a missing null termination. I looked at
that case specifically and couldn't create an issue, but it's conceivable I
missed it due to differences in the host I tested on. The only other
interesting thing was the potential for differences due to the offsetof
changes, but that was also a red herring.

> This actually isn't so bad if you rely on mdb to give you the stack
> traces. Attached a dbx script that can be used to get some traces. This
> should probably live in a repo or something... somewhere. Do people have
> an opinion on where this should go?
>
>
We have scripts from Robert Milkowski for dtrace which similarly lack a
home. Properly, they might be in their own module instead of openafs, but I
don't think it would be particularly abusive to include them here.


> Anyway, you can use it like this. If you compiled with LWP debug turned
> on, it's more likely to work (this means running ./configure with
> --enable-debug-lwp), but it's not required. Run:
>
> $ /opt/SUNWspro/bin/dbx /path/to/buserver /path/to/core
> [...]
> (dbx) source lwpstacks.ksh
> (dbx) lwpstacks
>
> If you don't have LWP debug, this will fail (probably with something
> like "dbx: struct "lwp_pcb" is not defined[...]"). You can try running
> this without using debug symbols (we'll guess at where some data is), by
> running this instead:
>
> (dbx) lwpstacks nodebug
>
> With the script as-is, the 'nodebug' stuff seems to work with OpenAFS
> 1.6.2 on Solaris 10 SPARC, but it may need fiddling to work anywhere
> else.
>
> If either of those works, you'll see something like:
>
> (dbx) lwpstacks nodebug
> !# NOT using debug symbols
> !# looking for threads in blocked
> ::echo stack pointer for thread 14a530: 1562d8
> 0x001562d8::stack 0 ! sed 's/^/  /'
> ::echo
> ::echo stack pointer for thread 180cf8: 18caa0
> 0x0018caa0::stack 0 ! sed 's/^/  /'
> [...]
>
> To get actual stack traces out of that, pipe the output through mdb:
>
> (dbx) lwpstacks nodebug | mdb /path/to/buserver /path/to/core
> stack pointer for thread 14a530: 1562d8
>   LWP_WaitProcess+0x38()
>   rxi_Sleep+4()
>   rx_GetCall+0x320()
>   rxi_ServerProc+0x40()
>   rx_ServerProc+0x74()
>   Create_Process_Part2+0x40()
>   0x68388()
>   ubik_ServerInitCommon+0x23c()
>
> stack pointer for thread 180cf8: 18caa0
>   LWP_WaitProcess+0x38()
> [...]
>
> This output is similar enough to mdb ::findstack output that it will
> work with David Powell's "munges" script if you have that. But it's also
> pretty useful just by itself.
>
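
As a rough illustration of summarizing that output on its own, here is a
minimal Python sketch that groups threads sharing an identical stack. It
assumes the exact "stack pointer for thread X: Y" header and indented frame
lines shown in the mdb output above. The function names parse_stacks and
group_stacks are made up for this example and are not part of lwpstacks.ksh,
mdb, or any existing script.

```python
import re
from collections import defaultdict

# Header line as it appears in the mdb output above, e.g.
# "stack pointer for thread 14a530: 1562d8"
HEADER = re.compile(r"^stack pointer for thread ([0-9a-fA-F]+): ([0-9a-fA-F]+)$")


def parse_stacks(text):
    """Return a list of (thread, [frames]) in the order they appear."""
    stacks = []
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # Start a new thread entry; frames are appended as they follow.
            stacks.append((match.group(1), []))
        elif stacks and line:
            # Any non-blank line after a header is treated as a frame.
            stacks[-1][1].append(line)
    return stacks


def group_stacks(text):
    """Map each distinct stack (tuple of frames) to the threads showing it."""
    groups = defaultdict(list)
    for thread, frames in parse_stacks(text):
        groups[tuple(frames)].append(thread)
    return dict(groups)
```

Calling group_stacks() on the captured mdb output returns one entry per
distinct stack, so a large number of idle threads parked in the same place
collapses to a single key with a list of thread addresses.
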
> Surprisingly, that doesn't require any manual core editing. mdb I think
> is the only debugger I've used that lets you get stack trace information
> from arbitrary context (at least, I haven't seen an easy way for gdb or
> dbx to do this), but the way state is stored on Solaris on SPARC
> probably helps make that easier.
>
> If you want to provide such stack output from the core you captured, it
> may say what's going on.
>
>

-- 
Derrick
