[OpenAFS-devel] Re: kauth broken - problem with IOMGR in lwp?
Ben Huntsman
ben@huntsmans.net
Thu, 14 Nov 2024 00:19:45 +0000
Hi there!
I've been looking through the commit logs for the files in src/lwp to see
if there is any indication as to what introduced this issue, but there's
not very much. The old Linux server I have participating in the old kauth
cell is running OpenAFS 1.6.13, so that's circa July 2015. Therefore the
changes would have to be between then and now. While I'm building this on
AIX, another person I know had the same problem on Linux, so it is not a
platform-specific issue.
Any idea how to track it down? This is made more difficult by the limited
debugger output.
Thank you!
-Ben
________________________________
From: Benjamin Kaduk <kaduk@mit.edu>
Sent: Monday, November 11, 2024 8:21 PM
To: Ben Huntsman <ben@huntsmans.net>
Cc: openafs-devel@openafs.org <openafs-devel@openafs.org>
Subject: Re: [OpenAFS-devel] Re: kauth broken - problem with IOMGR in lwp?
That makes it sound like someone put some code with side effects inside an
assertion statement, so that it gets compiled out in non-debug builds.
In the, um, more maintained parts of the tree we are using opr_Assert()
vs opr_Verify() to indicate things that do not or do always need to be
executed, but it looks like kauth has not gotten such treatment.
Which does make one wonder just how long it's been broken in this way...
-Ben
On Tue, Nov 12, 2024 at 01:58:51AM +0000, Ben Huntsman wrote:
> In continuing to research this, I see there's a lot of interesting code
> in lwp that can be enabled by defining DEBUG. So just to give it a whirl,
> I added a line to lwp.h:
>
> #define DEBUG 1
>
> And recompiled just lwp and kauth. Now the resulting klog works. Very
> bizarre... I'm not sure what to make of it yet.
>
> -Ben
>
> ________________________________
> From: openafs-devel-admin@openafs.org <openafs-devel-admin@openafs.org> on behalf of Ben Huntsman <ben@huntsmans.net>
> Sent: Monday, November 11, 2024 12:26 PM
> To: openafs-devel@openafs.org <openafs-devel@openafs.org>
> Subject: [OpenAFS-devel] kauth broken - problem with IOMGR in lwp?
>
> Hi everyone-
> First of all, please don't laugh, but I do have an older test cell that
> runs kauth instead of krb5. This is at home, not for anything production,
> so don't worry.
>
> That being said, is kauth currently broken? A colleague of mine tried it
> on Linux and gets a segfault when running klog, and I get the same
> behavior on AIX:
>
> $ dbx /usr/afs/bin/klog core
> Type 'help' for help.
> [using memory image in core]
> reading symbolic information ...
>
> Segmentation fault in unnamed block in IOMGR at line 362 in file "iomgr.c"
> 362 FD_ZERO(&IOMGR_writefds);
> (dbx) where
> libdebug assertion "(framep->getGpr(STKP, &addr) == DB_SUCCESS && *nextStkpp == addr)" failed at line 1418 in file ../../../../../../../../../../../src/bos/usr/ccs/lib/libdbx/libdebug/modules/stackdebug/POWER/stackdb_FrameProgress.C
> unnamed block in IOMGR(dummy = (nil)), line 362 in "iomgr.c"
> (dbx)
>
> And here's a blurb from around that line in src/lwp/iomgr.c:
> ...
> /* These are not declared in IOMGR so that they don't use up 6K of stack. */
> static fd_set IOMGR_readfds, IOMGR_writefds, IOMGR_exceptfds;
> static int IOMGR_nfds = 0;
>
> static void *IOMGR(void *dummy)
> {
>     for (;;) {
>         int code;
>         struct TM_Elem *earliest;
>         struct timeval timeout, junk;
>         bool woke_someone;
>
>         FD_ZERO(&IOMGR_readfds);
>         FD_ZERO(&IOMGR_writefds);
>         FD_ZERO(&IOMGR_exceptfds);
>         IOMGR_nfds = 0;
> ...
>
>
> I did the compile with the ./configure --enable-debug and
> --enable-debug-lwp options specified. Can someone help explain how this
> code works, and what might be done to fix it? I'm a little fuzzy on the
> *IOMGR piece and I don't see that anyone calls IOMGR() directly in the
> code...
>
> Thank you!
>
> -Ben
>