[OpenAFS] Solaris 10 11/06 afs 1.4.2 pam module panic.

Kris Kasner tkasner@qualcomm.com
Mon, 18 Dec 2006 19:00:20 -0800 (PST)


A few of your suggestions caused yet more panics (see below), again with
different stack traces. Booting with kmdb did allow afs to come up, and I then
panicked the system. I have a kernel debugger prompt waiting for suggestions of
things to look at. The stack trace for that panic is at the end.

I can also poke around in any of the previous dumps; I have complete cores for
all of them.

Thanks again for taking the time!

--Kris

Today at 20:42, Marcus Watts <mdw@umich.edu> wrote:

> Some more interesting experiments.
> How about:
> 	pagsh		setpag

Just running pagsh was enough to panic the system.

> /usr/afsws/bin/pagsh
panic[cpu0]/thread=30001714960: BAD TRAP: type=34 rp=2a1009a9830 addr=1b 
mmu_fsr=0

pagsh: alignment error:
addr=0x1b
pid=587, pc=0x1041224, sp=0x2a1009a90d1, tstate=0x1606, context=0x483
g1-g7: 17, 2, 0, 136, 135, 0, 30001714960

000002a1009a9550 unix:die+9c (34, 2a1009a9830, 1b, 0, 2a1009a9610, c1e00000)
   %l0-3: 00000000c0800000 0000000000000034 0000000000000000 0000000000010000
   %l4-7: fffffffffffffffc 0000000000000001 00000000ffbff998 0000000001076000
000002a1009a9630 unix:trap+690 (2a1009a9830, 10009, 0, 80000f, 0, 30001714960)
   %l0-3: 0000000000000000 00000600006aafb0 0000000000000034 0000060002d03530
   %l4-7: 0000000000000000 0000000000000000 000000000000f000 0000000000010200
000002a1009a9780 unix:ktl0+48 (1b, 30001714960, 18a5400, 60001fdb9a0, 0, 4)
   %l0-3: 0000000000000007 0000000000001400 0000000000001606 000000000101aa04
   %l4-7: fffffffffffffffc 0000000000000001 0000000000000000 000002a1009a9830
000002a1009a98d0 genunix:zone_cred_hold+c (1b, 60001fdbc3c, ffffffffffffffff, 
fffffffffffffffc, 3, 25)
   %l0-3: 0000000088000000 0000060001fdb9d0 0000000000000001 000000000000001a
   %l4-7: fffffffffffffffb 0000000000000005 0000000000000000 00000000000001f8
000002a1009a9980 genunix:crcopy_to+28 (60001fdb980, 60001fdbb90, 1, 0, 0, 
1853c00)
   %l0-3: 00000000ff3f0f68 0000000000000005 0000000000000001 0000000000000000
   %l4-7: 00000000000151d4 000000000001653c 0000000000000008 00000000ff18f0d8
000002a1009a9a30 genunix:setuid+16c (4002, 51bf4, 4002, ffbff878, 60001fdb980, 
18aa798)
   %l0-3: 0000060001fdbb90 0000000000000000 00000600006aafb0 00000600006aafc8
   %l4-7: 0000000000000000 0000000000000000 0000000000000000 0000000000000000



> 	klog		get k4 tickets via ka, settoken
> 		?
> This should be a close duplicate of what pam_afs does.
> or
> 	pagsh		setpag
> 	kinit		get k5 tickets
> 	aklog		settoken
> 		?
> This isn't quite as close to what pam_afs does, and
> it gets k5 tickets which might behave in interesting
> different ways.
>
> Or this:
> 	sh
> 	klog -setpag

This triggered the panic:
> sh
$ /usr/afsws/bin/klog -setpag
Password:
panic[cpu0]/thread=3000181e020: BAD TRAP: type=34 rp=2a100a2d090 addr=133 
mmu_fsr=0

klog: alignment error:
addr=0x133
pid=1887, pc=0x10b3d10, sp=0x2a100a2c931, tstate=0x1607, context=0xe7e
g1-g7: 600006fb440, 0, c76a7e12, 0, 45, 0, 3000181e020

000002a100a2cdb0 unix:die+9c (34, 2a100a2d090, 133, 0, 2a100a2ce70, c1e00000)
   %l0-3: 00000000c0800000 0000000000000034 0000000000000000 0000000000000010
   %l4-7: 0000060004d9dae8 0000000000000006 0000060004d8b7a0 0000000001076000
000002a100a2ce90 unix:trap+690 (2a100a2d090, 10009, 0, 80000b, 0, 3000181e020)
   %l0-3: 0000000000000000 0000060004cac038 0000000000000034 0000060004ce0710
   %l4-7: 0000000000000000 0000000000000124 00000000012e110c 0000000000010200
000002a100a2cfe0 unix:ktl0+48 (6000368ac70, ffffffffffffffff, 0, 2, c76a7e0c, 
3)
   %l0-3: 0000000000000000 0000000000001400 0000000000001607 000000000101aa04
   %l4-7: 0000000000000002 0000000000000108 0000000000000000 000002a100a2d090
000002a100a2d130 ip:udp_send_data+248 (60004e27c00, 60004d99dd8, 300000de280, 
600006fb4d0, 60004b43d00, 60004b43cf0)
   %l0-3: 0000000000000000 0000000000000000 0000000000000000 0000060003611080
   %l4-7: 00000000e0000000 00000000c0000000 0000000000000001 00000000000001f8
000002a100a2d230 ip:udp_output_v4+558 (60003611080, 0, 7002fc00, 600006fb4d0, 
0, 2a100a2d4ec)
   %l0-3: 00000300000de280 000000007002dc00 0000060004e27c00 0000000000001540
   %l4-7: 0000000070033000 0000000004000000 0000000000000080 0000000000000020
000002a100a2d330 ip:udp_output+474 (60003611080, 300000de280, 60004d9dae8, 10, 
1, 10)
   %l0-3: 0000000000000010 0000000000000002 0000000000000002 0000060004e27c00
   %l4-7: 0000000000000000 0000000000000108 0000000000000000 0000060004d99dd8
000002a100a2d4f0 ip:___const_seg_900003702+6050 (60003611080, 300000de280, 
60004d9dae8, 0, 60004e27c00, 0)
   %l0-3: 0000000000000010 0000000000000001 000000000000003c 00000300000de290
   %l4-7: 0000000000000124 00000600006fb4ec 0000000000000001 0000000000000000
000002a100a2d5a0 sockfs:sodgram_direct+bc (60004ecdac8, 60004d9dae8, 10, 
2a100a2d8c0, 300000de280, 0)
   %l0-3: 0000000000000000 0000000000000124 00000000018a6c00 0000060004cac038
   %l4-7: 00000000000f4240 0000000000000000 0000000000000000 0000060004df6bf0
000002a100a2d680 sockfs:sotpi_sendmsg+454 (60004ecdac8, 2a100a2da70, 
2a100a2d8c0, 0, 1200060, 0)
   %l0-3: 0000000000000000 0000060004ecdae8 0000000000000000 0000000000000010
   %l4-7: 0000060004d9dae8 0000000000000006 0000060004d8b7a0 0000000000000008
000002a100a2d740 sockfs:sendit+134 (4, 2a100a2da70, 2a100a2d8c0, 60004d9dae8, 
60004ecdac8, 0)
   %l0-3: 0000000000000001 0000000000000000 0000000000000000 000000000009a1ac
   %l4-7: 000000000008a164 0000000000000124 00000000012e110c 00000000018e83c0
000002a100a2d810 sockfs:sendmsg+294 (4, 124, 100000, 2a100a2d918, 2a100a2d8f0, 
0)
   %l0-3: 0000000000000008 0000000000000002 00000000000b6724 0000000000000108
   %l4-7: 0000000000000002 0000000000000108 00000000ffbfbf00 0000000000000010



> 		?
> This is particularly tricky; it should cause the equivalent
> to "pagsh" to happen in the parent.  I suppose at any point
> I'm suspicious of setpag, if only because you don't mention
> it and I can't think what else might be different between
> just klog and what pam does.
>
> These two parameters may alter pam operation in interesting ways:
> 	use_klog

I tried this with the sudo pam line. It still panicked the system.


> 	refresh_token
> "use_klog" causes pam to invoke klog instead of calling
> 	ka_UserAuthenticateGeneral
> this "shouldn't" make a difference, but maybe it does.
>
> "refresh_token" causes pam to not do setpag.  This is the
> moral equivalent of omitting "pagsh" or "-setpag" from the
> above experiments.
>
> It would be interesting to figure out how to run "truss"
> on your errant su / pam interaction, but I'm not sure that
> the interesting part at the very end will get printed
> before the system panics.
>
> The callback traces that you posted change; I'm guessing
> most of that isn't relevant to the actual panic.  I'm not
> positive that this is so.  If you've got some way to attach
> a kernel debugger once it crashes, there is definitely
> more to be learned.

panic[cpu0]/thread=300016fe9a0: BAD TRAP: type=34 rp=2a100a958b0 addr=33 
mmu_fsr=0

sudo: alignment error:
addr=0x33
pid=577, pc=0x10b3cb0, sp=0x2a100a95151, tstate=0x80001602, context=0x477
g1-g7: 33, 33, 0, 198, 0, 0, 300016fe9a0

000002a100a955d0 unix:die+9c (34, 2a100a958b0, 33, 0, 2a100a95690, c1e00000)
   %l0-3: 00000000c0800000 0000000000000034 0000000000000000 00000300000715f0
   %l4-7: 0000030000071640 000000000000000d 0000000000000001 0000000001076000
000002a100a956b0 unix:trap+690 (2a100a958b0, 10009, 0, 80000b, 0, 300016fe9a0)
   %l0-3: 0000000000000000 00000600006c57e0 0000000000000034 000006000514ae20
   %l4-7: 0000000000000000 0000000000000000 000000000000f000 0000000000010200
000002a100a95800 unix:ktl0+48 (60003ebbda0, 0, 242, 33, 33, 3)
   %l0-3: 0000000000000003 0000000000001400 0000000080001602 000000000101aa04
   %l4-7: 0000060000830000 0000000000000cc0 0000000000000000 000002a100a958b0
000002a100a95950 genunix:getproc+11c (2a100a95ad8, 0, 600006c57e0, 60005035bc0, 
600006c57e0, 1837400)
   %l0-3: 0000060003ebbda0 00000000018a5c00 0000000000000000 ffffffffffffffff
   %l4-7: 0000060005035bd8 0000060005035fd0 0000000000000242 0000000000000000
000002a100a95a00 genunix:cfork+94 (0, 1, 0, 1, 600006c57e0, 0)
   %l0-3: 0000000000000000 0000000000000000 00000000b8680000 000000000000b868
   %l4-7: 0000000000000001 0000000000000000 0000000000000000 0000000000000000

panic: entering debugger (continue to save dump)

Welcome to kmdb
kmdb: unable to determine terminal type: assuming `vt100'
Loaded modules: [ zfs ]
[0]>

Any suggestions on what to look for?
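
Since all three panics are reported as alignment errors, one quick sanity
check is whether each faulting address is misaligned for every access width or
only for the wider ones. Below is a minimal Python sketch, not part of OpenAFS
or Solaris: the names TRAP_RE, parse_trap and misaligned_for are made up for
illustration. It pulls type, rp and addr out of the BAD TRAP header lines
quoted above (with the mail line wrap before mmu_fsr joined) and lists which of
the 2-, 4- and 8-byte widths each addr fails. Checking exactly those three
widths is an assumption; the traces do not say how wide the faulting access
was.

```python
import re

# Matches the hex fields of a Solaris "BAD TRAP" panic header as they appear
# in the traces above. Hypothetical helper, not an existing tool.
TRAP_RE = re.compile(
    r"BAD TRAP: type=(?P<type>[0-9a-f]+) "
    r"rp=(?P<rp>[0-9a-f]+) "
    r"addr=(?P<addr>[0-9a-f]+)"
)

# Panic headers copied from this message, keyed by the process that trapped.
PANICS = {
    "pagsh": "panic[cpu0]/thread=30001714960: BAD TRAP: type=34 "
             "rp=2a1009a9830 addr=1b mmu_fsr=0",
    "klog": "panic[cpu0]/thread=3000181e020: BAD TRAP: type=34 "
            "rp=2a100a2d090 addr=133 mmu_fsr=0",
    "sudo": "panic[cpu0]/thread=300016fe9a0: BAD TRAP: type=34 "
            "rp=2a100a958b0 addr=33 mmu_fsr=0",
}


def parse_trap(line):
    """Return the type, rp and addr fields as integers, or None if absent."""
    match = TRAP_RE.search(line)
    if match is None:
        return None
    return {name: int(value, 16) for name, value in match.groupdict().items()}


def misaligned_for(addr, sizes=(2, 4, 8)):
    """Return the access sizes (in bytes) for which addr is not aligned."""
    return [size for size in sizes if addr % size != 0]


if __name__ == "__main__":
    for process, header in PANICS.items():
        trap = parse_trap(header)
        widths = misaligned_for(trap["addr"])
        print(f"{process}: addr={trap['addr']:#x} misaligned for {widths}")
```

The three addresses (0x1b, 0x133, 0x33) are all odd, so they fail every width
checked here; the sketch only confirms what the trap message says and does not
identify which pointer was bad.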