[OpenAFS] Solaris 10 11/06 afs 1.4.2 pam module panic.

Kris Kasner tkasner@qualcomm.com
Mon, 18 Dec 2006 15:52:02 -0800 (PST)

Thanks for the reply!

Today at 18:18, Marcus Watts <mdw@umich.edu> wrote:

> What happens when you klog?  Or aklog?  Does it crash then?
> Does it crash immediately upon doing klog/aklog, or upon
> the first file reference after that?

As yet I've been unable to get the system to crash with the pam entries removed.

I removed the pam entries, put a local password for my account in /etc/shadow, 
and logged in. After klog I can get to my homedir and everything else. Doing lots 
of reads/writes in my AFS home dir has no effect on the system.

I should also have mentioned that I'm running 1.4.2 on many Solaris 10 (SPARC and 
x86) systems without issues. This is the first 11/06 release I've tried that 
uses the pam module.

> Does it die if you use a cache manager from openafs 1.4.1 ?  1.2.13 ?

I can try 1.4.1 (I think I still have it lying around :) but I didn't think 
the 1.2.x versions played well with Solaris 10 at all. 1.4.2 has some major 
fixes for user-level panic-type bugs in it.

I added this line to /etc/pam.conf (we use it on all of our systems that auth 
against AFS, and it works just fine):
sudo   auth required    /usr/afsws/lib/pam_afs.so.1             ignore_root

and typed "sudo true".
When I hit enter after typing my password, I was greeted with another panic:
15:46:46 ui234(27)> sudo vi /etc/pam.conf
15:47:46 ui234(28)> sudo -k
15:47:50 ui234(29)> sudo true
AFS Password:

panic[cpu0]/thread=3000162e6a0: BAD TRAP: type=34 rp=2a1002978b0 addr=33 

sudo: alignment error:
pid=1255, pc=0x10b3cb0, sp=0x2a100297151, tstate=0x80001605, context=0x9a0
g1-g7: 42, 42, 0, 210, 0, 0, 3000162e6a0

000002a1002975d0 unix:die+9c (34, 2a1002978b0, 33, 0, 2a100297690, c1e00000)
   %l0-3: 00000000c0800000 0000000000000034 0000000000000000 00000300000715f0
   %l4-7: 0000030000071640 0000000000000000 000000000000f000 0000000001076000
000002a1002976b0 unix:trap+690 (2a1002978b0, 10009, 0, 80000b, 0, 3000162e6a0)
   %l0-3: 0000000000000000 0000060004c51bc0 0000000000000034 0000060004dee710
   %l4-7: 0000000000000000 0000000000000000 000000000000f000 0000000000010200
000002a100297800 unix:ktl0+48 (6000373bcf0, 0, 4e8, 42, 42, 3)
   %l0-3: 0000000000000006 0000000000001400 0000000080001605 000000000101aa04
   %l4-7: 0000060000830000 0000000000001080 0000000000000000 000002a1002978b0
000002a100297950 genunix:getproc+11c (2a100297ad8, 0, 60004c51bc0, 60004e007b8, 60004c51bc0, 1837400)
   %l0-3: 000006000373bcf0 00000000018a5c00 0000000000000000 ffffffffffffffff
   %l4-7: 0000060004e007d0 0000060004e00bc8 00000000000004e8 0000000000000000
000002a100297a00 genunix:cfork+94 (0, 1, 0, 1, 60004c51bc0, 0)
   %l0-3: 0000000000000000 0000000000000000 00000000ae540000 000000000000ae54
   %l4-7: 0000000000000001 0000000000000000 0000000000000000 0000000000000000
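
For anyone reading the trace, the header fields of the panic above can be pulled apart with a short script. This is only a minimal sketch: it assumes the type= and addr= fields are hexadecimal (as Solaris panic messages normally print them), the helper names parse_bad_trap and is_aligned are made up for illustration, and the lookup table lists only the one SPARC trap type seen here.

```python
import re

# Header line copied from the panic above.
PANIC_LINE = "panic[cpu0]/thread=3000162e6a0: BAD TRAP: type=34 rp=2a1002978b0 addr=33"

# Assumption: only the trap type seen in this panic is listed here.
SPARC_TRAP_NAMES = {0x34: "mem_address_not_aligned"}


def parse_bad_trap(line):
    """Return (trap_type, addr) as integers, reading both fields as hex."""
    match = re.search(
        r"BAD TRAP: type=([0-9a-f]+) rp=[0-9a-f]+ addr=([0-9a-f]+)", line
    )
    if match is None:
        raise ValueError("not a BAD TRAP line")
    return int(match.group(1), 16), int(match.group(2), 16)


def is_aligned(addr, size):
    """True if addr is a multiple of size (size must be a power of two)."""
    return addr & (size - 1) == 0


trap_type, addr = parse_bad_trap(PANIC_LINE)
print(hex(trap_type), SPARC_TRAP_NAMES.get(trap_type, "unknown"))
for size in (2, 4, 8):
    print(f"addr {addr:#x} aligned to {size} bytes: {is_aligned(addr, size)}")
```

Read this way, the faulting address 0x33 is odd, so it cannot be aligned for any 2-, 4- or 8-byte access, which fits an alignment trap.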

Thanks again for the input.


> trap type 34 = "memory address not aligned".  Your callback looks
> like it dies somewhere while trying to send a udp datagram,
> which doesn't totally make sense.
> It's possible that some part of openafs has somehow managed
> to construct a udp datagram that contains words that are not aligned
> on word boundaries.  The only way that could be associated with
> klog/aklog would be if some part of rxkad managed to do this.
> It's possible I'm reading this wrong and something quite different
> actually happened.  My expectation is that things that could
> go wrong with klog/rxkad would happen a bit quicker.
> Note: I don't have any solaris 10 machines -- so other than
> asking questions/suggesting ways it might die I may not be able
> to help you.
> 					-Marcus Watts


Thomas Kris Kasner
Qualcomm Inc.
5775 Morehouse Drive
San Diego, CA 92121

 	"There are many intelligent species in the universe.
 	They are all owned by cats."