[OpenAFS] Solaris 10 11/06 afs 1.4.2 pam module panic.

Marcus Watts mdw@umich.edu
Tue, 19 Dec 2006 02:01:00 -0500


I wrote:
> I apparently have just gotten access to a local sun-10/sparc64 box ,
> which if it works may give me more knowledge of what's going on.
> I've also just downloaded opensolaris.org's "onnv", which may
> contain kernel source (so far, I've found their site somewhat baffling.)
> I'll send you more if I learn more.

onnv does in fact contain kernel source.

It seems a "struct cred", sorry, in solaris terms, a "cred_t", is regarded
as a private structure whose internals outside code isn't supposed to touch.
afs doesn't know about this, and accesses the insides directly.  So if those
insides have actually changed, bad things could happen.  Maybe.

Some of the places where stuff related to this happens:
/usr/src/uts/common/os/privs.awk
	set sizes of some constants which are used in
/usr/include/sys/priv_const.h
	which sets size of priv_set_t, which is the type used in
/usr/include/sys/priv_impl.h
	for .cr_priv in "struct cred" which is defined in
/usr/include/sys/cred_impl.h
	which is included by the solaris kernel in
/usr/src/uts/common/os/cred.c
	which contains source for most of the magic cr* functions.
	sys/cred_impl.h is also included by the openafs source from
src/afs/sysincludes.h
	which in turn is included by src/afs/SOLARIS/osi_groups.c

Since I don't have exactly your machine, I don't have a direct
way to verify whether priv_const.h changed between your kernel & the header
files you used with openafs.  But you can.  Here's how.

On your machine, you probably have this ELF file:
/platform/sun4u/kernel/sparcv9/genunix
(possibly in a slightly different path)
which contains object code for these functions:

	0000000000020108 <crgetgroups>:
	   20108:       81 c3 e0 08     retl
XXXX	   2010c:       90 02 20 68     add  %o0, 0x68, %o0

	0000000000020110 <crgetngroups>:
YYYY	   20110:       da 02 20 1c     ld  [ %o0 + 0x1c ], %o5
	   20114:       81 c3 e0 08     retl
	   20118:       91 3b 60 00     sra  %o5, 0, %o0
(Use your favorite disassembler or debugger to find this code.)

On line XXXX, 0x68 is the offset of .cr_groups
On line YYYY, 0x1c is the offset of .cr_ngroups

These offsets are from this kind of machine:
	SunOS 5.11 sun4u SUNW,Sun-Fire-280R
so probably won't exactly match what you see.
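
If your disassembler only gives you raw bytes, the offset can be read straight
out of the instruction word.  The following is a small C sketch, assuming the
usual SPARC format-3 encoding with an immediate operand (bit 13 set, low 13
bits a signed immediate).  The helper names sparc_word and sparc_simm13 are
made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble four bytes, in the order a disassembly prints them, into a word. */
static uint32_t
sparc_word(const unsigned char b[4])
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	    ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/*
 * Extract the signed 13-bit immediate from a format-3 instruction.
 * Returns 0 and stores the immediate in *imm, or returns -1 if the
 * instruction uses a register operand instead (i bit clear).
 */
static int
sparc_simm13(uint32_t insn, int32_t *imm)
{
	int32_t v;

	if (((insn >> 13) & 1) == 0)
		return -1;
	v = (int32_t)(insn & 0x1fff);
	if (v & 0x1000)		/* sign-extend from bit 12 */
		v -= 0x2000;
	*imm = v;
	return 0;
}
```

Fed the bytes 90 02 20 68 from crgetgroups it yields 0x68, and da 02 20 1c from
crgetngroups yields 0x1c, matching the disassembly.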

When you built openafs, you probably generated a file something like
src/libafs/MODLOAD64/osi_groups.o
which will contain code something like this:
	00000000000002b0 <afs_getgroups>:
	 2b0:   9d e3 bf 10     save  %sp, -240, %sp
	 2b4:   13 00 00 00     sethi  %hi(0), %o1
				2b4: R_SPARC_HH22       afs_cmstats+0x824
	 2b8:   03 00 00 00     sethi  %hi(0), %g1
				2b8: R_SPARC_LM22       afs_cmstats+0x824
	 2bc:   92 12 60 00     mov  %o1, %o1
				2bc: R_SPARC_HM10       afs_cmstats+0x824
	 2c0:   93 2a 70 20     sllx  %o1, 0x20, %o1
	 2c4:   92 12 40 01     or  %o1, %g1, %o1
	 2c8:   d0 42 60 00     ldsw  [ %o1 ], %o0
				2c8: R_SPARC_LO10       afs_cmstats+0x824
	 2cc:   90 02 20 01     inc  %o0
	 2d0:   d0 22 60 00     st  %o0, [ %o1 ]
				2d0: R_SPARC_LO10       afs_cmstats+0x824
	 2d4:   c0 26 60 04     clr  [ %i1 + 4 ]
	 2d8:   c0 26 60 00     clr  [ %i1 ]
YYYY	 2dc:   d0 06 20 1c     ld  [ %i0 + 0x1c ], %o0
	 2e0:   b7 3a 20 00     sra  %o0, 0, %i3
	 2e4:   ba 16 c0 00     mov  %i3, %i5
XXXX	 2e8:   b8 06 20 20     add  %i0, 0x20, %i4
	 2ec:   ba 26 e0 01     sub  %i3, 1, %i5
	 2f0:   80 a6 c0 00     cmp  %i3, %g0
	 2f4:   02 48 00 0b     be  %icc, 320 <afs_getgroups+0x70>
		...
YYYY is the first reference to %i0, and the offset 0x1c is .cr_ngroups .
You can also spot this as a load immediately after two stores, shown as "clr" here.
XXXX is the 2nd reference to %i0 and 0x20 is the offset to .cr_groups ,
which will most likely be an "add" as here.  These offsets don't correspond
to the kernel source I had above; this is from a solaris 8 build, which
apparently doesn't have any of these field names,
	.cr_priv
	.cr_projid
	.cr_zone
	.cr_label

I'm very interested in the offsets at XXXX,YYYY in both of the above listings
as they appear on your machine, as well as the values in sys/priv_const.h .
Assuming the offsets are different, I may also be able to suggest a simple fix.

				-Marcus