[OpenAFS] Locking issues, Solaris & AFS, kernel panic stated to be due to AFS vs Solaris thread locking order

Dexter 'Kim' Kimball dhk@ccre.com
Fri, 3 Jun 2005 15:44:21 -0600


We have had a panic/core on a Sun Solaris box.  The Sun kernel engineer
who examined the core determined that there was lock contention for a
page of memory, that the kernel and AFS (IBM AFS 3.6 2.53) use
different locking orders, and that this produces a deadlock which is
subsequently detected by the kernel, which panics as a result.

Summary and details below.

Question:  Is there any current issue with lock ordering in OpenAFS
that anyone is aware of, or any changes that have been made to OpenAFS
to address the issue?

-------------- SUMMARY

1.  The spontaneous reboots recently seen on systemX are caused by an
interaction between the AFS client and the Solaris operating system.
(This also seems to be a likely explanation for the systemY and
directory server reboots as well.)

2.  The AFS client locks thread resources in the reverse order of that
employed by the operating system.  This can (and in this case did)
result in a condition where each of two processes obtains one of the
two locks needed on a resource and cannot get the other because it is
already held by the other process.  A kernel panic or system hang
eventually results.  (A minimal code sketch of this pattern follows
the summary.)

3.  Per our conversation with Sun Microsystems, the condition has been
seen before and is known to Sun.  Also, the reboots or hangs are
expected to be quite erratic in timing, and to some degree dependent
on the other processes that are used on a given system.

4.  Last night, we upgraded systemX to the very latest Transarc AFS
client available to our production environment, IBM 3.6 2.53.  We do
not know if this condition has been fixed in that client, but it was
the most we could do at the time.  UPDATE: We have since had another
panic and have sent a second core to Sun to see if the lock contention
is the cause of this.

5.  Kim Kimball will be checking with Transarc and with the open source
AFS group to see if either has a client that has fixed the condition.
If not, bug reports will be filed.  If so, we need to obtain an
appropriate client.

6.  systemX and, I think, the institutional directory servers are
absolutely dependent on AFS at this time to support the Kerberos
plug-ins used for authenticating with the AFS password.  So removing
AFS is not an option at this time.

7.  We can and will consider removing AFS from related systems.
However, there are dependencies there as well that will have to be
accounted for, and the remaining life of these systems is measured in
months.
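
To make point 2 concrete: what is described there is the classic AB-BA
deadlock.  Below is a minimal user-space sketch with POSIX threads.
The lock names and the mapping to AFS/ps are hypothetical, for
illustration only; this is not the actual AFS or Solaris code.

    #include <pthread.h>

    /* Two locks guarding two related resources, e.g. a process
     * structure and its credentials (hypothetical names). */
    static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

    /* Thread 1 (think: the AFS client) takes A, then B. */
    static void *thread_one(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock_a);   /* holds A ...             */
        pthread_mutex_lock(&lock_b);   /* ... and waits here on B */
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
    }

    /* Thread 2 (think: ps) takes the same locks in reverse order. */
    static void *thread_two(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock_b);   /* holds B ...             */
        pthread_mutex_lock(&lock_a);   /* ... and waits here on A */
        pthread_mutex_unlock(&lock_a);
        pthread_mutex_unlock(&lock_b);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, thread_one, NULL);
        pthread_create(&t2, NULL, thread_two, NULL);
        /* With unlucky timing each thread acquires its first lock and
         * then blocks forever on the second -- a cycle in the blocking
         * chain.  In user space this merely hangs; when the same cycle
         * forms inside the kernel, Solaris detects it in its turnstiles
         * and panics with "Deadlock: cycle in blocking chain". */
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }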


DETAILS FROM SUN KERNEL ENGINEER ----
>
>
> To whom:
>
> Here is my analysis of the core file from eis-msg-056. My comments are
> on lines that begin with >>>.
>
>
>
> core file:      /cores/64603274/vmcore.0
> release:        5.8 (64-bit)
> version:        Generic_108528-27
> machine:        sun4u
> node name:      eis-msg-056
> hw_provider:    Sun_Microsystems
> system type:    SUNW,Sun-Fire-280R
> hostid:         8329d379
> time of crash:  Thu Jun  2 14:28:44 MDT 2005
> age of system:  2 minutes 32.75 seconds
> panic CPU:      1 (2 CPUs, 2G memory)
> panic string:   Deadlock: cycle in blocking chain
>
> sanity checks: settings...vmem...cpu...sysent...
> WARNING: unknown module afs seen 6 times in sysent table
> clock...misc...done
>
>  >>> Below is the thread that actually panicked the system:
>
> SolarisCAT(vmcore.0)> panic
> panic on cpu 1
> panic string:   Deadlock: cycle in blocking chain
> ==== panic user (LWP_SYS) thread: 0x30006221220  pid: 348  bound to CPU: 1 ====
> cmd: ./ns-slapd -D /opt/netscape/server5/slapd-dir -i /opt/netscape/server5/slapd-di
> t_wchan: 0x30000ace880  sobj: mutex     owner: 0x30006220720 (recursive mutex loop)
> top mutex owner: 0x30006221220  mutex type: unknown
> t_procp: 0x30004570010  p_as: 0x3000162c988  hat: 0x300000595e8  cnum: 0x896
>    size: 52.0M  rss: 37M
> t_stk: 0x2a100671af0  sp: 0x10423351  t_stkbase: 0x2a10066e000
> t_pri: 58(TS)  t_epri: 60  pctcpu: 0.439721  t_lwp: 0x300061f9890  machpcb: 0x2a100671af0
> last CPU: 1
> idle: 100 ticks (1.00 seconds)
> start: Thu Jun  2 14:28:34 2005
> age: 10 seconds (10 seconds)
> stime: 15275 (0 seconds later)
> syscall: #65 sys#65() (sysent: genunix:loadable_syscall+0x0)
> tstate: TS_SLEEP - awaiting an event
> tflg:   T_PANIC - thread initiated a system panic
> tpflg:  none set
> tsched: TS_LOAD - thread is in memory
>          TS_DONT_SWAP - thread/LWP should not be swapped
> pflag:  SLOAD - in core
>          SULOAD - u-block in core
>          ASLWP - process uses an aslwp for async signals
>                - also overloaded to mean it is a MT process
>          SPRLOCK - process locked by /proc
>          NOCD - new creds from VSxID, do not coredump
>
> pc: 0x10047010  unix:panicsys+0x44:   call      unix:setjmp
>
> -- on panic_stack --
>     0x10423b50 unix:panicsys+0x44 (0x10423d20, 0x2a100671258, 0x10146550, 0x78002000, 0x736c6963, 0xb)
>     0x10423c70 unix:vpanic+0xcc (0x10146550, 0x2a100671258, 0x3c, 0x0, 0x0, 0x30005f88400)
> -- switch to user (LWP_SYS) thread's stack --
> 0x2a100671120 unix:panic+0x1c (0x10146550, 0x10410128, 0x3c, 0x30006220720, 0x0, 0x0)
> 0x2a1006711d0 genunix:turnstile_block+0x360 (0x30006221220, 0x0, 0x10410128, 0x30004570010, 0x0, 0x30000ace880)
> 0x2a1006712a0 unix:mutex_vector_enter+0x374 (0x30000ace880, 0x10415ea0, 0x104240d0, 0x0, 0x1, 0x0)
>                unix:mutex_enter - frame recycled
> 0x2a100671360 genunix:crset+0x24 (0x30004570010, 0x30000a4e668, 0x3f00, 0x30006221220, 0x16, 0x30006001188)
> 0x2a100671410 afs:afs_setgroups+0x160 (0x2a100671890, 0x2, 0x30001ed7c00, 0x0, 0x3bab, 0x781a0e50)
> 0x2a100671510 afs:setpag+0x178 (0x2a100671890, 0x4100007b, 0x2a1006716b8, 0x0, 0xc, 0xff2ce49c)
> 0x2a100671600 afs:AddPag+0x4c (0x4100007b, 0x2a100671890, 0x0, 0xfe1910b8, 0x736c6963, 0x726f6f74)
> 0x2a1006716d0 afs:afs_setpag+0x138 (0x2a100671890, 0x0, 0x0, 0x0, 0x0, 0x1f0182c)
> 0x2a1006717a0 afs:Afs_syscall+0x14c (0x300061f9948, 0x2a100671978, 0x1, 0x1, 0x40, 0x40)
> 0x2a1006718c0 genunix:syscall_ap+0x6c (0x104347f8, 0x2a100671ba0, 0x300061f9890, 0xfed952e4, 0xc, 0x30000078680)
> 0x2a100671980 genunix:loadable_syscall+0x90 (0x30000078680, 0x104347f8, 0x1400, 0x4, 0xc, 0x0)
> 0x2a100671a40 genunix:indir+0xa4 (0x104347f8, 0x15, 0xfe190f1c, 0xfed8e000, 0xfed952e4, 0xc)
> 0x2a100671af0 unix:syscall_trap32+0xa8 (0x41, 0x15, 0xfe190f1c, 0xfed8e000, 0xfed952e4, 0xc)
> -- switch to user thread's user stack --
>
> SolarisCAT(vmcore.0)> modinfo -p afs
>   id flags        modctl      textaddr     size cnt name
> 113 LIN   0x3000226e720    0x780d8000  0xc418b   1 afs (afs syscall interface)
>
>  >>> The above thread is running in the afs kernel module when it
>  >>> attempts to grab a lock. That lock is held by another thread.
>
>
>  >>> Here is the thread that owns the lock the afs thread is
>  >>> trying to get:
>
> SolarisCAT(vmcore.0)> thread 0x30006220720
> ==== user (LWP_SYS) thread: 0x30006220720  pid: 427 ====
> cmd: ps -e
> t_wchan: 0x30004570028  sobj: mutex     owner: 0x30006221220 (recursive mutex loop)
> top mutex owner: 0x30006220720  mutex type: unknown
> t_procp: 0x3000341e008  p_as: 0x3000162d988  hat: 0x30000059498  cnum: 0x8bd
>    size: 1.91M  rss: 1.17M
> t_stk: 0x2a100697af0  sp: 0x2a100696ce1  t_stkbase: 0x2a100694000
> t_pri: 60(TS)  pctcpu: 0.097489  t_lwp: 0x300061f8ad0  machpcb: 0x2a100697af0
> last CPU: 0
> idle: 91 ticks (0.91 seconds)
> start: Thu Jun  2 14:28:43 2005
> age: 1 seconds (1 seconds)
> stime: 15184 (0.91 seconds earlier)
> syscall: #3 read(0x222d8, 0xffbefbec, 0xffbefbf8) (sysent: genunix:read+0x0)
> tstate: TS_SLEEP - awaiting an event
> tflg:   none set
> tpflg:  TP_TWAIT - wait to be freed by lwp_wait
> tsched: TS_LOAD - thread is in memory
>          TS_DONT_SWAP - thread/LWP should not be swapped
> pflag:  SLOAD - in core
>          SULOAD - u-block in core
>
> pc: 0x10119f30  genunix:turnstile_block+0x60c:   call   unix:swtch
>
> -- on user (LWP_SYS) thread's stack --
> 0x2a1006974e0 genunix:turnstile_block+0x60c (0x30006221220, 0x0, 0x10410128, 0x3000341e008, 0x0, 0x30004570028)
> 0x2a1006975b0 unix:mutex_vector_enter+0x374 (0x30004570028, 0x10415ea0, 0x104240d0, 0x0, 0x1, 0x0)
>                unix:mutex_enter - frame recycled
> 0x2a100697670 procfs:prgetpsinfo+0x5c (0x30005f32120, 0x2a1006977d0, 0x30004570010, 0x2a1006977d0, 0x30004570010, 0x300016f8f10)
> 0x2a100697720 procfs:pr_read_psinfo+0x3c (0x30004570010, 0x2a100697a28, 0x300061eae00, 0x30000a4ff28, 0x30006181048, 0x30006180fa8)
>                procfs:prread - frame recycled
> 0x2a100697970 genunix:read+0x25c (0x0, 0x0, 0x2001, 0x3000607a038, 0x4, 0x1a0)
> 0x2a100697af0 unix:syscall_trap+0x88 (0x4, 0xffffffff7ffff628, 0x1a0, 0xffffffff7f2b2f58, 0x81010100, 0xff00)
> -- switch to user thread's user stack --
>
> SolarisCAT(vmcore.0)>
>
>  >>> This thread is trying to get a lock itself. That lock is held by
>  >>> the afs thread that panicked the system. This is a deadlock. Each
>  >>> thread has a lock yet wants the lock held by the other thread.
>
>  >>> Past experience with this problem indicates that this is a
>  >>> commonly seen problem where both afs and (in this case) ps need
>  >>> 2 locks on 2 different resources. By convention, ps grabs the
>  >>> locks in a certain order. afs does not follow that convention
>  >>> and grabs the locks in the reverse order. The fix is to use a
>  >>> version of afs that adheres to the locking convention that ps
>  >>> and other unix processes use.
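
The fix the engineer describes amounts to a lock ordering discipline:
every code path that needs both locks must acquire them in one agreed
order, so no cycle can form.  A minimal sketch of the idea, assuming
pthreads and a hypothetical helper (this is not the actual AFS patch;
kernels typically fix the order by convention rather than by address):

    #include <pthread.h>
    #include <stdint.h>

    /* Acquire two mutexes in a single global order (lowest address
     * first).  If every thread uses this helper, the AB-BA cycle
     * shown earlier cannot occur. */
    static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
    {
        if (a == b) {                       /* one lock: take it once */
            pthread_mutex_lock(a);
            return;
        }
        if ((uintptr_t)a > (uintptr_t)b) {  /* normalize so a < b */
            pthread_mutex_t *tmp = a;
            a = b;
            b = tmp;
        }
        pthread_mutex_lock(a);              /* lower address first ... */
        pthread_mutex_lock(b);              /* ... then the higher */
    }

Whichever locks are actually involved here, the point of the
recommendation is the same: AFS should take them in the order that ps
and the rest of the kernel already use.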
--