[OpenAFS] 1.6.0-pre1 ptserver/vlserver dumping core

Derrick Brashear shadow@dementia.org
Sun, 27 Feb 2011 09:05:15 +0000



On Feb 27, 2011, at 4:56 AM, "Ryan C. Underwood" <nemesis-lists@icequake.net> wrote:

>
> I just upgraded two SMP servers running Linux 2.6.32 to the experimental
> 1.6.0-pre1 Debian packages,

Try pre2; pre1 is stale. Sorry.

> a RW master and a RO backup.  I have
> configured them both for demand attach fileserver using the appropriate
> 'da' prefix server processes.
>
> Whenever I vos release, after a while ptserver and vlserver dump core on
> both machines either with signal 11 or signal 6 and bos reloads them.
> The backtraces are very similar in both.  VLLog describes the crash but
> nothing pertinent is in PtLog.
>
> The servers are externally NAT'd which wasn't a problem with earlier
> versions which worked fine.  Not sure how to debug this further.
>
>  ptserver:
> #0  0xb7792b81 in free () from /lib/i686/cmov/libc.so.6
> #1  0x0807a379 in rxi_CleanupConnection (conn=0xb7864ff4) at rx.c:980
> #2  0x0807dbf9 in rxi_CheckCall (call=0x9cc33d8) at rx.c:6001
> #3  0x0807e03d in rxi_GrowMTUEvent (event=0x0, arg1=0x9cc33d8, dummy=0x0) at rx.c:6233
> #4  0x080876ad in rxevent_RaiseEvents (next=0xb7653f6c) at rx_event.c:499
> #5  0x08077b08 in rxi_ListenerProc (rfds=<value optimized out>, tnop=<value optimized out>, newcallp=<value optimized out>) at rx_lwp.c:203
> #6  0x08077e5a in rx_ListenerProc (dummy=0x0) at rx_lwp.c:335
> #7  0x08088701 in Create_Process_Part2 () at ./lwp.c:805
> #8  0xb775ccdb in makecontext () from /lib/i686/cmov/libc.so.6
> #9  0x0d696910 in ?? ()
> #10 0x08088e88 in LWP_MwaitProcess (event=0x9d4ef10) at ./lwp.c:756
> #11 LWP_WaitProcess (event=0x9d4ef10) at ./lwp.c:708
> #12 0x080809f0 in rx_GetCall (tno=10, cur_service=0x9cb3020, socketp=0xbfa59c8c) at rx.c:2027
> #13 0x08080c3a in rxi_ServerProc (threadID=10, newcall=0x0, socketp=0xbfa59c8c) at rx.c:1619
> #14 0x08077dfa in rx_ServerProc (unused=0x0) at rx_lwp.c:369
> #15 0x08081228 in rx_StartServer (donateMe=1) at rx.c:793
> #16 0x0804a8ba in main (argc=1, argv=0xbfa5a2c4) at ptserver.c:565
>
>  vlserver:
> #0  0xb78da424 in __kernel_vsyscall ()
> #1  0xb779f751 in raise () from /lib/i686/cmov/libc.so.6
> #2  0xb77a2b82 in abort () from /lib/i686/cmov/libc.so.6
> #3  0xb77d618d in ?? () from /lib/i686/cmov/libc.so.6
> #4  0xb77e0281 in ?? () from /lib/i686/cmov/libc.so.6
> #5  0xb77e1ad8 in ?? () from /lib/i686/cmov/libc.so.6
> #6  0xb77e4bbd in free () from /lib/i686/cmov/libc.so.6
> #7  0x080786ed in rxi_CleanupConnection (conn=0xb78b6ff4) at rx.c:990
> #8  0x0807bf49 in rxi_CheckCall (call=0xa0914c8) at rx.c:6001
> #9  0x0807c38d in rxi_GrowMTUEvent (event=0x0, arg1=0xa0914c8, dummy=0x0) at rx.c:6233
> #10 0x08085aad in rxevent_RaiseEvents (next=0xb76a5f6c) at rx_event.c:499
> #11 0x08075e58 in rxi_ListenerProc (rfds=<value optimized out>, tnop=<value optimized out>, newcallp=<value optimized out>) at rx_lwp.c:203
> #12 0x080761aa in rx_ListenerProc (dummy=0x0) at rx_lwp.c:335
> #13 0x08086b01 in Create_Process_Part2 () at ./lwp.c:805
> #14 0xb77aecdb in makecontext () from /lib/i686/cmov/libc.so.6
> #15 0x0d696910 in ?? ()
> #16 0x08087288 in LWP_MwaitProcess (event=0xa1635b0) at ./lwp.c:756
> #17 LWP_WaitProcess (event=0xa1635b0) at ./lwp.c:708
> #18 0x0807ed40 in rx_GetCall (tno=15, cur_service=0xa082020, socketp=0xbfb74e7c) at rx.c:2027
> #19 0x0807ef8a in rxi_ServerProc (threadID=15, newcall=0x0, socketp=0xbfb74e7c) at rx.c:1619
> #20 0x0807614a in rx_ServerProc (unused=0x0) at rx_lwp.c:369
> #21 0x0807f578 in rx_StartServer (donateMe=1) at rx.c:793
> #22 0x0804a8c6 in main (argc=1, argv=0xbfb75714) at vlserver.c:407
>
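
To check the claim that the two traces are very similar, here is a minimal Python sketch (not part of the original report) that pulls the function names out of gdb frame lines and compares the call paths. The frame text is copied from the two backtraces above, trimmed to the frames from free() up to rxevent_RaiseEvents. The helper name frame_functions is made up for illustration.

```python
import re

# Frames copied from the ptserver backtrace quoted above.
PTSERVER_BT = """\
#0  0xb7792b81 in free () from /lib/i686/cmov/libc.so.6
#1  0x0807a379 in rxi_CleanupConnection (conn=0xb7864ff4) at rx.c:980
#2  0x0807dbf9 in rxi_CheckCall (call=0x9cc33d8) at rx.c:6001
#3  0x0807e03d in rxi_GrowMTUEvent (event=0x0, arg1=0x9cc33d8, dummy=0x0) at rx.c:6233
#4  0x080876ad in rxevent_RaiseEvents (next=0xb7653f6c) at rx_event.c:499
"""

# Frames copied from the vlserver backtrace quoted above.
VLSERVER_BT = """\
#6  0xb77e4bbd in free () from /lib/i686/cmov/libc.so.6
#7  0x080786ed in rxi_CleanupConnection (conn=0xb78b6ff4) at rx.c:990
#8  0x0807bf49 in rxi_CheckCall (call=0xa0914c8) at rx.c:6001
#9  0x0807c38d in rxi_GrowMTUEvent (event=0x0, arg1=0xa0914c8, dummy=0x0) at rx.c:6233
#10 0x08085aad in rxevent_RaiseEvents (next=0xb76a5f6c) at rx_event.c:499
"""

# "#N  0xADDR in func (" or, for inlined frames, "#N func (".
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-f]+ in )?(\w+) \(")


def frame_functions(backtrace):
    """Return the function names of a gdb backtrace, innermost frame first."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


pt = frame_functions(PTSERVER_BT)
vl = frame_functions(VLSERVER_BT)
print("same call path:", pt == vl)
print(" <- ".join(pt))
```

On these frames the two lists are identical: both servers end up in free() called from rxi_CleanupConnection, reached via rxi_CheckCall and rxi_GrowMTUEvent. The only difference visible in the traces is the rx.c line of the free() call (980 for ptserver, 990 for vlserver).
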
> Here is my bos config:
> restrictmode 0
> restarttime 11 0 4 0 0
> checkbintime 3 0 5 0 0
> bnode simple ptserver 1
> parm /usr/lib/openafs/ptserver
> end
> bnode simple vlserver 1
> parm /usr/lib/openafs/vlserver
> end
> bnode cron userbackup 1
> parm /usr/bin/nice /afs/icequake.net/pub/adm/backup_afs.sh -d -u
> parm 3:00
> end
> bnode dafs dafs 1
> parm /usr/afs/bin/dafileserver -p 123 -pctspare 20 -L -busyat 50 -rxpck 2000 -rxbind -cb 4000000 -vattachpar 128 -vlruthresh 1440 -vlrumax 8 -vhashsize 11
> parm /usr/afs/bin/davolserver -p 64 -log -rxbind
> parm /usr/afs/bin/salvageserver
> parm /usr/afs/bin/dasalvager -parallel all32
> end
>
>  BosLog:
> Sat Feb 26 19:28:11 2011: Core limits now -1 -1
> Sat Feb 26 19:28:11 2011: Server directory access is okay
> Sat Feb 26 19:31:29 2011: vlserver exited on signal 6 (core dumped)
> Sat Feb 26 19:36:33 2011: ptserver exited on signal 6 (core dumped)
> Sat Feb 26 19:50:57 2011: vlserver exited on signal 6 (core dumped)
> Sat Feb 26 19:51:49 2011: ptserver exited on signal 6 (core dumped)
> Sat Feb 26 20:16:07 2011: ptserver exited on signal 6 (core dumped)
> Sat Feb 26 20:18:33 2011: vlserver exited on signal 11 (core dumped)
> Sat Feb 26 20:31:22 2011: ptserver exited on signal 6 (core dumped)
> Sat Feb 26 20:39:44 2011: ptserver exited on signal 6 (core dumped)
> Sat Feb 26 20:54:11 2011: vlserver exited on signal 11 (core dumped)
> Sat Feb 26 20:54:59 2011: ptserver exited on signal 11 (core dumped)
> Sat Feb 26 20:58:58 2011: vlserver exited on signal 6 (core dumped)
> Sat Feb 26 21:13:19 2011: ptserver exited on signal 11 (core dumped)
> Sat Feb 26 21:21:41 2011: ptserver exited on signal 6 (core dumped)
> Sat Feb 26 21:26:58 2011: ptserver exited on signal 6 (core dumped)
> Sat Feb 26 21:48:20 2011: ptserver exited on signal 11 (core dumped)
> Sat Feb 26 21:53:37 2011: ptserver exited on signal 6 (core dumped)
> Sat Feb 26 21:58:54 2011: ptserver exited on signal 6 (core dumped)
> Sat Feb 26 22:17:10 2011: ptserver exited on signal 6 (core dumped)
> Sat Feb 26 22:22:27 2011: ptserver exited on signal 6 (core dumped)
> Sat Feb 26 22:27:44 2011: ptserver exited on signal 6 (core dumped)
> Sat Feb 26 22:43:00 2011: ptserver exited on signal 11 (core dumped)
> Sat Feb 26 22:48:42 2011: vlserver exited on signal 6 (core dumped)
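
To see how the crashes split by server and signal, here is a small Python sketch (an illustration, not something from the original report) that tallies the "exited on signal" lines. The log text is the BosLog excerpt quoted above; the function name tally_exits is hypothetical.

```python
import re
from collections import Counter

# "exited on signal" lines copied from the BosLog quoted above.
BOSLOG = """\
Sat Feb 26 19:31:29 2011: vlserver exited on signal 6 (core dumped)
Sat Feb 26 19:36:33 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 19:50:57 2011: vlserver exited on signal 6 (core dumped)
Sat Feb 26 19:51:49 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 20:16:07 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 20:18:33 2011: vlserver exited on signal 11 (core dumped)
Sat Feb 26 20:31:22 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 20:39:44 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 20:54:11 2011: vlserver exited on signal 11 (core dumped)
Sat Feb 26 20:54:59 2011: ptserver exited on signal 11 (core dumped)
Sat Feb 26 20:58:58 2011: vlserver exited on signal 6 (core dumped)
Sat Feb 26 21:13:19 2011: ptserver exited on signal 11 (core dumped)
Sat Feb 26 21:21:41 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 21:26:58 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 21:48:20 2011: ptserver exited on signal 11 (core dumped)
Sat Feb 26 21:53:37 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 21:58:54 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 22:17:10 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 22:22:27 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 22:27:44 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 22:43:00 2011: ptserver exited on signal 11 (core dumped)
Sat Feb 26 22:48:42 2011: vlserver exited on signal 6 (core dumped)
"""

EXIT_RE = re.compile(r": (\w+) exited on signal (\d+)")


def tally_exits(log_text):
    """Count exits per (server name, signal number) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EXIT_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


counts = tally_exits(BOSLOG)
for (server, signal), n in sorted(counts.items()):
    print(f"{server} signal {signal}: {n}")
```

For the lines above this gives 12 signal 6 (SIGABRT) and 4 signal 11 (SIGSEGV) exits for ptserver, and 4 and 2 for vlserver, so most of the exits are aborts rather than plain segfaults.
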
>
>  PtLog:
> Sat Feb 26 22:27:44 2011 Using 10.0.1.232 as my primary address
> Sat Feb 26 22:27:58 2011 Starting AFS ptserver 1.1 (/usr/lib/openafs/ptserver)
> Sat Feb 26 22:34:49 2011 ubik: A Remote Server has addresses: Sat Feb 26 22:34:49 2011 10.0.1.230 Sat Feb 26 22:34:49 2011 65.38.17.159 Sat Feb 26 22:34:49 2011
>
>  VLLog:
> Sat Feb 26 20:58:58 2011 Using 10.0.1.232 as my primary address
> Sat Feb 26 20:59:12 2011 Starting AFS vlserver 4 (/usr/lib/openafs/vlserver)
> *** glibc detected *** /usr/lib/openafs/vlserver: corrupted double-linked list: 0x086d8988 ***
> ======= Backtrace: =========
> /lib/i686/cmov/libc.so.6(+0x6b281)[0xb7759281]
> /lib/i686/cmov/libc.so.6(+0x6cb31)[0xb775ab31]
> /lib/i686/cmov/libc.so.6(cfree+0x6d)[0xb775dbbd]
> /usr/lib/openafs/vlserver[0x80786ed]
> /usr/lib/openafs/vlserver[0x807bf49]
> /usr/lib/openafs/vlserver[0x807c38d]
> /usr/lib/openafs/vlserver[0x8085aad]
> /usr/lib/openafs/vlserver[0x8075e58]
> /usr/lib/openafs/vlserver[0x80761aa]
> /usr/lib/openafs/vlserver[0x8086b01]
> /lib/i686/cmov/libc.so.6(makecontext+0x4b)[0xb7727cdb]
> /usr/lib/openafs/vlserver[0x8087288]
> /usr/lib/openafs/vlserver[0x807ed40]
> /usr/lib/openafs/vlserver[0x807ef8a]
> /usr/lib/openafs/vlserver[0x807614a]
> /usr/lib/openafs/vlserver[0x807f578]
> /usr/lib/openafs/vlserver[0x804a8c6]
> /lib/i686/cmov/libc.so.6(__libc_start_main+0xe6)[0xb7704c76]
> /usr/lib/openafs/vlserver[0x804a301]
> ======= Memory map: ========
> 08048000-0809a000 r-xp 00000000 fe:01 395449     /usr/lib/openafs/vlserver
> 0809a000-0809b000 rw-p 00052000 fe:01 395449     /usr/lib/openafs/vlserver
> 0809b000-080f3000 rw-p 00000000 00:00 0
> 085ee000-086f9000 rw-p 00000000 00:00 0          [heap]
> b6f00000-b6f21000 rw-p 00000000 00:00 0
> b6f21000-b7000000 ---p 00000000 00:00 0
> b7031000-b704e000 r-xp 00000000 fe:01 260908     /lib/libgcc_s.so.1
> b704e000-b704f000 rw-p 0001c000 fe:01 260908     /lib/libgcc_s.so.1
> b7058000-b76d8000 rw-p 00000000 00:00 0
> b76d8000-b76e2000 r-xp 00000000 fe:01 260884     /lib/i686/cmov/libnss_files-2.11.2.so
> b76e2000-b76e3000 r--p 00009000 fe:01 260884     /lib/i686/cmov/libnss_files-2.11.2.so
> b76e3000-b76e4000 rw-p 0000a000 fe:01 260884     /lib/i686/cmov/libnss_files-2.11.2.so
> b76ed000-b76ee000 rw-p 00000000 00:00 0
> b76ee000-b782e000 r-xp 00000000 fe:01 260891     /lib/i686/cmov/libc-2.11.2.so
> b782e000-b7830000 r--p 0013f000 fe:01 260891     /lib/i686/cmov/libc-2.11.2.so
> b7830000-b7831000 rw-p 00141000 fe:01 260891     /lib/i686/cmov/libc-2.11.2.so
> b7831000-b7834000 rw-p 00000000 00:00 0
> b7834000-b7844000 r-xp 00000000 fe:01 260851     /lib/i686/cmov/libresolv-2.11.2.so
> b7844000-b7845000 r--p 00010000 fe:01 260851     /lib/i686/cmov/libresolv-2.11.2.so
> b7845000-b7846000 rw-p 00011000 fe:01 260851     /lib/i686/cmov/libresolv-2.11.2.so
> b7846000-b784a000 rw-p 00000000 00:00 0
> b784a000-b784b000 rw-p 00000000 00:00 0
> b784b000-b784f000 r-xp 00000000 fe:01 260868     /lib/i686/cmov/libnss_dns-2.11.2.so
> b784f000-b7850000 r--p 00004000 fe:01 260868     /lib/i686/cmov/libnss_dns-2.11.2.so
> b7850000-b7851000 rw-p 00005000 fe:01 260868     /lib/i686/cmov/libnss_dns-2.11.2.so
> b7851000-b7853000 rw-p 00000000 00:00 0
> b7853000-b7854000 r-xp 00000000 00:00 0          [vdso]
> b7854000-b786f000 r-xp 00000000 fe:01 270748     /lib/ld-2.11.2.so
> b786f000-b7870000 r--p 0001a000 fe:01 270748     /lib/ld-2.11.2.so
> b7870000-b7871000 rw-p 0001b000 fe:01 270748     /lib/ld-2.11.2.so
> bfe89000-bfe9e000 rw-p 00000000 00:00 0          [stack]
> @(#) OpenAFS 1.6.0~pre1-1-debian built  2010-12-29
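
As a sanity check on the glibc message, here is a rough Python sketch (an illustration only) that looks up which mapping contains the address from the "corrupted double-linked list" line. The map lines are a subset of the memory map quoted above; find_mapping is a hypothetical helper name.

```python
# A few lines copied from the memory map in the VLLog quoted above.
MAPS = """\
08048000-0809a000 r-xp 00000000 fe:01 395449     /usr/lib/openafs/vlserver
0809a000-0809b000 rw-p 00052000 fe:01 395449     /usr/lib/openafs/vlserver
0809b000-080f3000 rw-p 00000000 00:00 0
085ee000-086f9000 rw-p 00000000 00:00 0          [heap]
b7058000-b76d8000 rw-p 00000000 00:00 0
bfe89000-bfe9e000 rw-p 00000000 00:00 0          [stack]
"""


def find_mapping(maps_text, addr):
    """Return (start, end, perms, name) of the mapping containing addr, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        start_text, end_text = fields[0].split("-")
        start, end = int(start_text, 16), int(end_text, 16)
        # Each range is half-open: start is included, end is not.
        if start <= addr < end:
            name = fields[5] if len(fields) > 5 else "(anonymous)"
            return start, end, fields[1], name
    return None


# Address reported by glibc in the VLLog above.
hit = find_mapping(MAPS, 0x086D8988)
if hit is None:
    print("address not in any listed mapping")
else:
    start, end, perms, name = hit
    print(f"{start:08x}-{end:08x} {perms} {name}")
```

The address 0x086d8988 falls inside the 085ee000-086f9000 [heap] mapping, which fits free() tripping over damaged malloc bookkeeping rather than a wild pointer into unmapped memory.
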
>
> -- 
> Ryan C. Underwood, <nemesis@icequake.net>