[OpenAFS] 1.6.0-pre1 ptserver/vlserver dumping core

Ryan C. Underwood nemesis@icequake.net
Sat, 26 Feb 2011 22:54:59 -0600


I just upgraded two SMP servers running Linux 2.6.32, an RW master and
an RO backup, to the experimental 1.6.0-pre1 Debian packages.  I have
configured both for the demand attach fileserver, using the
corresponding 'da'-prefixed server processes.

Whenever I run vos release, ptserver and vlserver eventually dump core
on both machines, with either signal 11 or signal 6, and bos restarts
them.  The backtraces are very similar for both.  VLLog describes the
crash, but PtLog has nothing pertinent.

The servers are externally NAT'd, which was not a problem with earlier
versions; those worked fine.  I'm not sure how to debug this further.

  ptserver:
#0  0xb7792b81 in free () from /lib/i686/cmov/libc.so.6
#1  0x0807a379 in rxi_CleanupConnection (conn=0xb7864ff4) at rx.c:980
#2  0x0807dbf9 in rxi_CheckCall (call=0x9cc33d8) at rx.c:6001
#3  0x0807e03d in rxi_GrowMTUEvent (event=0x0, arg1=0x9cc33d8, dummy=0x0) at rx.c:6233
#4  0x080876ad in rxevent_RaiseEvents (next=0xb7653f6c) at rx_event.c:499
#5  0x08077b08 in rxi_ListenerProc (rfds=<value optimized out>, tnop=<value optimized out>, newcallp=<value optimized out>) at rx_lwp.c:203
#6  0x08077e5a in rx_ListenerProc (dummy=0x0) at rx_lwp.c:335
#7  0x08088701 in Create_Process_Part2 () at ./lwp.c:805
#8  0xb775ccdb in makecontext () from /lib/i686/cmov/libc.so.6
#9  0x0d696910 in ?? ()
#10 0x08088e88 in LWP_MwaitProcess (event=0x9d4ef10) at ./lwp.c:756
#11 LWP_WaitProcess (event=0x9d4ef10) at ./lwp.c:708
#12 0x080809f0 in rx_GetCall (tno=10, cur_service=0x9cb3020, socketp=0xbfa59c8c) at rx.c:2027
#13 0x08080c3a in rxi_ServerProc (threadID=10, newcall=0x0, socketp=0xbfa59c8c) at rx.c:1619
#14 0x08077dfa in rx_ServerProc (unused=0x0) at rx_lwp.c:369
#15 0x08081228 in rx_StartServer (donateMe=1) at rx.c:793
#16 0x0804a8ba in main (argc=1, argv=0xbfa5a2c4) at ptserver.c:565

  vlserver:
#0  0xb78da424 in __kernel_vsyscall ()
#1  0xb779f751 in raise () from /lib/i686/cmov/libc.so.6
#2  0xb77a2b82 in abort () from /lib/i686/cmov/libc.so.6
#3  0xb77d618d in ?? () from /lib/i686/cmov/libc.so.6
#4  0xb77e0281 in ?? () from /lib/i686/cmov/libc.so.6
#5  0xb77e1ad8 in ?? () from /lib/i686/cmov/libc.so.6
#6  0xb77e4bbd in free () from /lib/i686/cmov/libc.so.6
#7  0x080786ed in rxi_CleanupConnection (conn=0xb78b6ff4) at rx.c:990
#8  0x0807bf49 in rxi_CheckCall (call=0xa0914c8) at rx.c:6001
#9  0x0807c38d in rxi_GrowMTUEvent (event=0x0, arg1=0xa0914c8, dummy=0x0) at rx.c:6233
#10 0x08085aad in rxevent_RaiseEvents (next=0xb76a5f6c) at rx_event.c:499
#11 0x08075e58 in rxi_ListenerProc (rfds=<value optimized out>, tnop=<value optimized out>, newcallp=<value optimized out>) at rx_lwp.c:203
#12 0x080761aa in rx_ListenerProc (dummy=0x0) at rx_lwp.c:335
#13 0x08086b01 in Create_Process_Part2 () at ./lwp.c:805
#14 0xb77aecdb in makecontext () from /lib/i686/cmov/libc.so.6
#15 0x0d696910 in ?? ()
#16 0x08087288 in LWP_MwaitProcess (event=0xa1635b0) at ./lwp.c:756
#17 LWP_WaitProcess (event=0xa1635b0) at ./lwp.c:708
#18 0x0807ed40 in rx_GetCall (tno=15, cur_service=0xa082020, socketp=0xbfb74e7c) at rx.c:2027
#19 0x0807ef8a in rxi_ServerProc (threadID=15, newcall=0x0, socketp=0xbfb74e7c) at rx.c:1619
#20 0x0807614a in rx_ServerProc (unused=0x0) at rx_lwp.c:369
#21 0x0807f578 in rx_StartServer (donateMe=1) at rx.c:793
#22 0x0804a8c6 in main (argc=1, argv=0xbfb75714) at vlserver.c:407

Here is my bos config:
restrictmode 0
restarttime 11 0 4 0 0
checkbintime 3 0 5 0 0
bnode simple ptserver 1
parm /usr/lib/openafs/ptserver
end
bnode simple vlserver 1
parm /usr/lib/openafs/vlserver
end
bnode cron userbackup 1
parm /usr/bin/nice /afs/icequake.net/pub/adm/backup_afs.sh -d -u
parm 3:00
end
bnode dafs dafs 1
parm /usr/afs/bin/dafileserver -p 123 -pctspare 20 -L -busyat 50 -rxpck 2000 -rxbind -cb 4000000 -vattachpar 128 -vlruthresh 1440 -vlrumax 8 -vhashsize 11
parm /usr/afs/bin/davolserver -p 64 -log -rxbind
parm /usr/afs/bin/salvageserver
parm /usr/afs/bin/dasalvager -parallel all32
end

  BosLog:
Sat Feb 26 19:28:11 2011: Core limits now -1 -1
Sat Feb 26 19:28:11 2011: Server directory access is okay
Sat Feb 26 19:31:29 2011: vlserver exited on signal 6 (core dumped)
Sat Feb 26 19:36:33 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 19:50:57 2011: vlserver exited on signal 6 (core dumped)
Sat Feb 26 19:51:49 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 20:16:07 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 20:18:33 2011: vlserver exited on signal 11 (core dumped)
Sat Feb 26 20:31:22 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 20:39:44 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 20:54:11 2011: vlserver exited on signal 11 (core dumped)
Sat Feb 26 20:54:59 2011: ptserver exited on signal 11 (core dumped)
Sat Feb 26 20:58:58 2011: vlserver exited on signal 6 (core dumped)
Sat Feb 26 21:13:19 2011: ptserver exited on signal 11 (core dumped)
Sat Feb 26 21:21:41 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 21:26:58 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 21:48:20 2011: ptserver exited on signal 11 (core dumped)
Sat Feb 26 21:53:37 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 21:58:54 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 22:17:10 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 22:22:27 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 22:27:44 2011: ptserver exited on signal 6 (core dumped)
Sat Feb 26 22:43:00 2011: ptserver exited on signal 11 (core dumped)
Sat Feb 26 22:48:42 2011: vlserver exited on signal 6 (core dumped)
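
To summarize how often each server dies and with which signal, the
BosLog exit lines can be tallied mechanically.  The following is only a
minimal sketch in Python, assuming the "<process> exited on signal <n>"
line format shown above; the names tally_exits and EXIT_RE are made up
for illustration and are not part of OpenAFS.

```python
import re
from collections import Counter

# Matches BosLog lines such as:
#   Sat Feb 26 19:31:29 2011: vlserver exited on signal 6 (core dumped)
# (format assumed from the excerpt above)
EXIT_RE = re.compile(r": (\S+) exited on signal (\d+)")

def tally_exits(lines):
    """Count (process, signal) pairs among BosLog exit lines."""
    counts = Counter()
    for line in lines:
        m = EXIT_RE.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts

# A few lines copied from the BosLog above; non-exit lines are ignored.
sample = [
    "Sat Feb 26 19:28:11 2011: Core limits now -1 -1",
    "Sat Feb 26 19:31:29 2011: vlserver exited on signal 6 (core dumped)",
    "Sat Feb 26 19:36:33 2011: ptserver exited on signal 6 (core dumped)",
    "Sat Feb 26 20:18:33 2011: vlserver exited on signal 11 (core dumped)",
]

for (proc, sig), n in sorted(tally_exits(sample).items()):
    print(f"{proc} signal {sig}: {n}")
```

Counted over the full BosLog excerpt above, ptserver exited 12 times on
signal 6 and 4 times on signal 11, and vlserver exited 4 times on
signal 6 and 2 times on signal 11.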

  PtLog:
Sat Feb 26 22:27:44 2011 Using 10.0.1.232 as my primary address
Sat Feb 26 22:27:58 2011 Starting AFS ptserver 1.1 (/usr/lib/openafs/ptserver)
Sat Feb 26 22:34:49 2011 ubik: A Remote Server has addresses: Sat Feb 26 22:34:49 2011 10.0.1.230 Sat Feb 26 22:34:49 2011 65.38.17.159 Sat Feb 26 22:34:49 2011 

  VLLog:
Sat Feb 26 20:58:58 2011 Using 10.0.1.232 as my primary address
Sat Feb 26 20:59:12 2011 Starting AFS vlserver 4 (/usr/lib/openafs/vlserver)
*** glibc detected *** /usr/lib/openafs/vlserver: corrupted double-linked list: 0x086d8988 ***
======= Backtrace: =========
/lib/i686/cmov/libc.so.6(+0x6b281)[0xb7759281]
/lib/i686/cmov/libc.so.6(+0x6cb31)[0xb775ab31]
/lib/i686/cmov/libc.so.6(cfree+0x6d)[0xb775dbbd]
/usr/lib/openafs/vlserver[0x80786ed]
/usr/lib/openafs/vlserver[0x807bf49]
/usr/lib/openafs/vlserver[0x807c38d]
/usr/lib/openafs/vlserver[0x8085aad]
/usr/lib/openafs/vlserver[0x8075e58]
/usr/lib/openafs/vlserver[0x80761aa]
/usr/lib/openafs/vlserver[0x8086b01]
/lib/i686/cmov/libc.so.6(makecontext+0x4b)[0xb7727cdb]
/usr/lib/openafs/vlserver[0x8087288]
/usr/lib/openafs/vlserver[0x807ed40]
/usr/lib/openafs/vlserver[0x807ef8a]
/usr/lib/openafs/vlserver[0x807614a]
/usr/lib/openafs/vlserver[0x807f578]
/usr/lib/openafs/vlserver[0x804a8c6]
/lib/i686/cmov/libc.so.6(__libc_start_main+0xe6)[0xb7704c76]
/usr/lib/openafs/vlserver[0x804a301]
======= Memory map: ========
08048000-0809a000 r-xp 00000000 fe:01 395449     /usr/lib/openafs/vlserver
0809a000-0809b000 rw-p 00052000 fe:01 395449     /usr/lib/openafs/vlserver
0809b000-080f3000 rw-p 00000000 00:00 0
085ee000-086f9000 rw-p 00000000 00:00 0          [heap]
b6f00000-b6f21000 rw-p 00000000 00:00 0
b6f21000-b7000000 ---p 00000000 00:00 0
b7031000-b704e000 r-xp 00000000 fe:01 260908     /lib/libgcc_s.so.1
b704e000-b704f000 rw-p 0001c000 fe:01 260908     /lib/libgcc_s.so.1
b7058000-b76d8000 rw-p 00000000 00:00 0
b76d8000-b76e2000 r-xp 00000000 fe:01 260884     /lib/i686/cmov/libnss_files-2.11.2.so
b76e2000-b76e3000 r--p 00009000 fe:01 260884     /lib/i686/cmov/libnss_files-2.11.2.so
b76e3000-b76e4000 rw-p 0000a000 fe:01 260884     /lib/i686/cmov/libnss_files-2.11.2.so
b76ed000-b76ee000 rw-p 00000000 00:00 0
b76ee000-b782e000 r-xp 00000000 fe:01 260891     /lib/i686/cmov/libc-2.11.2.so
b782e000-b7830000 r--p 0013f000 fe:01 260891     /lib/i686/cmov/libc-2.11.2.so
b7830000-b7831000 rw-p 00141000 fe:01 260891     /lib/i686/cmov/libc-2.11.2.so
b7831000-b7834000 rw-p 00000000 00:00 0
b7834000-b7844000 r-xp 00000000 fe:01 260851     /lib/i686/cmov/libresolv-2.11.2.so
b7844000-b7845000 r--p 00010000 fe:01 260851     /lib/i686/cmov/libresolv-2.11.2.so
b7845000-b7846000 rw-p 00011000 fe:01 260851     /lib/i686/cmov/libresolv-2.11.2.so
b7846000-b784a000 rw-p 00000000 00:00 0
b784a000-b784b000 rw-p 00000000 00:00 0
b784b000-b784f000 r-xp 00000000 fe:01 260868     /lib/i686/cmov/libnss_dns-2.11.2.so
b784f000-b7850000 r--p 00004000 fe:01 260868     /lib/i686/cmov/libnss_dns-2.11.2.so
b7850000-b7851000 rw-p 00005000 fe:01 260868     /lib/i686/cmov/libnss_dns-2.11.2.so
b7851000-b7853000 rw-p 00000000 00:00 0
b7853000-b7854000 r-xp 00000000 00:00 0          [vdso]
b7854000-b786f000 r-xp 00000000 fe:01 270748     /lib/ld-2.11.2.so
b786f000-b7870000 r--p 0001a000 fe:01 270748     /lib/ld-2.11.2.so
b7870000-b7871000 rw-p 0001b000 fe:01 270748     /lib/ld-2.11.2.so
bfe89000-bfe9e000 rw-p 00000000 00:00 0          [stack]
@(#) OpenAFS 1.6.0~pre1-1-debian built  2010-12-29

-- 
Ryan C. Underwood, <nemesis@icequake.net>