[OpenAFS-devel] fileserver 1.2.10-rc2 crash on Irix 6.5.20

Martin MOKREJŠ mmokrejs@natur.cuni.cz
Tue, 22 Jul 2003 22:39:03 +0200 (CEST)


Hi,
  while inspecting the logs on my test machine, I found that the
fileserver dumped core at 04:00, I guess during the restart:

>  0 _BSD_getime(0x7fff2c30, 0x0, 0x100ff3a8, 0x0, 0x2, 0xfb36d70, 0xfa517a8, 0xfa51770) ["/xlv47/6.5.20f/work/irix/lib/libc/libc_n32_M3/sys/BSD_getime.s":12, 0xfaf6d08]
   1 _gettimeofday(0x7fff2c30, 0x0, 0x100ff3a8, 0x0, 0x2, 0xfb36d70, 0xfa517a8, 0xfa51770) ["/xlv47/6.5.20f/work/irix/lib/libc/libc_n32_M3/sys/gettimeday.c":29, 0xfaf8194]
   2 rx_NewCall(conn = 0x104434b0) ["/scratch2/openafs-1.2.10-rc2/src/rx/rx.c":1016, 0x10093990]
   3 PR_GetCPS(z_conn = 0x104434b0, id = -101, elist = 0x100ff3a8, over = 0x7fff2e60) ["/scratch2/openafs-1.2.10-rc2/src/ptserver/ptint.cs.c":335, 0x1007dc10]
   4 ubik_Call(aproc = 0x1007dbe0, aclient = 0x104437c0, aflags = 0, p1 = -101, p2 = 269480872, p3 = 2147429984, p4 = 2, p5 = 263417200, p6 = 262477736, p7 = 262477680, p8 = 268538408, p9 = 269504116, p10 = -101, p11 = 269480872, p12 = 0, p13 = 268526332, p14 = 268526416, p15 = 269504116, p16 = 5376) ["/scratch2/openafs-1.2.10-rc2/src/ubik/ubikclient.c":456, 0x100859b0]
   5 pr_GetCPS(id = -101, CPS = 0x100ff3a8) ["/scratch2/openafs-1.2.10-rc2/src/ptserver/ptuser.c":422, 0x1007bb64]
   6 InitPR() ["/scratch2/openafs-1.2.10-rc2/src/viced/viced.c":1389, 0x10019220]
   7 main(argc = 1, argv = 0x7fff2fa4) ["/scratch2/openafs-1.2.10-rc2/src/viced/viced.c":547, 0x10016348]
   8 __start() ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M3/csu/crt1text.s":177, 0x10015568]
(dbx)


The logs say:

Sat Jul 19 05:25:42 2003 XFS/EFS File server starting
Sat Jul 19 05:28:10 2003 VL_RegisterAddrs rpc failed; will retry periodically (code=-1, err=2)
Sat Jul 19 05:30:30 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
Sat Jul 19 05:32:10 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
Sat Jul 19 05:33:50 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
Sat Jul 19 05:35:30 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
Sat Jul 19 05:37:10 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
[...]
Sun Jul 20 03:56:21 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
Sun Jul 20 03:58:01 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
Sun Jul 20 03:59:42 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
[...]
Sun Jul 20 04:00:52 2003 XFS/EFS File server starting
Sun Jul 20 04:03:12 2003 VL_RegisterAddrs rpc failed; will retry periodically (code=5376, err=2)
Sun Jul 20 04:05:32 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=-1.
Sun Jul 20 04:07:12 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=-1.
Sun Jul 20 04:08:52 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=-1.
[...]
Mon Jul 21 09:17:12 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=-1.
Mon Jul 21 09:18:52 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=-1.
Mon Jul 21 09:20:32 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=-1.
Mon Jul 21 09:22:12 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=-1.
Mon Jul 21 09:23:52 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=-1.
Mon Jul 21 09:24:33 2003 Set thread id 14 for FSYNC_sync
Mon Jul 21 09:24:33 2003 Partition /vicepc: attached 0 volumes; 0 volumes not attached
Mon Jul 21 09:24:33 2003 Partition /vicepa: attached 0 volumes; 0 volumes not attached
Mon Jul 21 09:24:33 2003 Set thread id 15 for 'FiveMinuteCheckLWP'
Mon Jul 21 09:24:33 2003 Set thread id 16 for 'HostCheckLWP'
Mon Jul 21 09:24:33 2003 Getting FileServer name...
Mon Jul 21 09:24:33 2003 FileServer host name is 'nmrindy.natur.cuni.cz'
Mon Jul 21 09:24:33 2003 Getting FileServer address...
Mon Jul 21 09:24:33 2003 FileServer nmrindy.natur.cuni.cz has address 195.113.59.111 (0xc3713b6f or 0xc3713b6f in host byte order)
Mon Jul 21 09:24:33 2003 File Server started Mon Jul 21 09:24:33 2003
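
Note that although the message says "will try again in 30 seconds", the
timestamps of consecutive "Couldn't get CPS" entries above are 100 seconds
apart. A minimal Python sketch for checking this on a larger FileLog is below.
The helper names parse and gaps_by_code are made up for illustration and are
not part of OpenAFS; the sample lines are copied from the excerpt above.

```python
from datetime import datetime

# Sample lines copied from the FileLog excerpt above.
LOG = """\
Sat Jul 19 05:30:30 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
Sat Jul 19 05:32:10 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
Sat Jul 19 05:33:50 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
Sun Jul 20 04:05:32 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=-1.
Sun Jul 20 04:07:12 2003 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=-1.
"""

# Every FileLog line starts with a fixed-width timestamp of this form.
STAMP_LEN = len("Sat Jul 19 05:30:30 2003")


def parse(line):
    """Return (timestamp, error code) for one 'Couldn't get CPS' line."""
    stamp = datetime.strptime(line[:STAMP_LEN], "%a %b %d %H:%M:%S %Y")
    code = int(line.rsplit("code=", 1)[1].rstrip("."))
    return stamp, code


def gaps_by_code(text):
    """Map each error code to the gaps (in seconds) between consecutive
    retry messages that reported that same code."""
    entries = [parse(l) for l in text.splitlines() if "Couldn't get CPS" in l]
    gaps = {}
    for (t0, c0), (t1, c1) in zip(entries, entries[1:]):
        if c0 == c1:  # skip the jump across the restart / code change
            gaps.setdefault(c0, []).append(int((t1 - t0).total_seconds()))
    return gaps


if __name__ == "__main__":
    for code, secs in gaps_by_code(LOG).items():
        print(f"code={code}: gaps in seconds {secs}")
```

Running it on the full FileLog instead of the sample would show whether the
100 second spacing holds across the whole period for both error codes.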


For the same period, BosLog says:

Sun Jul 20 04:00:47 2003: Server directory access is okay
Sun Jul 20 04:00:51 2003: fs:salv exited with code 0
Sun Jul 20 04:04:18 2003: fs:vol exited with code 1
Sun Jul 20 04:07:44 2003: fs:vol exited with code 1
[...]
Mon Jul 21 09:20:33 2003: fs:vol exited with code 1
Mon Jul 21 09:23:59 2003: fs:vol exited with code 1

VLLog says:
rx_sendmsg: Host is down
rx_sendmsg: Host is down
rx_sendmsg: Host is down
[...]
rx failed to send packet: rx failed to send packet: rx failed to send packet ...
[...]
rx_sendmsg: Host is down
rx_sendmsg: Host is down
rx_sendmsg: Host iSun Julrx failed to send packet: rx failed to send packet: rx failed to sMon Jul 21 09:24:16 2003 ubik: A Remote Serve
r has addresses: @(#) OpenAFS 1.2.10-rc2 built  2003-07-09
Mon Jul 21 09:24:16 2003 195.113.59.251 Mon Jul 21 09:24:16 2003 10.0.0.1 Mon Jul 21 09:24:16 2003
Mon Jul 21 09:24:23 2003 ubik:server 195.113.59.251 is back up: will be contacted through 195.113.59.251
Mon Jul 21 09:25:52 2003 Ubik: Synchronize database with server 195.113.59.251
Mon Jul 21 09:25:52 2003 Ubik: Synchronize database completed
Mon Jul 21 13:08:40 2003 ubik: A Remote Server has addresses: Mon Jul 21 13:08:40 2003 195.113.59.251 Mon Jul 21 13:08:40 2003 10.0.0.1
Mon Jul 21 13:08:40 2003
Mon Jul 21 13:09:45 2003 ubik:server 195.113.59.251 is back up: will be contacted through 195.113.59.251
Tue Jul 22 13:16:50 2003 ubik: A Remote Server has addresses: Tue Jul 22 13:16:50 2003 195.113.59.121 Tue Jul 22 13:16:50 2003
Tue Jul 22 13:17:29 2003 ubik:server 195.113.59.121 is back up: will be contacted through 195.113.59.121
st is down
rx_sendmsg: Host is down
rx_sendmsg: Host is down
[...]

PtLog contains the same debug messages as VLLog.

Does anyone have a clue what caused the fileserver crash? Thanks.

-- 
Martin Mokrejs <mmokrejs@natur.cuni.cz>, <m.mokrejs@gsf.de>
PGP5.0i key is at http://www.natur.cuni.cz/~mmokrejs
MIPS / Institute for Bioinformatics <http://mips.gsf.de>
GSF - National Research Center for Environment and Health
Ingolstaedter Landstrasse 1, D-85764 Neuherberg, Germany
tel.: +49-89-3187 3683 , fax: +49-89-3187 3585