[OpenAFS] OpenAFS 1.2.9 fileserver coredumped
Renata Maria Dart <renata@SLAC.Stanford.EDU>
Fri, 23 Jan 2004 10:23:03 -0800 (PST)
Hi, we have 8 Solaris 9 fileservers running a mixture of OpenAFS
1.2.9 and 1.2.10. The 1.2.9 fileservers have all been running
uneventfully since last September, until last night when one of
them restarted and left a corefile.fs:
Instance fs, (type is fs) has core file, currently running normally.
Auxiliary status is: file server running.
Process last started at Thu Jan 22 17:38:52 2004 (5 proc starts)
Last exit at Thu Jan 22 17:38:52 2004
Last error exit at Thu Jan 22 17:33:35 2004, by file, due to signal 6
The FileLog.old has nothing of interest...here are the last 3 lines:
Thu Jan 22 17:00:12 2004 CB: RCallBackConnectBack (host.c) failed for host 216.228.9.61:65083
Thu Jan 22 17:01:08 2004 CB: RCallBackConnectBack (host.c) failed for host 66.254.248.162:7001
Thu Jan 22 17:02:05 2004 CB: RCallBackConnectBack (host.c) failed for host 129.74.193.23:7001
I tried running showProcInfo against the fileserver and the corefile:
renata@victoria $ 9:50 /afs/slac/common/afstools/showProcInfo fileserver afs09.corefile.fs.jan2204
Information for fileserver Fri Jan 23 09:51:05 PST 2004
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.0' in your .dbxrc
Reading fileserver
core file header read successfully
Reading ld.so.1
Reading libpthread.so.1
Reading libsocket.so.1
Reading libresolv.so.2
Reading libnsl.so.1
Reading libintl.so.1
Reading libdl.so.1
Reading libc.so.1
Reading libmp.so.2
Reading libc_psr.so.1
Reading libthread.so.1
Reading nss_files.so.1
WARNING!!
A loadobject was found with an unexpected checksum value.
See `help core mismatch' for details, and run `proc -map'
to see what checksum values were expected and found.
dbx: warning: Some symbolic information might be incorrect.
detected a multithreaded program
dbx: warning: could not initialize libthread_db.so -- debugger service failed
dbx: warning: thread related commands will not be available
dbx: warning: see `help lwp', `help lwps' and `help where'
t@null (l@92) terminated by signal ABRT (Abort)
0xff19e42c: _lwp_kill+0x0008: bgeu,a _lwp_kill+0x1c
dbx: warning: can't find file "/var/tmp/openafs-1.2.9/src/tviced/viced.o"
dbx: warning: see `help finding-files'
dbx: warning: can't find file "/var/tmp/openafs-1.2.9/src/tviced/assert.o"
dbx: warning: can't find file "/var/tmp/openafs-1.2.9/src/tviced/vnode.o"
dbx: warning: can't find file "/var/tmp/openafs-1.2.9/src/tviced/afsfileprocs.o"
dbx: warning: can't find file "/var/tmp/openafs-1.2.9/src/tviced/afsint.ss.o"
dbx: warning: can't find file "/var/tmp/openafs-1.2.9/lib/libafsrpc.a(rx.o)"
dbx: warning: can't find file "/var/tmp/openafs-1.2.9/lib/libafsrpc.a(rx_pthread.o)"
dbx: thread related commands not available
=>[1] _lwp_kill(0x0, 0x6, 0x0, 0xff1bc000, 0x5, 0x248800a), at 0xff19e42c
[2] raise(0x6, 0x0, 0xf95fb958, 0xff1bc000, 0x0, 0x0), at 0xff14cd70
[3] abort(0x0, 0xe4f0c, 0xf95fb9e8, 0x117320, 0x2bd, 0x0), at 0xff135c60
[4] AssertionFailed(0x117320, 0x2bd, 0x2, 0xf95fba00, 0x125f00, 0x1400), at 0x4a500
[5] VPutVnode_r(0xf95fbb2c, 0xb384c0, 0x65fbb8, 0x12197c, 0x6a7f68, 0x65a738), at 0x521f4
[6] VPutVnode(0xf95fbb2c, 0xb384c0, 0x12c930, 0x12197c, 0x1218b2, 0x834), at 0x52060
[7] PutVolumePackage(0x0, 0xb384c0, 0xac4928, 0xf10058, 0x0, 0x12ec00), at 0x389cc
[8] SAFSS_CreateFile(0x110400, 0xf95fbde0, 0x24cc448, 0xf95fbdc8, 0xf95fbdbc, 0xf95fbd68), at 0x310b8
[9] SRXAFS_CreateFile(0x133b98, 0xf95fbde0, 0x24cc448, 0xf95fbdc8, 0xf95fbdbc, 0xf95fbd68), at 0x311d0
[10] _RXAFS_CreateFile(0xe4d898, 0xf95fbe58, 0x111b28, 0x1, 0x133c00, 0x1b590000), at 0x5f540
[11] RXAFS_ExecuteRequest(0xe4d898, 0x8, 0xf95fbf3c, 0xffffffff, 0x25517f0, 0xf95fbf3c), at 0x63a90
[12] rxi_ServerProc(0x111800, 0x111800, 0xf95fbf34, 0x0, 0x0, 0x0), at 0x7746c
[13] rx_ServerProc(0xff0a7600, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x74c10
[14] server_entry(0x74b10, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x74560
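A note on reading frame [4] of the traceback above: the abort came from an assertion reached through VPutVnode_r, and the frame's arguments may hint at which one. The sketch below is only a reading aid, not part of the original dbx session. It assumes (this is not confirmed by the output) that AssertionFailed is called as (file, line), so that the second argument, 0x2bd, is a source line number. The helper name parse_frame is made up for illustration.

```python
import re

# Frame [4] as printed by dbx, rejoined onto a single line.
FRAME = ("[4] AssertionFailed(0x117320, 0x2bd, 0x2, 0xf95fba00, "
         "0x125f00, 0x1400), at 0x4a500")

def parse_frame(text):
    """Split one dbx traceback frame into (index, function, args, pc).

    Hypothetical helper: handles only the simple 'name(hex, ...), at hex'
    layout seen in this traceback.
    """
    m = re.match(r"(?:=>)?\[(\d+)\]\s+(\w+)\((.*)\),\s+at\s+(0x[0-9a-fA-F]+)",
                 text)
    if m is None:
        raise ValueError("not a dbx frame: %r" % text)
    index, func, args, pc = m.groups()
    values = [int(a.strip(), 16) for a in args.split(",") if a.strip()]
    return int(index), func, values, int(pc, 16)

index, func, args, pc = parse_frame(FRAME)

# ASSUMPTION: args[0] is a pointer to the file name string and args[1]
# is the line number of the failed assert. The remaining values are
# likely just register contents that dbx prints for every frame.
assumed_line = args[1]
print(func, "assumed source line:", assumed_line)
```

If that assumption holds, the failed assertion would be at line 701 of the file whose name is stored at 0x117320, presumably vnode.c given the VPutVnode_r caller. Without the matching object files this cannot be confirmed from the core alone.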
Since this fileserver has restarted, it is now running 1.2.10. I would
like to know whether the cause of this failure has been fixed in 1.2.10,
in which case I should just upgrade all of my 1.2.9 systems, or whether
this is a problem that still needs to be addressed.
Thanks for your help,
Renata
Renata Dart | renata@SLAC.Stanford.edu
Stanford Linear Accelerator Center |
2575 Sand Hill Road, MS 97 | (650) 926-2848 (office)
Stanford, California 94025 | (650) 926-3329 (fax)