[OpenAFS] OpenAFS 1.2.9 fileserver coredumped

Renata Maria Dart <renata@SLAC.Stanford.EDU>
Fri, 23 Jan 2004 10:23:03 -0800 (PST)


Hi, we have 8 Solaris 9 fileservers running a mixture of OpenAFS
1.2.9 and 1.2.10.  The 1.2.9 fileservers had all been running
uneventfully since last September until last night, when one of
them restarted and left a corefile.fs:


  Instance fs, (type is fs) has core file, currently running normally.
    Auxiliary status is: file server running.
    Process last started at Thu Jan 22 17:38:52 2004 (5 proc starts)
    Last exit at Thu Jan 22 17:38:52 2004
    Last error exit at Thu Jan 22 17:33:35 2004, by file, due to signal 6


The FileLog.old has nothing of interest.  Here are its last 3 lines:


  Thu Jan 22 17:00:12 2004 CB: RCallBackConnectBack (host.c) failed for host 216.228.9.61:65083
  Thu Jan 22 17:01:08 2004 CB: RCallBackConnectBack (host.c) failed for host 66.254.248.162:7001
  Thu Jan 22 17:02:05 2004 CB: RCallBackConnectBack (host.c) failed for host 129.74.193.23:7001


I tried running showProcInfo against the fileserver and the corefile:


renata@victoria $ 9:50 /afs/slac/common/afstools/showProcInfo fileserver afs09.corefile.fs.jan2204
Information for fileserver Fri Jan 23 09:51:05 PST 2004

For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.0' in your .dbxrc
Reading fileserver
core file header read successfully
Reading ld.so.1
Reading libpthread.so.1
Reading libsocket.so.1
Reading libresolv.so.2
Reading libnsl.so.1
Reading libintl.so.1
Reading libdl.so.1
Reading libc.so.1
Reading libmp.so.2
Reading libc_psr.so.1
Reading libthread.so.1
Reading nss_files.so.1
WARNING!!
A loadobject was found with an unexpected checksum value.
See `help core mismatch' for details, and run `proc -map'
to see what checksum values were expected and found.
dbx: warning: Some symbolic information might be incorrect.
detected a multithreaded program
dbx: warning: could not initialize libthread_db.so -- debugger service failed
dbx: warning: thread related commands will not be available
dbx: warning: see `help lwp', `help lwps' and `help where'
t@null (l@92) terminated by signal ABRT (Abort)
0xff19e42c: _lwp_kill+0x0008:   bgeu,a  _lwp_kill+0x1c
dbx: warning: can't find file "/var/tmp/openafs-1.2.9/src/tviced/viced.o"
dbx: warning: see `help finding-files'
dbx: warning: can't find file "/var/tmp/openafs-1.2.9/src/tviced/assert.o"
dbx: warning: can't find file "/var/tmp/openafs-1.2.9/src/tviced/vnode.o"
dbx: warning: can't find file "/var/tmp/openafs-1.2.9/src/tviced/afsfileprocs.o"
dbx: warning: can't find file "/var/tmp/openafs-1.2.9/src/tviced/afsint.ss.o"
dbx: warning: can't find file "/var/tmp/openafs-1.2.9/lib/libafsrpc.a(rx.o)"
dbx: warning: can't find file "/var/tmp/openafs-1.2.9/lib/libafsrpc.a(rx_pthread.o)"
dbx: thread related commands not available
=>[1] _lwp_kill(0x0, 0x6, 0x0, 0xff1bc000, 0x5, 0x248800a), at 0xff19e42c
  [2] raise(0x6, 0x0, 0xf95fb958, 0xff1bc000, 0x0, 0x0), at 0xff14cd70
  [3] abort(0x0, 0xe4f0c, 0xf95fb9e8, 0x117320, 0x2bd, 0x0), at 0xff135c60
  [4] AssertionFailed(0x117320, 0x2bd, 0x2, 0xf95fba00, 0x125f00, 0x1400), at 0x4a500
  [5] VPutVnode_r(0xf95fbb2c, 0xb384c0, 0x65fbb8, 0x12197c, 0x6a7f68, 0x65a738), at 0x521f4
  [6] VPutVnode(0xf95fbb2c, 0xb384c0, 0x12c930, 0x12197c, 0x1218b2, 0x834), at 0x52060
  [7] PutVolumePackage(0x0, 0xb384c0, 0xac4928, 0xf10058, 0x0, 0x12ec00), at 0x389cc
  [8] SAFSS_CreateFile(0x110400, 0xf95fbde0, 0x24cc448, 0xf95fbdc8, 0xf95fbdbc, 0xf95fbd68), at 0x310b8
  [9] SRXAFS_CreateFile(0x133b98, 0xf95fbde0, 0x24cc448, 0xf95fbdc8, 0xf95fbdbc, 0xf95fbd68), at 0x311d0
  [10] _RXAFS_CreateFile(0xe4d898, 0xf95fbe58, 0x111b28, 0x1, 0x133c00, 0x1b590000), at 0x5f540
  [11] RXAFS_ExecuteRequest(0xe4d898, 0x8, 0xf95fbf3c, 0xffffffff, 0x25517f0, 0xf95fbf3c), at 0x63a90
  [12] rxi_ServerProc(0x111800, 0x111800, 0xf95fbf34, 0x0, 0x0, 0x0), at 0x7746c
  [13] rx_ServerProc(0xff0a7600, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x74c10
  [14] server_entry(0x74b10, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x74560
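
In case it helps whoever reads the trace: below is a minimal sketch for pulling the assertion's line number out of the dbx frames. It assumes that AssertionFailed takes (file, line) as its first two arguments, which I have not checked against the 1.2.9 source. It also assumes that dbx prints six register values per frame regardless of the function's real arity, so only the first two are meaningful here. The helper names parse_frames and assertion_line are made up for illustration. The first argument (0x117320) is presumably a pointer to the source file name and cannot be decoded without the core file.

```python
import re

# Frames [3] to [5] copied from the dbx backtrace above.
FRAMES = """\
[3] abort(0x0, 0xe4f0c, 0xf95fb9e8, 0x117320, 0x2bd, 0x0), at 0xff135c60
[4] AssertionFailed(0x117320, 0x2bd, 0x2, 0xf95fba00, 0x125f00, 0x1400), at 0x4a500
[5] VPutVnode_r(0xf95fbb2c, 0xb384c0, 0x65fbb8, 0x12197c, 0x6a7f68, 0x65a738), at 0x521f4
"""

# Matches "[N] func(arg, arg, ...), at 0xADDR" as printed by dbx.
FRAME_RE = re.compile(r"\[(\d+)\]\s+(\w+)\(([^)]*)\),\s+at\s+(0x[0-9a-f]+)")


def parse_frames(text):
    """Return one dict per stack frame found in text (hypothetical helper)."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append({
            "depth": int(m.group(1)),
            "func": m.group(2),
            "args": [int(a, 16) for a in m.group(3).split(", ")],
            "pc": int(m.group(4), 16),
        })
    return frames


def assertion_line(frames):
    """Second argument of AssertionFailed, ASSUMED to be the source line number."""
    for frame in frames:
        if frame["func"] == "AssertionFailed":
            return frame["args"][1]
    return None


frames = parse_frames(FRAMES)
print("assertion line (assumed):", assertion_line(frames))
print("called from:", frames[2]["func"])
```

If those assumptions hold, 0x2bd is decimal 701, so the failed assertion would be at line 701 of whichever source file VPutVnode_r lives in.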


Since this fileserver has restarted, it is now running 1.2.10.  I would
like to know whether the cause of this failure has been fixed in 1.2.10,
in which case I should just upgrade all of my 1.2.9 systems, or whether
this is a problem that still needs to be addressed.

Thanks for your help,

Renata


 Renata Dart                         | renata@SLAC.Stanford.edu  
 Stanford Linear Accelerator Center  |    
 2575 Sand Hill Road, MS 97          | (650) 926-2848 (office)
 Stanford, California   94025        | (650) 926-3329 (fax)