[OpenAFS] AFS 1.2.8 fileserver Failing in GetClient()
Douglas E. Engert
deengert@anl.gov
Mon, 31 Mar 2003 10:54:07 -0600
Two separate AFS servers are having the same problem: the fileserver process
keeps dumping core. Both are running AFS 1.2.8 on Solaris 5.8. AFS 1.2.8 was
installed during the week. (It is not clear whether the problems started when
1.2.8 was installed, but they might have.) A third server is not having
problems, but it does not have many volumes.
Any ideas?
The core dump of the fileserver shows this trace:
(gdb) where
#0 0xff0d9764 in __sigprocmask () from /usr/lib/libthread.so.1
#1 0xff0ce978 in _resetsig () from /usr/lib/libthread.so.1
#2 0xff0ce118 in _sigon () from /usr/lib/libthread.so.1
#3 0xff0d1158 in _thrp_kill () from /usr/lib/libthread.so.1
#4 0xff14b9dc in raise () from /usr/lib/libc.so.1
#5 0xff1358fc in abort () from /usr/lib/libc.so.1
#6 0x0004a008 in AssertionFailed ()
#7 0x0003f4f4 in GetClient ()
#8 0x0003858c in GetVolumePackage ()
#9 0x0002feec in SAFSS_StoreStatus ()
#10 0x000300fc in SRXAFS_StoreStatus ()
#11 0x0005e74c in _RXAFS_StoreStatus ()
#12 0x000630c0 in RXAFS_ExecuteRequest ()
#13 0x000768d0 in rxi_ServerProc ()
#14 0x000741e4 in rx_ServerProc ()
#15 0x00073bb0 in server_entry ()
The failure appears to be an assertion in the GetClient() routine,
called from GetVolumePackage().
The BosLog shows, for example:
Mon Mar 31 09:25:01 2003: fs:file exited on signal 6 (core dumped)
Mon Mar 31 09:25:01 2003: fs:vol exited on signal 15
Mon Mar 31 09:27:39 2003: fs:salv exited with code 0
and the FileLog shows the following (not sure if these entries are related):
Mon Mar 31 09:00:01 2003 *** Vid=32766, sid=fa117a18, tcon=5a5e08, Tcon=59d708 ***
Mon Mar 31 09:05:00 2003 *** Vid=32766, sid=f9e25928, tcon=5a8418, Tcon=5aa4d8 ***
Mon Mar 31 09:05:49 2003 *** Vid=32766, sid=fa117a20, tcon=5a75e8, Tcon=5aaa60 ***
Mon Mar 31 09:10:00 2003 *** Vid=32766, sid=fa117a3c, tcon=5aa5a0, Tcon=5a5e08 ***
Mon Mar 31 09:14:26 2003 *** Vid=32766, sid=f9e25944, tcon=5941d8, Tcon=5aba90 ***
Mon Mar 31 09:16:58 2003 *** Vid=32766, sid=f9e25950, tcon=58c540, Tcon=58faf8 ***
Mon Mar 31 09:24:26 2003 *** Vid=32766, sid=f9e25944, tcon=5aba90, Tcon=594b40 ***
A grep of the BosLogs on the two machines shows some regularity to the
failures, on one of them at least, suggesting that a timer might be involved.
(The minute is always a multiple of 5, with the dump a second or two after.)
BosLog.old:Sun Mar 30 04:50:02 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 10:05:01 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 10:40:02 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 11:30:02 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 12:25:01 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 15:30:02 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 15:55:02 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 16:50:01 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 17:25:02 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 19:40:01 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 20:05:01 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 21:40:01 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 22:10:02 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 23:20:01 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Mon Mar 31 01:40:01 2003: fs:file exited on signal 6 (core dumped)
BosLog:Mon Mar 31 09:25:01 2003: fs:file exited on signal 6 (core dumped)
BosLog:Mon Mar 31 10:05:01 2003: fs:file exited on signal 6 (core dumped)
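
To check the 5-minute pattern mechanically rather than by eye, something like
the following rough Python sketch could be run over the grep output. The
function names (crash_times, seconds_past_boundary) and the SAMPLE text are
made up for illustration; the sample lines are copied from the BosLog excerpts
in this message, and the parsing assumes the ctime-style timestamp layout
shown there.

```python
from datetime import datetime

# Three lines copied from the BosLog excerpts in this message.
# In practice the text would come from the real grep output.
SAMPLE = """\
BosLog.old:Sun Mar 30 04:50:02 2003: fs:file exited on signal 6 (core dumped)
BosLog:Mon Mar 31 09:25:01 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 12:44:56 2003: fs:file exited on signal 6 (core dumped)
"""


def crash_times(text):
    """Yield a datetime for each 'signal 6' line in grep-style BosLog output."""
    for line in text.splitlines():
        if "exited on signal 6" not in line:
            continue
        # Drop the "BosLog:" / "BosLog.old:" file-name prefix added by grep,
        # then keep the fixed-width 24-character timestamp.
        stamp = line.split(":", 1)[1][:24]
        yield datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")


def seconds_past_boundary(t, period_minutes=5):
    """Seconds elapsed since the most recent multiple of period_minutes."""
    return (t.minute % period_minutes) * 60 + t.second


offsets = [seconds_past_boundary(t) for t in crash_times(SAMPLE)]
print(offsets)  # [2, 1, 296]
```

Small offsets (a second or two) mean the crash landed just after a 5-minute
boundary; an offset scattered anywhere in 0..299 means no such alignment.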
The other machine is not as regular:
BosLog.old:Sun Mar 30 12:44:56 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 13:20:58 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 14:32:06 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 16:43:22 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 17:58:31 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 18:55:10 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 22:06:32 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Sun Mar 30 23:30:12 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Mon Mar 31 01:01:23 2003: fs:file exited on signal 6 (core dumped)
BosLog.old:Mon Mar 31 02:03:02 2003: fs:file exited on signal 6 (core dumped)
BosLog:Mon Mar 31 04:52:52 2003: fs:file exited on signal 6 (core dumped)
BosLog:Mon Mar 31 06:04:37 2003: fs:file exited on signal 6 (core dumped)
BosLog:Mon Mar 31 08:20:52 2003: fs:file exited on signal 6 (core dumped)
BosLog:Mon Mar 31 09:42:03 2003: fs:file exited on signal 6 (core dumped)
--
Douglas E. Engert <DEEngert@anl.gov>
Argonne National Laboratory
9700 South Cass Avenue
Argonne, Illinois 60439
(630) 252-5444