[OpenAFS] fileserver - salvage loop
Matthew Cocker
matt@cs.auckland.ac.nz
Tue, 21 Oct 2003 09:30:58 +1300
Hi,
At exactly the same time last night, three of our OpenAFS servers went into a
continuous fileserver/salvager restart loop. The log shows:
Mon Oct 20 17:05:07 2003: fs:file exited on signal 11
Mon Oct 20 17:05:07 2003: fs:vol exited on signal 15
Mon Oct 20 17:09:54 2003: fs:salv exited with code 0
Mon Oct 20 17:11:24 2003: fs:file exited with code 1
Mon Oct 20 17:11:24 2003: fs:vol exited on signal 15
Mon Oct 20 17:14:54 2003: fs:salv exited with code 0
Mon Oct 20 17:16:24 2003: fs:file exited with code 1
Mon Oct 20 17:16:24 2003: fs:vol exited on signal 15
Mon Oct 20 17:18:15 2003: fs:salv exited with code 0
Mon Oct 20 17:19:45 2003: fs:file exited with code 1
Mon Oct 20 17:19:45 2003: fs:vol exited on signal 15
Mon Oct 20 17:20:59 2003: fs:salv exited with code 0
Mon Oct 20 17:22:29 2003: fs:file exited with code 1
Mon Oct 20 17:22:29 2003: fs:vol exited on signal 15
Mon Oct 20 17:23:37 2003: fs:salv exited with code 0
Mon Oct 20 17:25:07 2003: fs:file exited with code 1
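For anyone who wants to look at the timing of the loop, here is a rough sketch (not an OpenAFS tool, just a throwaway Python script) that parses the lines above and measures how long the fileserver survives after each salvage. The regular expression and the function names are my own assumptions based only on the line format shown here.

```python
import re
from datetime import datetime

# The log lines exactly as pasted above.
LOG = """\
Mon Oct 20 17:05:07 2003: fs:file exited on signal 11
Mon Oct 20 17:05:07 2003: fs:vol exited on signal 15
Mon Oct 20 17:09:54 2003: fs:salv exited with code 0
Mon Oct 20 17:11:24 2003: fs:file exited with code 1
Mon Oct 20 17:11:24 2003: fs:vol exited on signal 15
Mon Oct 20 17:14:54 2003: fs:salv exited with code 0
Mon Oct 20 17:16:24 2003: fs:file exited with code 1
Mon Oct 20 17:16:24 2003: fs:vol exited on signal 15
Mon Oct 20 17:18:15 2003: fs:salv exited with code 0
Mon Oct 20 17:19:45 2003: fs:file exited with code 1
Mon Oct 20 17:19:45 2003: fs:vol exited on signal 15
Mon Oct 20 17:20:59 2003: fs:salv exited with code 0
Mon Oct 20 17:22:29 2003: fs:file exited with code 1
Mon Oct 20 17:22:29 2003: fs:vol exited on signal 15
Mon Oct 20 17:23:37 2003: fs:salv exited with code 0
Mon Oct 20 17:25:07 2003: fs:file exited with code 1
"""

# Assumed line shape: "<timestamp>: fs:<proc> exited on signal N" or "... with code N".
LINE_RE = re.compile(r"^(.+ \d{4}): fs:(\w+) exited (on signal|with code) (\d+)$")


def parse(log):
    """Return a list of (timestamp, process, exit kind, value) tuples."""
    events = []
    for line in log.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like an exit record
        when = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
        events.append((when, m.group(2), m.group(3), int(m.group(4))))
    return events


def salvage_to_exit_gaps(events):
    """Seconds between each salvager exit and the next fileserver exit."""
    gaps = []
    last_salv = None
    for when, proc, _kind, _value in events:
        if proc == "salv":
            last_salv = when
        elif proc == "file" and last_salv is not None:
            gaps.append(int((when - last_salv).total_seconds()))
            last_salv = None
    return gaps


events = parse(LOG)
gaps = salvage_to_exit_gaps(events)
print(gaps)  # [90, 90, 90, 90, 90]
```

On this excerpt every restarted fileserver exits with code 1 exactly 90 seconds after the salvager finishes, which looks more like a fixed timeout than a random crash. I have not confirmed what that timeout is.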
The box is still looping. If anyone has a command they want me to run, I
can leave the box in this state for a little while. The fileserver process
with PID 719 is the one that has not died.
Versions: OpenAFS 1.2.9, Red Hat 7.3, kernel 2.4.20-18.7
[root@afs-11-fos-ec logs]# ps -welf
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
100 S root 1 0 0 68 0 - 343 do_sel Sep04 ? 00:00:08 init [3]
040 S root 2 1 0 69 0 - 0 contex Sep04 ? 00:00:04 [keventd]
040 S root 3 1 0 79 19 - 0 ksofti Sep04 ? 00:00:09 [ksoftirqd_CPU0]
040 S root 4 1 0 69 0 - 0 wakeup Sep04 ? 00:04:55 [kswapd]
040 S root 5 1 0 78 0 - 0 kscand Sep04 ? 06:28:34 [kscand]
040 S root 6 1 0 69 0 - 0 bdflus Sep04 ? 00:00:00 [bdflush]
040 S root 7 1 0 69 0 - 0 kupdat Sep04 ? 00:00:22 [kupdated]
040 S root 8 1 0 59 -20 - 0 md_thr Sep04 ? 00:00:00 [mdrecoveryd]
040 S root 14 1 0 69 0 - 0 end Sep04 ? 00:00:00 [aacraid]
040 S root 15 1 0 69 0 - 0 down_i Sep04 ? 00:00:00 [scsi_eh_0]
040 S root 18 1 0 69 0 - 0 end Sep04 ? 00:00:04 [kjournald]
040 S root 146 1 0 69 0 - 0 end Sep04 ? 00:00:00 [kjournald]
040 S root 147 1 0 69 0 - 0 end Sep04 ? 00:01:57 [kjournald]
040 S root 148 1 0 71 0 - 0 end Sep04 ? 00:01:44 [kjournald]
040 S root 538 1 0 69 0 - 357 do_sel Sep04 ? 00:00:00 syslogd -m 0
140 S root 543 1 0 69 0 - 342 do_sys Sep04 ? 00:00:00 klogd -x
040 S root 698 1 0 73 0 - 1021 do_sel Sep04 ? 00:08:33 /sbin/bosserver
040 S root 715 1 0 68 0 - 384 nanosl Sep04 ? 00:00:00 crond
040 S root 719 1 0 69 0 - 12151 rt_sig Sep04 ? 00:00:00 /libexec/openafs/fileserver
100 S root 758 1 0 69 0 - 336 read_c Sep04 tty1 00:00:00 /sbin/mingetty tty1
100 S root 759 1 0 69 0 - 337 read_c Sep04 tty2 00:00:00 /sbin/mingetty tty2
100 S root 760 1 0 69 0 - 336 read_c Sep04 tty3 00:00:00 /sbin/mingetty tty3
100 S root 762 1 0 69 0 - 335 read_c Sep04 tty5 00:00:00 /sbin/mingetty tty5
100 S root 763 1 0 69 0 - 335 read_c Sep04 tty6 00:00:00 /sbin/mingetty tty6
140 S root 6548 1 0 69 0 - 482 do_sel Sep23 ? 00:00:00 /usr/sbin/zebra -d
140 S root 6610 1 0 69 0 - 530 do_sel Sep23 ? 00:00:23 /usr/sbin/ripd -d
100 S root 7145 1 0 69 0 - 335 read_c Sep23 tty4 00:00:00 /sbin/mingetty tty4
140 S root 8966 1 0 69 0 - 867 do_sel Sep24 ? 00:00:07 /usr/sbin/sshd
040 S root 5109 8966 0 69 0 - 1531 do_sel 09:19 ? 00:00:00 sshd: root@pts/0
100 S root 5114 5109 0 76 0 - 618 wait4 09:20 pts/0 00:00:00 -bash
100 S root 20718 698 0 72 -5 - 2611 nanosl 09:27 ? 00:00:00 /libexec/openafs/fileserver
100 S root 20719 698 0 74 0 - 1289 tcp_da 09:27 ? 00:00:00 /libexec/openafs/volserver
040 S root 20720 20718 0 73 0 - 2611 do_pol 09:27 ? 00:00:00 /libexec/openafs/fileserver
040 S root 20721 20720 0 73 0 - 2611 rt_sig 09:27 ? 00:00:00 /libexec/openafs/fileserver
000 R root 20723 5114 0 78 0 - 774 - 09:27 pts/0 00:00:00 ps -welf
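To make the point about PID 719 easier to check on the other two servers, here is a small sketch that picks out any fileserver process that is not a descendant of the running bosserver. It assumes the 15-column `ps -welf` layout shown above, with the lines unwrapped. The helper names are made up for illustration, and only the relevant rows are embedded as sample data.

```python
# Relevant rows from the ps listing above, one process per line.
PS_ROWS = """\
040 S root 698 1 0 73 0 - 1021 do_sel Sep04 ? 00:08:33 /sbin/bosserver
040 S root 719 1 0 69 0 - 12151 rt_sig Sep04 ? 00:00:00 /libexec/openafs/fileserver
100 S root 20718 698 0 72 -5 - 2611 nanosl 09:27 ? 00:00:00 /libexec/openafs/fileserver
100 S root 20719 698 0 74 0 - 1289 tcp_da 09:27 ? 00:00:00 /libexec/openafs/volserver
040 S root 20720 20718 0 73 0 - 2611 do_pol 09:27 ? 00:00:00 /libexec/openafs/fileserver
040 S root 20721 20720 0 73 0 - 2611 rt_sig 09:27 ? 00:00:00 /libexec/openafs/fileserver
"""


def parse_ps(rows):
    """Map PID -> (PPID, CMD), assuming the column order F S UID PID PPID ... CMD."""
    procs = {}
    for line in rows.splitlines():
        fields = line.split(None, 14)  # CMD is the 15th field and may contain spaces
        if len(fields) < 15:
            continue
        procs[int(fields[3])] = (int(fields[4]), fields[14])
    return procs


def descends_from(pid, ancestor, procs):
    """Walk the PPID chain upwards and report whether it reaches `ancestor`."""
    seen = set()
    while pid in procs and pid not in seen:
        seen.add(pid)
        ppid = procs[pid][0]
        if ppid == ancestor:
            return True
        pid = ppid
    return False


procs = parse_ps(PS_ROWS)
bos = [pid for pid, (_ppid, cmd) in procs.items() if cmd.endswith("bosserver")][0]
strays = sorted(
    pid
    for pid, (_ppid, cmd) in procs.items()
    if cmd.endswith("/fileserver") and not descends_from(pid, bos, procs)
)
print(strays)  # [719]
```

On this listing the only fileserver outside the bosserver process tree is 719, which has been reparented to init and dates from Sep04. The 09:27 fileserver processes all hang off bosserver (PID 698).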