[OpenAFS] fileserver - salvage loop

Matthew Cocker matt@cs.auckland.ac.nz
Tue, 21 Oct 2003 09:30:58 +1300


Hi

At exactly the same time last night, three of our OpenAFS servers went into a
continuous fileserver/salvager loop.

Mon Oct 20 17:05:07 2003: fs:file exited on signal 11
Mon Oct 20 17:05:07 2003: fs:vol exited on signal 15
Mon Oct 20 17:09:54 2003: fs:salv exited with code 0
Mon Oct 20 17:11:24 2003: fs:file exited with code 1
Mon Oct 20 17:11:24 2003: fs:vol exited on signal 15
Mon Oct 20 17:14:54 2003: fs:salv exited with code 0
Mon Oct 20 17:16:24 2003: fs:file exited with code 1
Mon Oct 20 17:16:24 2003: fs:vol exited on signal 15
Mon Oct 20 17:18:15 2003: fs:salv exited with code 0
Mon Oct 20 17:19:45 2003: fs:file exited with code 1
Mon Oct 20 17:19:45 2003: fs:vol exited on signal 15
Mon Oct 20 17:20:59 2003: fs:salv exited with code 0
Mon Oct 20 17:22:29 2003: fs:file exited with code 1
Mon Oct 20 17:22:29 2003: fs:vol exited on signal 15
Mon Oct 20 17:23:37 2003: fs:salv exited with code 0
Mon Oct 20 17:25:07 2003: fs:file exited with code 1
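The sketch below is a minimal Python example that makes the timing in this log easier to see. It parses exit records of exactly the shape shown above and measures the intervals between them. It assumes the lines come from the bosserver's BosLog and that `fs:file`, `fs:vol` and `fs:salv` are the fileserver, volserver and salvager of the `fs` instance. The function names are illustrative only and are not part of OpenAFS.

```python
import re
from datetime import datetime

# Matches exit records shaped like the ones above, e.g.
#   "Mon Oct 20 17:11:24 2003: fs:file exited with code 1"
#   "Mon Oct 20 17:05:07 2003: fs:vol exited on signal 15"
EXIT_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4}): "
    r"(?P<proc>\S+) exited "
    r"(?:with code (?P<code>\d+)|on signal (?P<sig>\d+))$"
)


def parse_exits(lines):
    """Return (timestamp, process, kind, value) tuples, in log order."""
    events = []
    for line in lines:
        m = EXIT_RE.match(line.strip())
        if not m:
            continue  # ignore anything that is not an exit record
        ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
        if m.group("code") is not None:
            events.append((ts, m.group("proc"), "code", int(m.group("code"))))
        else:
            events.append((ts, m.group("proc"), "signal", int(m.group("sig"))))
    return events


def restart_intervals(events, proc="fs:file"):
    """Seconds between successive exits of one instance (the loop period)."""
    times = [ts for ts, p, _, _ in events if p == proc]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


def gaps_after(events, first="fs:salv", then="fs:file"):
    """Seconds from each `first` exit to the next `then` exit."""
    gaps, pending = [], None
    for ts, p, _, _ in events:
        if p == first:
            pending = ts
        elif p == then and pending is not None:
            gaps.append(int((ts - pending).total_seconds()))
            pending = None
    return gaps
```

Run on the sixteen lines above, `restart_intervals` returns [377, 300, 201, 164, 158] and `gaps_after` returns [90, 90, 90, 90, 90]. After the first exit on signal 11 (SIGSEGV), the fileserver exits with code 1 exactly 90 seconds after each salvager run completes. The loop period shrinks because the salvager runs themselves get shorter, from about 4m47s down to 1m08s.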


The box is still looping. If anyone has a command they want me to run, I
can leave it in this state for a little while. The fileserver process
with PID 719 is the one that has not died.

OpenAFS 1.2.9, Red Hat 7.3, kernel 2.4.20-18.7

[root@afs-11-fos-ec logs]# ps -welf
   F S UID        PID  PPID  C PRI  NI ADDR    SZ WCHAN  STIME TTY          TIME CMD
100 S root         1     0  0  68   0    -   343 do_sel Sep04 ?        00:00:08 init [3]
040 S root         2     1  0  69   0    -     0 contex Sep04 ?        00:00:04 [keventd]
040 S root         3     1  0  79  19    -     0 ksofti Sep04 ?        00:00:09 [ksoftirqd_CPU0]
040 S root         4     1  0  69   0    -     0 wakeup Sep04 ?        00:04:55 [kswapd]
040 S root         5     1  0  78   0    -     0 kscand Sep04 ?        06:28:34 [kscand]
040 S root         6     1  0  69   0    -     0 bdflus Sep04 ?        00:00:00 [bdflush]
040 S root         7     1  0  69   0    -     0 kupdat Sep04 ?        00:00:22 [kupdated]
040 S root         8     1  0  59 -20    -     0 md_thr Sep04 ?        00:00:00 [mdrecoveryd]
040 S root        14     1  0  69   0    -     0 end    Sep04 ?        00:00:00 [aacraid]
040 S root        15     1  0  69   0    -     0 down_i Sep04 ?        00:00:00 [scsi_eh_0]
040 S root        18     1  0  69   0    -     0 end    Sep04 ?        00:00:04 [kjournald]
040 S root       146     1  0  69   0    -     0 end    Sep04 ?        00:00:00 [kjournald]
040 S root       147     1  0  69   0    -     0 end    Sep04 ?        00:01:57 [kjournald]
040 S root       148     1  0  71   0    -     0 end    Sep04 ?        00:01:44 [kjournald]
040 S root       538     1  0  69   0    -   357 do_sel Sep04 ?        00:00:00 syslogd -m 0
140 S root       543     1  0  69   0    -   342 do_sys Sep04 ?        00:00:00 klogd -x
040 S root       698     1  0  73   0    -  1021 do_sel Sep04 ?        00:08:33 /sbin/bosserver
040 S root       715     1  0  68   0    -   384 nanosl Sep04 ?        00:00:00 crond
040 S root       719     1  0  69   0    - 12151 rt_sig Sep04 ?        00:00:00 /libexec/openafs/fileserver
100 S root       758     1  0  69   0    -   336 read_c Sep04 tty1     00:00:00 /sbin/mingetty tty1
100 S root       759     1  0  69   0    -   337 read_c Sep04 tty2     00:00:00 /sbin/mingetty tty2
100 S root       760     1  0  69   0    -   336 read_c Sep04 tty3     00:00:00 /sbin/mingetty tty3
100 S root       762     1  0  69   0    -   335 read_c Sep04 tty5     00:00:00 /sbin/mingetty tty5
100 S root       763     1  0  69   0    -   335 read_c Sep04 tty6     00:00:00 /sbin/mingetty tty6
140 S root      6548     1  0  69   0    -   482 do_sel Sep23 ?        00:00:00 /usr/sbin/zebra -d
140 S root      6610     1  0  69   0    -   530 do_sel Sep23 ?        00:00:23 /usr/sbin/ripd -d
100 S root      7145     1  0  69   0    -   335 read_c Sep23 tty4     00:00:00 /sbin/mingetty tty4
140 S root      8966     1  0  69   0    -   867 do_sel Sep24 ?        00:00:07 /usr/sbin/sshd
040 S root      5109  8966  0  69   0    -  1531 do_sel 09:19 ?        00:00:00 sshd: root@pts/0
100 S root      5114  5109  0  76   0    -   618 wait4  09:20 pts/0    00:00:00 -bash
100 S root     20718   698  0  72  -5    -  2611 nanosl 09:27 ?        00:00:00 /libexec/openafs/fileserver
100 S root     20719   698  0  74   0    -  1289 tcp_da 09:27 ?        00:00:00 /libexec/openafs/volserver
040 S root     20720 20718  0  73   0    -  2611 do_pol 09:27 ?        00:00:00 /libexec/openafs/fileserver
040 S root     20721 20720  0  73   0    -  2611 rt_sig 09:27 ?        00:00:00 /libexec/openafs/fileserver
000 R root     20723  5114  0  78   0    -   774 -      09:27 pts/0    00:00:00 ps -welf
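The second sketch, again in Python, reads `ps -welf` output like the listing above and reports every `fileserver` process that has no `bosserver` process among its ancestors. The helper names are hypothetical. It assumes the standard 15-column `-l -f` layout with CMD as the last column, as in this listing, with each process on a single line.

```python
def parse_ps(text):
    """Map PID -> (PPID, STIME, CMD) from `ps -welf` output."""
    procs = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split(None, 14)  # CMD (last column) may contain spaces
        if len(fields) < 15:
            continue  # skip blank or truncated lines
        pid, ppid = int(fields[3]), int(fields[4])
        procs[pid] = (ppid, fields[11], fields[14])
    return procs


def program(cmd):
    """Basename of the executable, e.g. '/sbin/bosserver' -> 'bosserver'."""
    return cmd.split()[0].rsplit("/", 1)[-1]


def unsupervised(procs, name="fileserver", supervisor="bosserver"):
    """Sorted PIDs running `name` with no `supervisor` process among their ancestors."""
    supervisors = {p for p, (_, _, cmd) in procs.items() if program(cmd) == supervisor}
    result = []
    for pid, (ppid, _, cmd) in procs.items():
        if program(cmd) != name:
            continue
        cur, seen = ppid, set()
        # Walk up the parent chain until we reach a supervisor or leave the table.
        while cur in procs and cur not in supervisors and cur not in seen:
            seen.add(cur)
            cur = procs[cur][0]
        if cur not in supervisors:
            result.append(pid)
    return sorted(result)
```

On the full listing this returns [719]. That fileserver's parent is init (PID 1) and it was started on Sep04, whereas fileserver processes 20718, 20720 and 20721 descend from bosserver (PID 698) and were started at 09:27. The sketch only confirms that 719 is detached from bosserver. It does not explain why that process survived.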