[OpenAFS] Salvager did not run automatically on Solaris 9, 1.4.1-rc10

Thu, 13 Apr 2006 09:41:49 -0700 (PDT)

Hi, we recently had a fileserver crash because of an ecache error.
When the server came back up, it suffered a further fibre channel
adapter error that prevented the drives holding the vice partitions
from coming back online.  Once those issues were dealt with, the
system was rebooted again and came up with its vice partitions, but
it did not salvage on its own.  We had to run bos salvage manually to
bring the volumes online.

This is a Solaris 9 system running OpenAFS 1.4.1-rc10.  It has 2
partitions, and the fs process specifies 2 parallel salvage
processes.  Unfortunately I was not there to see the details when the
system came back online, and the admin who restored it ran separate
salvager commands for the three 200 GB volumes that live on the
system and didn't preserve the original salvage logs.

Is it to be expected that the salvager won't run automatically after
such a sequence of events?  A couple more pieces of information: I
recently converted this system from inode to namei, it was not built
with 'enable-fast-restart', and here are the entries from BosLog:

renata@afs103 $ 9:25 cat BosLog.old
Tue Apr 11 22:04:18 2006: Server directory access is okay    <== came up w/o vice partitions
Tue Apr 11 22:04:19 2006: fs:salv exited with code 0
Wed Apr 12 08:51:05 2006: upclientbin exited on signal 15    <== went down to fix fc adapter
Wed Apr 12 08:51:05 2006: upclientetc exited on signal 15
Wed Apr 12 08:51:05 2006: fs:vol exited on signal 15
Wed Apr 12 08:51:05 2006: fs:file exited with code 0

renata@afs103 $ 9:25 cat BosLog
Wed Apr 12 08:54:06 2006: Server directory access is okay    <== came up healthy
Wed Apr 12 09:18:07 2006: salvage-tmp exited with code 0     <== 3 manual salvages
Wed Apr 12 09:25:20 2006: salvage-tmp exited with code 0
Wed Apr 12 09:27:07 2006: salvage-tmp exited with code 0
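
To make the difference between the two logs explicit, here is a
minimal Python sketch that checks, for each bosserver start, whether
an fs:salv exit was logged before the next start.  It assumes that
the "Server directory access is okay" line marks a bosserver start
and that an automatic salvage shows up as "fs:salv exited"; both are
my reading of the entries above, not documented behavior.  The
function names are hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as
#   "Tue Apr 11 22:04:19 2006: fs:salv exited with code 0"
# and drops any trailing "<== ..." annotation added by hand.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}): "
    r"(?P<msg>.*?)(?:\s+<==.*)?$"
)

def parse_boslog(text):
    """Return a list of (timestamp, message) tuples; skip other lines."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # shell prompts, blank lines, etc.
        ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
        events.append((ts, m.group("msg")))
    return events

def salvage_after_each_start(events):
    """For each assumed bosserver start, report whether fs:salv exited
    before the next start (or before the end of the log)."""
    results = []
    for ts, msg in events:
        if msg.startswith("Server directory access is okay"):
            results.append([ts, False])  # assumption: marks a start
        elif msg.startswith("fs:salv exited") and results:
            results[-1][1] = True
    return [(ts, ran) for ts, ran in results]

# Entries copied from the two logs above.
BOSLOG_OLD = """\
Tue Apr 11 22:04:18 2006: Server directory access is okay    <== came up w/o vice partitions
Tue Apr 11 22:04:19 2006: fs:salv exited with code 0
Wed Apr 12 08:51:05 2006: fs:file exited with code 0
"""
BOSLOG_NEW = """\
Wed Apr 12 08:54:06 2006: Server directory access is okay    <== came up healthy
Wed Apr 12 09:18:07 2006: salvage-tmp exited with code 0     <== 3 manual salvages
"""

for name, text in (("BosLog.old", BOSLOG_OLD), ("BosLog", BOSLOG_NEW)):
    for ts, ran in salvage_after_each_start(parse_boslog(text)):
        print(name, ts, "fs:salv ran" if ran else "no fs:salv logged")
```

Run against the entries above, this reports an fs:salv exit after the
first start (when the vice partitions were missing) and none after
the second, which is the behavior I am asking about.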

Thanks,

Renata