[OpenAFS] My salvager was cored by my volume.

Harald Barth haba@kth.se
Thu, 28 Jun 2007 16:13:16 +0200 (MEST)


Yesterday I had a server crash after a HW-RAID box decided to go out
for lunch without even trying to have a reason. Afterwards I restarted
with fast-restart and then salvaged everything. First pass with
orphans ignore:

+ /usr/openafs/bin/bos salvage -server ruffe -partition a -volume pdc.vol.module -showlog -orphans ignore -localauth
Starting salvage.
bos: salvage completed
SalvageLog:
@(#) OpenAFS 1.4.4 built  2007-04-25 
06/27/2007 20:07:27 STARTING AFS SALVAGER 2.4 (/usr/openafs/libexec/openafs/salvager /vicepa 537045984 -orphans ignore)
06/27/2007 20:07:28 2 nVolumesInInodeFile 64 
06/27/2007 20:07:28 CHECKING CLONED VOLUME 537045986.
06/27/2007 20:07:28 pdc.vol.module.backup (537045986) updated 06/01/2005 14:10
06/27/2007 20:07:28 SALVAGING VOLUME 537045984.
06/27/2007 20:07:28 pdc.vol.module (537045984) updated 06/01/2005 14:10
06/27/2007 20:07:28 totalInodes 3019
06/27/2007 20:07:29 dir vnode 451: ??/.. (vnode 449): unique changed from 6629 to 11697 -- deleted
06/27/2007 20:07:29 dir vnode 455: ??/.. (vnode 453): unique changed from 6631 to 7491 -- deleted
06/27/2007 20:07:29 Vnode 449: link count incorrect (was 2, now 1)
06/27/2007 20:07:29 Vnode 453: link count incorrect (was 9, now 8)
06/27/2007 20:07:29 Found 2 orphaned files and directories (approx. 4 KB)
06/27/2007 20:07:29 Salvaged pdc.vol.module (537045984): 3012 files, 25862 block

Second pass with orphans attach:

+ /usr/openafs/bin/bos salvage -server ruffe -partition a -volume pdc.vol.module -showlog -orphans attach -localauth
Starting salvage.
bos: salvage completed
SalvageLog:
@(#) OpenAFS 1.4.4 built  2007-04-25 
06/28/2007 15:57:26 STARTING AFS SALVAGER 2.4 (/usr/openafs/libexec/openafs/salvager /vicepa 537045984 -orphans attach)
06/28/2007 15:57:27 2 nVolumesInInodeFile 64 
06/28/2007 15:57:27 CHECKING CLONED VOLUME 537045986.
06/28/2007 15:57:27 pdc.vol.module.backup (537045986) updated 06/01/2005 14:10
06/28/2007 15:57:27 SALVAGING VOLUME 537045984.
06/28/2007 15:57:27 pdc.vol.module (537045984) updated 06/01/2005 14:10
06/28/2007 15:57:27 totalInodes 3019
06/28/2007 15:57:28 The dir header alloc map for page 0 is bad.
06/28/2007 15:57:28 Directory bad, vnode 451; salvaging...
06/28/2007 15:57:28 Salvaging directory 451...
06/28/2007 15:57:28 Checking the results of the directory salvage...
06/28/2007 15:57:28 The dir header alloc map for page 0 is bad.
06/28/2007 15:57:28 Directory bad, vnode 455; salvaging...
06/28/2007 15:57:28 Salvaging directory 455...
06/28/2007 15:57:28 Checking the results of the directory salvage...
06/28/2007 15:57:28 "Salvage volume group" core dumped!
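
Both passes complain about the same two directory vnodes (451 and 455).
When comparing longer SalvageLogs, a short script can pull out the
flagged vnodes. This is only a minimal sketch: it assumes the log line
formats shown above, and the function name flagged_vnodes and the
sample LOG string are made up for illustration.

```python
import re

# Sample lines copied from the two SalvageLog excerpts above.
LOG = """\
06/27/2007 20:07:29 dir vnode 451: ??/.. (vnode 449): unique changed from 6629 to 11697 -- deleted
06/27/2007 20:07:29 dir vnode 455: ??/.. (vnode 453): unique changed from 6631 to 7491 -- deleted
06/28/2007 15:57:28 Directory bad, vnode 451; salvaging...
06/28/2007 15:57:28 Directory bad, vnode 455; salvaging...
06/28/2007 15:57:28 "Salvage volume group" core dumped!
"""

# Patterns are assumptions based only on the lines quoted in this post.
DELETED = re.compile(r"dir vnode (\d+): .* -- deleted")
BAD_DIR = re.compile(r"Directory bad, vnode (\d+);")


def flagged_vnodes(log_text):
    """Return (vnodes with a deleted entry, vnodes reported as bad directories)."""
    deleted, bad = set(), set()
    for line in log_text.splitlines():
        m = DELETED.search(line)
        if m:
            deleted.add(int(m.group(1)))
        m = BAD_DIR.search(line)
        if m:
            bad.add(int(m.group(1)))
    return deleted, bad


deleted, bad = flagged_vnodes(LOG)
print("entry deleted in first pass:", sorted(deleted))
print("directory bad in second pass:", sorted(bad))
print("flagged in both passes:", sorted(deleted & bad))
```

In practice the sample string would be replaced by the contents of the
real SalvageLog file.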

How unhappy is my volume or my salvager and where is that core?

Yes, I can access the volume and no, it is not written very often.

haba@habarber /afs/pdc.kth.se/pdc/vol/module/3.1.6 $ ls
amd64_fc3  i386_fc3  ia64_deb30  man          rs_aix43
bin        i386_rh9  init        modulefiles  src
haba@habarber /afs/pdc.kth.se/pdc/vol/module/3.1.6 $ fs lq .
Volume Name                   Quota      Used %Used   Partition
pdc.vol.module                50000     25862   52%         69%  

# vos exa pdc.vol.module -local
pdc.vol.module                    537045984 RW      25862 K  On-line
    ruffe.pdc.kth.se /vicepa 
    RWrite  537045984 ROnly          0 Backup  537045986 
    MaxQuota      50000 K 
    Creation    Fri May 16 10:20:22 2003
    Copy        Wed May  2 21:42:08 2007
    Backup      Thu Jun 28 02:18:52 2007
    Last Update Wed Jun  1 14:10:44 2005
    4874 accesses in the past day (i.e., vnode references)

    RWrite: 537045984     Backup: 537045986 
    number of sites -> 1
       server ruffe.pdc.kth.se partition /vicepa RW Site 

Any tips and tricks on how to proceed?

Harald.