[OpenAFS] My salvager was cored by my volume.
Hartmut Reuter
reuter@rzg.mpg.de
Thu, 28 Jun 2007 18:44:38 +0200
Harald Barth wrote:
> Yesterday I had a server crash after a HW-RAID box decided to go out
> for lunch without even trying to have a reason. Afterwards I restarted with
> fast-restart and then salvaged everything. First pass with
> orphans ignore:
>
> + /usr/openafs/bin/bos salvage -server ruffe -partition a -volume pdc.vol.module -showlog -orphans ignore -localauth
> Starting salvage.
> bos: salvage completed
> SalvageLog:
> @(#) OpenAFS 1.4.4 built 2007-04-25
> 06/27/2007 20:07:27 STARTING AFS SALVAGER 2.4 (/usr/openafs/libexec/openafs/salvager /vicepa 537045984 -orphans ignore)
> 06/27/2007 20:07:28 2 nVolumesInInodeFile 64
> 06/27/2007 20:07:28 CHECKING CLONED VOLUME 537045986.
> 06/27/2007 20:07:28 pdc.vol.module.backup (537045986) updated 06/01/2005 14:10
> 06/27/2007 20:07:28 SALVAGING VOLUME 537045984.
> 06/27/2007 20:07:28 pdc.vol.module (537045984) updated 06/01/2005 14:10
> 06/27/2007 20:07:28 totalInodes 3019
> 06/27/2007 20:07:29 dir vnode 451: ??/.. (vnode 449): unique changed from 6629 to 11697 -- deleted
> 06/27/2007 20:07:29 dir vnode 455: ??/.. (vnode 453): unique changed from 6631 to 7491 -- deleted
> 06/27/2007 20:07:29 Vnode 449: link count incorrect (was 2, now 1)
> 06/27/2007 20:07:29 Vnode 453: link count incorrect (was 9, now 8)
> 06/27/2007 20:07:29 Found 2 orphaned files and directories (approx. 4 KB)
> 06/27/2007 20:07:29 Salvaged pdc.vol.module (537045984): 3012 files, 25862 block
>
> Second pass with orphans attach:
>
> + /usr/openafs/bin/bos salvage -server ruffe -partition a -volume pdc.vol.module -showlog -orphans attach -localauth
> Starting salvage.
> bos: salvage completed
> SalvageLog:
> @(#) OpenAFS 1.4.4 built 2007-04-25
> 06/28/2007 15:57:26 STARTING AFS SALVAGER 2.4 (/usr/openafs/libexec/openafs/salvager /vicepa 537045984 -orphans attach)
> 06/28/2007 15:57:27 2 nVolumesInInodeFile 64
> 06/28/2007 15:57:27 CHECKING CLONED VOLUME 537045986.
> 06/28/2007 15:57:27 pdc.vol.module.backup (537045986) updated 06/01/2005 14:10
> 06/28/2007 15:57:27 SALVAGING VOLUME 537045984.
> 06/28/2007 15:57:27 pdc.vol.module (537045984) updated 06/01/2005 14:10
> 06/28/2007 15:57:27 totalInodes 3019
> 06/28/2007 15:57:28 The dir header alloc map for page 0 is bad.
> 06/28/2007 15:57:28 Directory bad, vnode 451; salvaging...
> 06/28/2007 15:57:28 Salvaging directory 451...
> 06/28/2007 15:57:28 Checking the results of the directory salvage...
> 06/28/2007 15:57:28 The dir header alloc map for page 0 is bad.
> 06/28/2007 15:57:28 Directory bad, vnode 455; salvaging...
> 06/28/2007 15:57:28 Salvaging directory 455...
> 06/28/2007 15:57:28 Checking the results of the directory salvage...
> 06/28/2007 15:57:28 "Salvage volume group" core dumped!
>
> How unhappy is my volume or my salvager and where is that core?
>
> Yes, I can access the volume and no, it is not written very often.
>
> haba@habarber /afs/pdc.kth.se/pdc/vol/module/3.1.6 $ ls
> amd64_fc3 i386_fc3 ia64_deb30 man rs_aix43
> bin i386_rh9 init modulefiles src
> haba@habarber /afs/pdc.kth.se/pdc/vol/module/3.1.6 $ fs lq .
> Volume Name       Quota    Used  %Used  Partition
> pdc.vol.module    50000   25862    52%        69%
>
> # vos exa pdc.vol.module -local
> pdc.vol.module 537045984 RW 25862 K On-line
> ruffe.pdc.kth.se /vicepa
> RWrite 537045984 ROnly 0 Backup 537045986
> MaxQuota 50000 K
> Creation Fri May 16 10:20:22 2003
> Copy Wed May 2 21:42:08 2007
> Backup Thu Jun 28 02:18:52 2007
> Last Update Wed Jun 1 14:10:44 2005
> 4874 accesses in the past day (i.e., vnode references)
>
> RWrite: 537045984 Backup: 537045986
> number of sites -> 1
> server ruffe.pdc.kth.se partition /vicepa RW Site
>
> Any tips and tricks on how to proceed?
The best approach would certainly be to find out why and where it dumped core.
Compile the salvager with -g and without -O, then either run it under gdb with
-debug (so that it does not fork) or load the core file into gdb.
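
The two gdb options above could look roughly like the sketch below. It is only a dry run that prints the commands. The salvager path, partition and volume ID are taken from the SalvageLog quoted above; CORE is a hypothetical placeholder, because the thread does not say where the core file ended up. The rebuild with -g and without -O is not shown, since how to pass those flags depends on the local build setup.

```shell
#!/bin/sh
# Dry-run sketch: prints the debugging commands instead of running them.
# Path, partition and volume ID are copied from the SalvageLog above.
SALVAGER=/usr/openafs/libexec/openafs/salvager
PARTITION=/vicepa
VOLUME_ID=537045984
# Hypothetical placeholder: the real core file location is not known here.
CORE=${CORE:-/path/to/core}

# Option 1: run the rebuilt salvager under gdb in the foreground.
# -debug is added so that the salvager does not fork.
live_cmd() {
    echo "gdb --args $SALVAGER $PARTITION $VOLUME_ID -orphans attach -debug"
}

# Option 2: load the existing core file into gdb post mortem.
core_cmd() {
    echo "gdb $SALVAGER $CORE"
}

live_cmd
core_cmd
```

In either case, once gdb has stopped at the crash (after "run" for option 1), "bt" prints the backtrace that shows where the salvager died.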
Hartmut
>
> Harald.
--
-----------------------------------------------------------------
Hartmut Reuter e-mail reuter@rzg.mpg.de
phone +49-89-3299-1328
RZG (Rechenzentrum Garching) fax +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------