[OpenAFS] Problem with Off-line volumes...unable to bring On-line

Hartmut Reuter reuter@rzg.mpg.de
Mon, 24 Jan 2011 15:03:46 +0100


Looks like a crash of the salvager. The SalvageLog should end differently,
with the summary line for the RW-volume. Are there any core files in
/usr/afs/logs?
If not, make sure ulimit for core file size isn't set to 0 and retry.
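
A minimal sketch of those two checks, assuming bash and GNU find. The helper names list_cores and core_limit_ok are made up for illustration, and only the /usr/afs/logs path comes from this thread. Since 'bos salvage' has the bosserver start the salvager, the limit that counts is presumably the one bosserver itself was started with, not the one in your login shell.

```shell
#!/bin/bash
# Directory where a crashed salvager would be expected to leave a core file.
LOGDIR="${LOGDIR:-/usr/afs/logs}"

# Hypothetical helper: list core files directly inside a directory.
# Prints nothing if there are none or the directory does not exist.
list_cores() {
    find "$1" -maxdepth 1 -type f -name 'core*' 2>/dev/null
}

# Hypothetical helper: succeed if this shell's core size limit is not 0.
core_limit_ok() {
    [ "$(ulimit -c)" != "0" ]
}

list_cores "$LOGDIR"

if core_limit_ok; then
    echo "core dumps enabled (limit: $(ulimit -c))"
else
    echo "core dumps disabled; try 'ulimit -c unlimited' before starting the server"
fi
```

If a core file does turn up, loading it with gdb together with the salvager binary and asking for a backtrace should show where it died.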

You could also run the salvager by hand under gdb to see why it crashes.
You then need to add the -debug flag to prevent it from forking. E.g.

gdb /usr/afs/bin/salvager
...
(gdb) run /vicepb 536871656 -debug
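
The same run can be done non-interactively so the backtrace is easy to capture and mail to the list. This is only a sketch: the partition and volume id are the ones from this thread, the DRY_RUN and SALVAGER variables are made up for illustration, and it assumes a gdb that supports -batch, -ex and --args. By default it only prints the command it would run.

```shell
#!/bin/bash
# Path to the salvager binary, as used in the example above.
SALVAGER="${SALVAGER:-/usr/afs/bin/salvager}"

# Run the salvager under gdb without forking (-debug) and print a
# backtrace once it stops. If it exits cleanly, bt reports no stack.
gdb_cmd=(gdb -batch -ex run -ex bt --args "$SALVAGER" /vicepb 536871656 -debug)

if [ "${DRY_RUN:-1}" = "1" ]; then
    # Safe default: show the command instead of executing it.
    printf '%s\n' "${gdb_cmd[*]}"
else
    "${gdb_cmd[@]}"
fi
```

Setting DRY_RUN=0 actually runs it. You would most likely need to be root on the file server for that.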


Good luck,
Hartmut

McKee, Shawn wrote:
> Hi Everyone,
>
> I am having a problem with one of my OpenAFS file servers. About ½ of
> the volumes are "Off-line" and I am unable to bring them online. First
> some system info and then I will list problem details and what I have tried.
>
> The system is running Scientific Linux 5.5/x86_64 (basically CentOS 5.5
> 64-bit). The openafs rpms are:
>
> [atums2:~]# rpm -qa | grep openafs
>
> openafs-kpasswd-1.4.12-6.cern
>
> openafs-client-1.4.12-6.cern
>
> kernel-module-openafs-2.6.18-194.3.1.el5-1.4.12-5.cern
>
> openafs-1.4.12-6.cern
>
> kernel-module-openafs-2.6.18-194.8.1.el5-1.4.12-5.cern
>
> openafs-krb5-1.4.12-6.cern
>
> kernel-module-openafs-2.6.18-238.1.1.el5-1.4.12-6.cern
>
> openafs-server-1.4.12-6.cern
>
> The version of 'e2fsprogs' is 1.39
>
> The system has an ext3 1TB partition for AFS:
>
> [atums2:~]# df /vicepb
>
> Filesystem 1K-blocks Used Available Use% Mounted on
>
> /dev/sda1 1007931664 635382472 321349196 67% /vicepb
>
> The system has 931 volumes and only 470 are On-line while 461 are Off-line:
>
> [atums2:~]# vos listvol atums2
>
> Total number of volumes on server atums2 partition /vicepb: 931
>
> chamber.OLD_eml4a07 536872814 RW 8634169 K Off-line
>
> chamber.OLD_eml4a07.readonly 536872815 RO 8634169 K On-line
>
> chamber.OLD_eml4a09 536872817 RW 702642 K Off-line
>
> chamber.OLD_eml4a09.readonly 536872818 RO 702642 K On-line
>
> ...
>
> Total volumes onLine 470 ; Total volumes offLine 461 ; Total busy 0
>
> I have run 'bos salvage' on the partition multiple times. I have
> restarted the system. I have run a force fsck.ext3 check on the
> underlying partition (no problems found). Only RW volumes are Off-line.
> All RO volumes are On-line. There are a few RW volumes On-line (8 out of
> 469) but the rest won't come On-line.
>
> Here is a particular volume which is Off-line:
>
> [atums2:~]# vos examine chdata.sn
>
> chdata.sn 536871656 RW 598 K Off-line
>
> atums2.cern.ch /vicepb
>
> RWrite 536871656 ROnly 0 Backup 0
>
> MaxQuota 10000000 K
>
> Creation Fri May 26 04:02:49 2006
>
> Copy Wed Oct 11 12:35:42 2006
>
> Backup Sun Jun 11 00:30:10 2006
>
> Last Access Fri Jan 7 16:38:32 2011
>
> Last Update Wed Apr 4 15:29:42 2007
>
> 0 accesses in the past day (i.e., vnode references)
>
> RWrite: 536871656 ROnly: 536871657 RClone: 536871657
>
> number of sites -> 3
>
> server atums1.cern.ch partition /vicepi RO Site -- Old release
>
> server atums2.cern.ch partition /vicepb RW Site -- New release
>
> server atums2.cern.ch partition /vicepb RO Site -- New release
>
> Try to bring online:
>
> [atums2:~]# vos online -server atums2 -partition /vicepb -id chdata.sn
>
> The FileLog shows:
>
> Sun Jan 23 22:57:03 2011 GetBitmap: addled vnode index in volume
> chdata.sn; volume needs salvage
>
> Sun Jan 23 22:57:03 2011 VAttachVolume: error getting bitmap for volume
> (/vicepb//V0536871656.vol)
>
> Try to Salvage:
>
> [atums2:~]# bos salvage atums2 /vicepb chdata.sn
>
> Starting salvage.
>
> bos: salvage completed
>
> The SalvageLog shows:
>
> [atums2:~]# tail /usr/afs/logs/SalvageLog
>
> @(#) OpenAFS 1.4.12 built 2010-12-13 1928681 19919656
>
> 01/23/2011 22:58:19 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager
> /vicepb 536871656)
>
> 01/23/2011 22:58:19 2 nVolumesInInodeFile 64
>
> 01/23/2011 22:58:19 CHECKING CLONED VOLUME 536871657.
>
> 01/23/2011 22:58:19 chdata.sn.readonly (536871657) updated 04/04/2007 15:29
>
> 01/23/2011 22:58:19 Partially allocated vnode 2 deleted.
>
> Try again:
>
> [atums2:~]# vos online -server atums2 -partition /vicepb -id chdata.sn
>
>
> FileLog has the same message:
>
> Sun Jan 23 22:59:05 2011 GetBitmap: addled vnode index in volume
> chdata.sn; volume needs salvage
>
> Sun Jan 23 22:59:05 2011 VAttachVolume: error getting bitmap for volume
> (/vicepb//V0536871656.vol)
>
> Salvage attempt again:
>
> [atums2:~]# bos salvage atums2 /vicepb chdata.sn
>
> Starting salvage.
>
> bos: salvage completed
>
> [atums2:~]# tail /usr/afs/logs/SalvageLog
>
> @(#) OpenAFS 1.4.12 built 2010-12-13 1928681 19919656
>
> 01/23/2011 23:00:07 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager
> /vicepb 536871656)
>
> 01/23/2011 23:00:07 2 nVolumesInInodeFile 64
>
> 01/23/2011 23:00:07 CHECKING CLONED VOLUME 536871657.
>
> 01/23/2011 23:00:07 chdata.sn.readonly (536871657) updated 04/04/2007 15:29
>
> 01/23/2011 23:00:07 Partially allocated vnode 2 deleted.
>
> Same result as if the prior salvage didn't do anything. This is exactly
> what happens on other volumes I have tried to bring online.
>
> So how would I fix this? Any suggestions for how to get the rest of
> these volumes On-line?
>
> Let me know if you need further details. Thanks,
>
> Shawn
>


-- 
-----------------------------------------------------------------
Hartmut Reuter                  e-mail 		reuter@rzg.mpg.de
			   	phone 		 +49-89-3299-1328
			   	fax   		 +49-89-3299-1301
RZG (Rechenzentrum Garching)   	web    http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------