[OpenAFS] Problem with Off-line volumes...unable to bring On-line

McKee, Shawn smckee@umich.edu
Mon, 24 Jan 2011 09:26:02 -0500


Thanks for the information Hartmut.

I tried setting ulimit to 1000000 blocks and rerunning the salvage. I still
got no core file (salvager "seemed" to complete):

[atums2:~]# ulimit -a
core file size          (blocks, -c) 1000000
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 49152
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 49152
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
[atums2:~]# bos salvage atums2 /vicepb chdata.sn
Starting salvage.
bos: salvage completed

SalvageLog file shows the same thing as before.
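
One possible reason no core file appeared: resource limits are per-process and inherited from the parent, and 'bos salvage' asks the bosserver to start the salvager, so the salvager would get the bosserver's core limit rather than the limit of the shell where ulimit was changed. The sketch below only reports how the current shell's limit would be interpreted; the helper name core_limit_status is made up for illustration and is not part of OpenAFS.

```shell
#!/bin/sh
# Hypothetical helper: interpret a value as printed by `ulimit -c`.
# "0" means no core files are written; any other value is a cap in blocks,
# and "unlimited" means no cap.
core_limit_status() {
    case "$1" in
        0)         echo "disabled" ;;
        unlimited) echo "unlimited" ;;
        *)         echo "limited to $1 blocks" ;;
    esac
}

# Report the limit of THIS shell. A salvager started by the bosserver
# inherits the bosserver's limit instead, which may differ.
core_limit_status "$(ulimit -c)"
```

A salvager started directly from this shell (for example under gdb with -debug) does inherit the shell's limit, so the value reported here applies to that case.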

Then I tried running 'gdb' and got:

[atums2:~]# gdb /usr/afs/bin/salvager
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-32.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/afs/bin/salvager...(no debugging symbols
found)...done.
(gdb) run /vicepb 536871656 -debug
Starting program: /usr/afs/bin/salvager /vicepb 536871656 -debug
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x2aaaaaaab000
Mon Jan 24 15:16:47 2011 Assertion failed! file vol-salvage.c, line 2859.

Program received signal SIGABRT, Aborted.
0x0000003408c30265 in raise () from /lib64/libc.so.6
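
At this stop, a backtrace ('bt' at the (gdb) prompt) would show the call path that led to the assertion. The same run can be done non-interactively; the sketch below only builds and prints such a command line, using gdb's -batch, -ex and --args options. The function name salvager_bt_cmd is made up for illustration. Because this binary has no debugging symbols, the backtrace may show little beyond function names.

```shell
#!/bin/sh
# Hypothetical helper: print a gdb command line that runs the salvager
# on one volume without forking (-debug) and prints a backtrace when
# the program stops on a signal such as SIGABRT.
salvager_bt_cmd() {
    partition=$1   # e.g. /vicepb
    volume_id=$2   # numeric ID of the RW volume
    echo gdb -batch -ex run -ex bt --args \
        /usr/afs/bin/salvager "$partition" "$volume_id" -debug
}

# Dry run: show the command instead of executing it.
salvager_bt_cmd /vicepb 536871656
```

Printing the command first makes it easy to check the partition and volume ID before running it as root on the file server.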

The log file then showed:

[atums2:~]# tail /usr/afs/logs/SalvageLog
@(#) OpenAFS 1.4.12 built  2010-12-13 1928681 19919656
01/24/2011 15:16:47 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepb 536871656 -debug)
01/24/2011 15:16:47 2 nVolumesInInodeFile 64
01/24/2011 15:16:47 CHECKING CLONED VOLUME 536871657.
01/24/2011 15:16:47 chdata.sn.readonly (536871657) updated 04/04/2007 15:29
01/24/2011 15:16:47 Partially allocated vnode 2 deleted.

So I assume that I need to dig into vol-salvage.c around line 2859 to figure
out why the assertion failed?
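
For looking at the code around a reported line, a small sed wrapper is enough. This is a minimal sketch: the function name show_context is made up, and the path src/vol/vol-salvage.c assumes an unpacked OpenAFS source tree. The line number only matches if the source is exactly the one the binary was built from, so a tree without the CERN patches may be off by some lines.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of FILE within RADIUS lines of LINE.
show_context() {
    file=$1
    line=$2
    radius=${3:-10}              # default: 10 lines either side
    start=$((line - radius))
    if [ "$start" -lt 1 ]; then  # clamp at the top of the file
        start=1
    fi
    end=$((line + radius))
    sed -n "${start},${end}p" "$file"
}

# Example (not run here; requires the matching source tree):
#   show_context src/vol/vol-salvage.c 2859
```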

I should also note that the rest of the AFS cell is running the "SL" version
of OpenAFS, rather than the "SLC" version this node runs. One possibility is
that I could switch to those RPMs, since I have hit issues in the past with
CERN customizations for OpenAFS.

Thanks,

Shawn

-----Original Message-----
From: Hartmut Reuter [mailto:reuter@rzg.mpg.de]
Sent: Monday, January 24, 2011 9:04 AM
To: McKee, Shawn
Cc: openafs-info@openafs.org
Subject: Re: [OpenAFS] Problem with Off-line volumes...unable to bring On-line

Looks like a crash of the salvager. The SalvageLog should end differently, with
the summary line for the RW volume. Are there any core files in /usr/afs/logs?
If not, make sure ulimit for core file size isn't set to 0 and retry.

You could also run the salvager by hand under gdb to see why it crashes. You
then need to add the -debug flag to prevent it from forking. E.g.

gdb /usr/afs/bin/salvager
...
(gdb) run /vicepb 536871656 -debug


Good luck,
Hartmut

McKee, Shawn wrote:
> Hi Everyone,
>
> I am having a problem with one of my OpenAFS file servers. About ½ of
> the volumes are "Off-line" and I am unable to bring them online. First
> some system info and then I will list problem details and what I have
> tried.
>
> The system is running Scientific Linux 5.5/x86_64 (basically CentOS 5.5
> 64-bit). The openafs rpms are:
>
> [atums2:~]# rpm -qa | grep openafs
>
> openafs-kpasswd-1.4.12-6.cern
>
> openafs-client-1.4.12-6.cern
>
> kernel-module-openafs-2.6.18-194.3.1.el5-1.4.12-5.cern
>
> openafs-1.4.12-6.cern
>
> kernel-module-openafs-2.6.18-194.8.1.el5-1.4.12-5.cern
>
> openafs-krb5-1.4.12-6.cern
>
> kernel-module-openafs-2.6.18-238.1.1.el5-1.4.12-6.cern
>
> openafs-server-1.4.12-6.cern
>
> The version of 'e2fsprogs' is 1.39
>
> The system has an ext3 1TB partition for AFS:
>
> [atums2:~]# df /vicepb
>
> Filesystem 1K-blocks Used Available Use% Mounted on
>
> /dev/sda1 1007931664 635382472 321349196 67% /vicepb
>
> The system has 931 volumes and only 470 are On-line while 461 are Off-line:
>
> [atums2:~]# vos listvol atums2
>
> Total number of volumes on server atums2 partition /vicepb: 931
>
> chamber.OLD_eml4a07 536872814 RW 8634169 K Off-line
>
> chamber.OLD_eml4a07.readonly 536872815 RO 8634169 K On-line
>
> chamber.OLD_eml4a09 536872817 RW 702642 K Off-line
>
> chamber.OLD_eml4a09.readonly 536872818 RO 702642 K On-line
>
> ...
>
> Total volumes onLine 470 ; Total volumes offLine 461 ; Total busy 0
>
> I have run 'bos salvage' on the partition multiple times. I have
> restarted the system. I have run a forced fsck.ext3 check on the
> underlying partition (no problems found). Only RW volumes are Off-line.
> All RO volumes are On-line. There are a few RW volumes On-line (8 out of
> 469) but the rest won't come On-line.
>
> Here is a particular volume which is Off-line:
>
> [atums2:~]# vos examine chdata.sn
>
> chdata.sn 536871656 RW 598 K Off-line
>
> atums2.cern.ch /vicepb
>
> RWrite 536871656 ROnly 0 Backup 0
>
> MaxQuota 10000000 K
>
> Creation Fri May 26 04:02:49 2006
>
> Copy Wed Oct 11 12:35:42 2006
>
> Backup Sun Jun 11 00:30:10 2006
>
> Last Access Fri Jan 7 16:38:32 2011
>
> Last Update Wed Apr 4 15:29:42 2007
>
> 0 accesses in the past day (i.e., vnode references)
>
> RWrite: 536871656 ROnly: 536871657 RClone: 536871657
>
> number of sites -> 3
>
> server atums1.cern.ch partition /vicepi RO Site -- Old release
>
> server atums2.cern.ch partition /vicepb RW Site -- New release
>
> server atums2.cern.ch partition /vicepb RO Site -- New release
>
> Try to bring online:
>
> [atums2:~]# vos online -server atums2 -partition /vicepb -id chdata.sn
>
> The FileLog shows:
>
> Sun Jan 23 22:57:03 2011 GetBitmap: addled vnode index in volume
> chdata.sn; volume needs salvage
>
> Sun Jan 23 22:57:03 2011 VAttachVolume: error getting bitmap for volume
> (/vicepb//V0536871656.vol)
>
> Try to Salvage:
>
> [atums2:~]# bos salvage atums2 /vicepb chdata.sn
>
> Starting salvage.
>
> bos: salvage completed
>
> The SalvageLog shows:
>
> [atums2:~]# tail /usr/afs/logs/SalvageLog
>
> @(#) OpenAFS 1.4.12 built 2010-12-13 1928681 19919656
>
> 01/23/2011 22:58:19 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager
> /vicepb 536871656)
>
> 01/23/2011 22:58:19 2 nVolumesInInodeFile 64
>
> 01/23/2011 22:58:19 CHECKING CLONED VOLUME 536871657.
>
> 01/23/2011 22:58:19 chdata.sn.readonly (536871657) updated 04/04/2007 15:29
>
> 01/23/2011 22:58:19 Partially allocated vnode 2 deleted.
>
> Try again:
>
> [atums2:~]# vos online -server atums2 -partition /vicepb -id chdata.sn
>
>
> FileLog has the same message:
>
> Sun Jan 23 22:59:05 2011 GetBitmap: addled vnode index in volume
> chdata.sn; volume needs salvage
>
> Sun Jan 23 22:59:05 2011 VAttachVolume: error getting bitmap for volume
> (/vicepb//V0536871656.vol)
>
> Salvage attempt again:
>
> [atums2:~]# bos salvage atums2 /vicepb chdata.sn
>
> Starting salvage.
>
> bos: salvage completed
>
> [atums2:~]# tail /usr/afs/logs/SalvageLog
>
> @(#) OpenAFS 1.4.12 built 2010-12-13 1928681 19919656
>
> 01/23/2011 23:00:07 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager
> /vicepb 536871656)
>
> 01/23/2011 23:00:07 2 nVolumesInInodeFile 64
>
> 01/23/2011 23:00:07 CHECKING CLONED VOLUME 536871657.
>
> 01/23/2011 23:00:07 chdata.sn.readonly (536871657) updated 04/04/2007 15:29
>
> 01/23/2011 23:00:07 Partially allocated vnode 2 deleted.
>
> Same result as if the prior salvage didn't do anything. This is exactly
> what happens on other volumes I have tried to bring online.
>
> So how would I fix this? Any suggestions for how to get the rest of
> these volumes On-line?
>
> Let me know if you need further details. Thanks,
>
> Shawn
>


-- 
-----------------------------------------------------------------
Hartmut Reuter                  e-mail 		reuter@rzg.mpg.de
			   	phone 		 +49-89-3299-1328
			   	fax   		 +49-89-3299-1301
RZG (Rechenzentrum Garching)   	web    http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------
