[OpenAFS] Volumes offlined during reclone
Stephan Wonczak
a0033@rrz.uni-koeln.de
Mon, 25 May 2009 10:56:15 +0200 (CEST)
Hi all!
On two consecutive weekends, volumes went offline on one of our
just-upgraded 1.4.10 fileservers. This seems to happen during the nightly
'vos backupsys' runs. The effect is that the 'salvage' flag is set on one
or two volumes, effectively taking them offline.
Here are excerpts from the logs, first from the volserver:
May 24 09:13:34 lvr5 volserver[6346]: VAttachVolume: volume salvage flag
is ON for /vicepba/V0536940603.vol; volume needs salvage
May 24 09:13:34 lvr5 volserver[6346]: 1 Volser: ListVolumes: Could not
attach volume 536940603 (/vicepba:V0536940603.vol), error=101
May 24 09:18:01 lvr5 volserver[6346]: VAttachVolume: volume salvage flag
is ON for /vicepba/V0536940603.vol; volume needs salvage
May 24 09:18:01 lvr5 volserver[6346]: 1 Volser: ListVolumes: Could not
attach volume 536940603 (/vicepba:V0536940603.vol), error=101
May 24 09:27:27 lvr5 volserver[6346]: VAttachVolume: volume salvage flag
is ON for /vicepba/V0536940603.vol; volume needs salvage
May 24 09:27:27 lvr5 volserver[6346]: 1 Volser: ListVolumes: Could not
attach volume 536940603 (/vicepba:V0536940603.vol), error=101
May 24 09:28:10 lvr5 volserver[6346]: VAttachVolume: volume salvage flag
is ON for /vicepba/V0536940603.vol; volume needs salvage
May 24 09:28:10 lvr5 volserver[6346]: 1 Volser: ListVolumes: Could not
attach volume 536940603 (/vicepba:V0536940603.vol), error=101
May 24 10:36:15 lvr5 volserver[6346]: VAttachVolume: volume salvage flag
is ON for /vicepba/V0537184886.vol; volume needs salvage
May 24 10:36:15 lvr5 volserver[6346]: 1 Volser: ListVolumes: Could not
attach volume 537184886 (/vicepba:V0537184886.vol), error=101
May 24 10:50:17 lvr5 volserver[6346]: VAttachVolume: volume salvage flag
is ON for /vicepba/V0537184886.vol; volume needs salvage
May 24 10:50:17 lvr5 volserver[6346]: 1 Volser: ListVolumes: Could not
attach volume 537184886 (/vicepba:V0537184886.vol), error=101
and FileLog:
May 24 07:05:01 lvr5 fileserver[6344]: Volume 537184886 now offline, must
be salvaged.
May 24 07:05:32 lvr5 last message repeated 242 times
May 24 07:05:32 lvr5 last message repeated 3 times
May 24 07:05:33 lvr5 fileserver[6344]: VAttachVolume: volume salvage flag
is ON for /vicepba//V0537184886.vol; volume needs salvage
<snip>
May 24 07:20:50 lvr5 fileserver[6344]: Volume 536940603 now offline, must
be salvaged.
May 24 07:20:50 lvr5 last message repeated 2 times
May 24 07:21:39 lvr5 fileserver[6344]: VAttachVolume: volume salvage flag
is ON for /vicepba//V0536940603.vol; volume needs salvage
<snip>
May 24 09:13:34 lvr5 fileserver[6344]: VAttachVolume: volume salvage flag
is ON for /vicepba//V0536940603.vol; volume needs salvage
May 24 09:18:01 lvr5 fileserver[6344]: VAttachVolume: volume salvage flag
is ON for /vicepba//V0536940603.vol; volume needs salvage
May 24 09:27:27 lvr5 fileserver[6344]: VAttachVolume: volume salvage flag
is ON for /vicepba//V0536940603.vol; volume needs salvage
May 24 09:28:10 lvr5 fileserver[6344]: VAttachVolume: volume salvage flag
is ON for /vicepba//V0536940603.vol; volume needs salvage
(full logs are available on request). The two volumes affected are named
'v.www.projekt' and 'c.smail.data', respectively.
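For anyone who wants to pull the same information out of their own logs, here is a rough Python sketch that lists each volume ID together with the first time its salvage flag was reported. It assumes syslog-style lines like the ones above, where long entries may be wrapped onto a second line; the names `unwrap` and `first_salvage_times` are made up for illustration and are not part of any OpenAFS tool.

```python
import re

# Syslog-style prefix as seen in the excerpts above,
# e.g. "May 24 09:13:34 lvr5 volserver[6346]: ..."
STAMP = re.compile(r"^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} ")

# Volume header file name in the message, e.g. V0536940603.vol -> 536940603
SALVAGE = re.compile(r"salvage flag is ON for \S*?V(\d+)\.vol")


def unwrap(lines):
    """Rejoin entries that were wrapped onto a continuation line."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if STAMP.match(line) or not entries:
            entries.append(line)          # a new timestamped entry
        else:
            entries[-1] += " " + line     # continuation of the previous entry
    return entries


def first_salvage_times(lines):
    """Map volume ID -> timestamp text of its first 'salvage flag is ON' entry."""
    seen = {}
    for entry in unwrap(lines):
        match = SALVAGE.search(entry)
        if match:
            vol_id = int(match.group(1))      # int() drops the leading zero
            seen.setdefault(vol_id, entry[:15])  # "Mon DD HH:MM:SS" is 15 chars
    return seen
```

Fed with an open log file (for example `first_salvage_times(open(path))`, where `path` is whatever file your syslog writes to), it returns one entry per affected volume, which is easier to scan than the repeated messages.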
What I find most surprising is the time at which the volumes go
offline, since the backupsys run happens several hours earlier (excerpt
from BosConfig):
bnode cron backupvolwww 1
parm /usr/afs/bin/vos backupsys -prefix v.www -localauth
parm 20:00
end
There are 170 volumes in this set, and they are all finished by 20:00:56,
i.e. in less than one minute, without any errors whatsoever.
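To put a number on "several hours", here is a small Python sketch of the arithmetic. It rests on an assumption: that the relevant backupsys run is the one from the previous evening (May 23), finishing at 20:00:56. The offline times are the first "now offline, must be salvaged" entries from the FileLog excerpt, and the year is taken from the date of this mail.

```python
from datetime import datetime

# Assumption: the backupsys run in question is the one on the evening of
# May 23; only its time of day (finished by 20:00:56) is known for certain.
backup_done = datetime(2009, 5, 23, 20, 0, 56)

# First "now offline, must be salvaged" entry per volume (FileLog excerpt).
offline = {
    537184886: datetime(2009, 5, 24, 7, 5, 1),
    536940603: datetime(2009, 5, 24, 7, 20, 50),
}

# Time elapsed between the end of the backupsys run and each volume going offline.
gaps = {vol: when - backup_done for vol, when in offline.items()}

for vol, gap in gaps.items():
    print(vol, gap)
```

Under that assumption, both volumes went offline a little over eleven hours after the backupsys run had finished.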
Any ideas what could be happening here?
Dipl. Chem. Dr. Stephan Wonczak
Regionales Rechenzentrum der Universitaet zu Koeln (RRZK)
Universitaet zu Koeln, Robert-Koch-Strasse 10, 50931 Koeln
Tel: +49/(0)221/478-5577, Fax: +49/(0)221/478-5590