[OpenAFS] Volumes offlined during reclone

Stephan Wonczak a0033@rrz.uni-koeln.de
Mon, 25 May 2009 10:56:15 +0200 (CEST)


   Hi all!
   On two consecutive weekends we had volumes going offline on one of our 
just-upgraded 1.4.10 fileservers. This seems to happen during the nightly 
'vos backupsys' runs. The effect is that the 'salvage' flag is set on one 
or two volumes, effectively taking them offline.
   Here are excerpts from the logs, first the volserver messages:

May 24 09:13:34 lvr5 volserver[6346]: VAttachVolume: volume salvage flag is ON for /vicepba/V0536940603.vol; volume needs salvage
May 24 09:13:34 lvr5 volserver[6346]: 1 Volser: ListVolumes: Could not attach volume 536940603 (/vicepba:V0536940603.vol), error=101
May 24 09:18:01 lvr5 volserver[6346]: VAttachVolume: volume salvage flag is ON for /vicepba/V0536940603.vol; volume needs salvage
May 24 09:18:01 lvr5 volserver[6346]: 1 Volser: ListVolumes: Could not attach volume 536940603 (/vicepba:V0536940603.vol), error=101
May 24 09:27:27 lvr5 volserver[6346]: VAttachVolume: volume salvage flag is ON for /vicepba/V0536940603.vol; volume needs salvage
May 24 09:27:27 lvr5 volserver[6346]: 1 Volser: ListVolumes: Could not attach volume 536940603 (/vicepba:V0536940603.vol), error=101
May 24 09:28:10 lvr5 volserver[6346]: VAttachVolume: volume salvage flag is ON for /vicepba/V0536940603.vol; volume needs salvage
May 24 09:28:10 lvr5 volserver[6346]: 1 Volser: ListVolumes: Could not attach volume 536940603 (/vicepba:V0536940603.vol), error=101
May 24 10:36:15 lvr5 volserver[6346]: VAttachVolume: volume salvage flag is ON for /vicepba/V0537184886.vol; volume needs salvage
May 24 10:36:15 lvr5 volserver[6346]: 1 Volser: ListVolumes: Could not attach volume 537184886 (/vicepba:V0537184886.vol), error=101
May 24 10:50:17 lvr5 volserver[6346]: VAttachVolume: volume salvage flag is ON for /vicepba/V0537184886.vol; volume needs salvage
May 24 10:50:17 lvr5 volserver[6346]: 1 Volser: ListVolumes: Could not attach volume 537184886 (/vicepba:V0537184886.vol), error=101

and from the FileLog:
May 24 07:05:01 lvr5 fileserver[6344]: Volume 537184886 now offline, must be salvaged.
May 24 07:05:32 lvr5 last message repeated 242 times
May 24 07:05:32 lvr5 last message repeated 3 times
May 24 07:05:33 lvr5 fileserver[6344]: VAttachVolume: volume salvage flag is ON for /vicepba//V0537184886.vol; volume needs salvage
<snip>
May 24 07:20:50 lvr5 fileserver[6344]: Volume 536940603 now offline, must be salvaged.
May 24 07:20:50 lvr5 last message repeated 2 times
May 24 07:21:39 lvr5 fileserver[6344]: VAttachVolume: volume salvage flag is ON for /vicepba//V0536940603.vol; volume needs salvage
<snip>
May 24 09:13:34 lvr5 fileserver[6344]: VAttachVolume: volume salvage flag is ON for /vicepba//V0536940603.vol; volume needs salvage
May 24 09:18:01 lvr5 fileserver[6344]: VAttachVolume: volume salvage flag is ON for /vicepba//V0536940603.vol; volume needs salvage
May 24 09:27:27 lvr5 fileserver[6344]: VAttachVolume: volume salvage flag is ON for /vicepba//V0536940603.vol; volume needs salvage
May 24 09:28:10 lvr5 fileserver[6344]: VAttachVolume: volume salvage flag is ON for /vicepba//V0536940603.vol; volume needs salvage

(Full logs are available on request.) The two volumes affected are named 
'v.www.projekt' and 'c.smail.data', respectively.
   What I find most surprising is the time at which the volumes go 
offline (07:05 and 07:20 according to the FileLog), since the backupsys 
run happens hours earlier, at 20:00 (excerpt from BosConfig):

bnode cron backupvolwww 1
parm /usr/afs/bin/vos backupsys -prefix v.www -localauth
parm 20:00
end

   There are 170 volumes in this set, and they are all finished by 20:00:56, 
i.e. in less than one minute, without any errors whatsoever.
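
   To put a number on that gap, here is a minimal Python sketch (not part 
of our setup) that pulls the first 'now offline' timestamp per volume from 
FileLog-style syslog lines and measures the distance to the preceding 
20:00 cron slot. The regular expression, the function names first_offline 
and hours_since_cron, and the year 2009 are illustrative assumptions; the 
sketch also assumes that the relevant backupsys run is the one scheduled 
at 20:00 on the previous evening.

```python
import re
from datetime import datetime, timedelta

# Sample lines copied from the FileLog excerpt above (unwrapped).
FILELOG = """\
May 24 07:05:01 lvr5 fileserver[6344]: Volume 537184886 now offline, must be salvaged.
May 24 07:05:32 lvr5 last message repeated 242 times
May 24 07:20:50 lvr5 fileserver[6344]: Volume 536940603 now offline, must be salvaged.
May 24 07:20:50 lvr5 last message repeated 2 times
"""

# Assumed line layout: "<Mon> <day> <HH:MM:SS> <host> fileserver[<pid>]: Volume <id> now offline"
OFFLINE_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ fileserver\[\d+\]: "
    r"Volume (?P<vol>\d+) now offline"
)


def first_offline(log_text, year=2009):
    """Return {volume_id: datetime of the first 'now offline' message}."""
    seen = {}
    for line in log_text.splitlines():
        m = OFFLINE_RE.match(line)
        if not m:
            continue
        # syslog timestamps carry no year, so the caller supplies one.
        when = datetime.strptime(f"{year} {m.group('stamp')}", "%Y %b %d %H:%M:%S")
        # setdefault keeps only the earliest occurrence per volume.
        seen.setdefault(int(m.group("vol")), when)
    return seen


def hours_since_cron(when, cron_hhmm="20:00"):
    """Hours between the most recent cron slot at cron_hhmm and `when`."""
    hh, mm = map(int, cron_hhmm.split(":"))
    run = when.replace(hour=hh, minute=mm, second=0, microsecond=0)
    if run > when:
        # The slot on the same day is still in the future: use yesterday's.
        run -= timedelta(days=1)
    return (when - run).total_seconds() / 3600.0


if __name__ == "__main__":
    for vol, when in sorted(first_offline(FILELOG).items()):
        gap = hours_since_cron(when)
        print(vol, when.isoformat(sep=" "), f"{gap:.2f} h after the 20:00 slot")
```

   On the excerpt above this gives a gap of roughly eleven hours for both 
volumes.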
   Any ideas what could be happening here?


 	Dipl. Chem. Dr. Stephan Wonczak

         Regionales Rechenzentrum der Universitaet zu Koeln (RRZK)
         Universitaet zu Koeln, Robert-Koch-Strasse 10, 50931 Koeln
         Tel: +49/(0)221/478-5577, Fax: +49/(0)221/478-5590