[OpenAFS] Offline volumes after upgrade to 1.6.1pre2

Åsa Andersson spigg@csc.kth.se
Thu, 16 Feb 2012 11:15:19 +0100


Hello,

We upgraded our file servers from 1.6.0 to 1.6.1pre2 last Sunday (most of
our clients are running version 1.6.0). Since then we have seen volumes
going offline, with entries in the FileLog indicating they need salvaging.

Typical FileLog entries look like this:

----------
Mon Feb 13 09:38:02 2012 Fid 537126959.344.442066 has inconsistent length 
(index 573440, inode 524288); volume must be salvaged
----------

or like this:

----------
Mon Feb 13 10:10:47 2012 fssync: breaking all call backs for volume 537126959
Mon Feb 13 10:10:47 2012 ReadHeader: Failed to open volume info header file (volume=537126959, inode=2306942731429085183); errno=2
Mon Feb 13 10:10:47 2012 VAttachVolume: Error reading diskDataHandle header for vol 537126961; error=101
Mon Feb 13 10:10:47 2012 VAttachVolume: Error attaching volume /vicepa//V0537126961.vol; volume needs salvage; error=101
Mon Feb 13 10:10:47 2012 ReadHeader: Failed to open volume info header file (volume=537126959, inode=2306942731429085183); errno=2
Mon Feb 13 10:10:47 2012 VAttachVolume: Error reading diskDataHandle header for vol 537126961; error=101
Mon Feb 13 10:10:47 2012 VAttachVolume: Error attaching volume /vicepa//V0537126961.vol; volume needs salvage; error=101
Mon Feb 13 10:10:47 2012 ReadHeader: Failed to open volume info header file (volume=537126959, inode=2306942731429085183); errno=2
Mon Feb 13 10:10:47 2012 VAttachVolume: Error reading diskDataHandle header for vol 537126961; error=101
Mon Feb 13 10:10:47 2012 VAttachVolume: Error attaching volume /vicepa//V0537126961.vol; volume needs salvage; error=101
Mon Feb 13 10:10:47 2012 SYNC_getCom:  error receiving command
Mon Feb 13 10:10:47 2012 FSYNC_com:  read failed; dropping connection (cnt=68129)
----------

We have 35252 RW volumes in total, and roughly a hundred of them have gone
offline so far. Running salvage seems to fix the volumes.
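
For anyone who wants to collect the affected volume IDs from a FileLog, here
is a minimal Python sketch. It assumes the entries look exactly like the two
kinds quoted above, and it takes the first component of a Fid to be the
volume ID. The function name and the patterns are only illustrative; they
are not part of OpenAFS, and other FileLog message formats are not covered.

```python
import re

# Patterns are derived only from the FileLog excerpts quoted in this mail;
# other message formats will not be matched.
FID_RE = re.compile(r"Fid (\d+)\.\d+\.\d+ has inconsistent length")
ATTACH_RE = re.compile(
    r"VAttachVolume: Error attaching volume \S*?/V(\d+)\.vol; "
    r"volume needs salvage"
)


def volumes_needing_salvage(lines):
    """Return the sorted, de-duplicated volume IDs flagged in the log lines."""
    ids = set()
    for line in lines:
        match = FID_RE.search(line) or ATTACH_RE.search(line)
        if match:
            # int() drops the zero padding used in the V*.vol file name.
            ids.add(int(match.group(1)))
    return sorted(ids)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python find_salvage.py FileLog
    with open(sys.argv[1], errors="replace") as log:
        for volume_id in volumes_needing_salvage(log):
            print(volume_id)
```

The resulting list only tells you which volumes were flagged; the salvage
itself still has to be run separately.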

Has anyone else seen this after upgrading to 1.6.1pre2?

Is 1.6.1pre2 detecting data corruption that was introduced by 1.6.0, and is
that what we're seeing?


Åsa Andersson
School of Computer Science and Communication
Royal Institute of Technology