[OpenAFS] CopyOnWrite failure, leading to volume salvage
Dameon Wagner
dameon.wagner@it.ox.ac.uk
Thu, 27 Sep 2012 10:45:45 +0100
Hi,
We have a modest AFS installation which is (currently at least) mostly
for internal use. A while ago an AFS volume hosting data for our VLE
(virtual learning environment) became unreadable: the fileserver could
not attach it, and we ultimately had to salvage the volume.
I've trawled through Google and the archives here, and found a few
reported issues that are close but don't exactly match our
observations, so I thought it might be worth posting some details,
just in case someone can suggest anything that might prevent this
happening again, or at least reduce the chances of it recurring.
We are currently running our AFS infrastructure on Debian Lenny, with
installed packages based on version 1.4.7.dfsg1-6+lenny4, so it's
possible that we're seeing a bug fixed in newer releases (an upgrade
to our AFS infrastructure is planned for the future).
So, here are some details from our recent issue. After being alerted
by a member of our VLE team I first noticed that running `ls -l` on
the directory containing the volume's mount point returned:
#---8<-----------------------------------------------------------------
d????????? ? ? ? ? ? ? ? files/
drwxrwxrwx 7 root root 2048 Mar 12 2010 logs/
#---8<-----------------------------------------------------------------
where "files" is the volume in question ("logs" is on another volume).
Additionally, `vos listvol` stated:
#---8<-----------------------------------------------------------------
**** Could not attach volume 536874907 ****
#---8<-----------------------------------------------------------------
along with the same message for a few other volume IDs, which later
turned out to be clones of 536874907 that I think `vos dump` had
created during scheduled backup attempts.
Digging into the logfiles I found lines like the following in
VolserLog and FileLog:
#---8<-----------------------------------------------------------------
# VolserLog:
VAttachVolume: Error attaching volume /vicepa/V0536874907.vol; volume needs salvage; error=101
# FileLog:
VAttachVolume: Error reading namei vol header /vicepa//V0536874907.vol; error=101
VAttachVolume: Error attaching volume /vicepa//V0536874907.vol; volume needs salvage; error=101
#---8<-----------------------------------------------------------------
The FileLog lines especially were repeated extensively. Reading
further back in FileLog it seems that the issue began after the
following entries:
#---8<-----------------------------------------------------------------
CopyOnWrite failed: Partition /vicepa that contains volume 536874907 may be out of free inodes(errno = 2)
Alloc_NewVnode: partition /vicepa idec 9492608003410095 failed
Volume : 536874907 vnode = 592047 Failed to create inode: errno = 2
#---8<-----------------------------------------------------------------
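For anyone wanting to find the same starting point in a large FileLog, here is a minimal Python sketch. It assumes the message formats are exactly those quoted in this post; the function name `summarise_filelog` and the sample list are hypothetical, and the sample simply reuses the quoted lines without timestamps.

```python
import re

# Patterns copied from the log excerpts quoted in this message.
COW_FAILED = re.compile(
    r"CopyOnWrite failed: Partition (\S+) that contains volume (\d+)"
)
ATTACH_ERR = re.compile(r"VAttachVolume: Error attaching volume (\S+);")


def summarise_filelog(lines):
    """Return (index of the first CopyOnWrite failure or None,
    dict mapping volume header path -> number of attach errors)."""
    first_cow = None
    attach_errors = {}
    for i, line in enumerate(lines):
        if first_cow is None and COW_FAILED.search(line):
            first_cow = i
        m = ATTACH_ERR.search(line)
        if m:
            path = m.group(1)
            attach_errors[path] = attach_errors.get(path, 0) + 1
    return first_cow, attach_errors


# Hypothetical sample: the lines quoted above, in the order described.
sample = [
    "CopyOnWrite failed: Partition /vicepa that contains volume 536874907 "
    "may be out of free inodes(errno = 2)",
    "Alloc_NewVnode: partition /vicepa idec 9492608003410095 failed",
    "Volume : 536874907 vnode = 592047 Failed to create inode: errno = 2",
    "VAttachVolume: Error attaching volume /vicepa//V0536874907.vol; "
    "volume needs salvage; error=101",
    "VAttachVolume: Error attaching volume /vicepa//V0536874907.vol; "
    "volume needs salvage; error=101",
]

first, counts = summarise_filelog(sample)
print(first, counts)
```

On a real server the list would come from reading the FileLog file line by line; its location depends on the installation, so it is deliberately not hard-coded here.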
Troubleshooting showed that there was no shortage of inodes, so that
wasn't the underlying issue (I read the log message as only a
suggestion of one possible cause).
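The free-inode check can be repeated with a few lines of Python. This is only a sketch using the standard `os.statvfs` call; the helper name `inode_usage` is hypothetical, and "/" is a placeholder for the partition (here /vicepa) on the fileserver. It also decodes the errno from the log lines: on Linux, errno 2 is ENOENT, whereas running out of space or inodes is normally reported as ENOSPC. That is consistent with the finding above, but it is an observation rather than a diagnosis.

```python
import errno
import os


def inode_usage(path):
    """Return (total, free, percent_used) inode figures for the
    filesystem that holds `path`."""
    st = os.statvfs(path)
    total, free = st.f_files, st.f_ffree
    # Some filesystems report 0 total inodes; avoid dividing by zero.
    used_pct = 100.0 * (total - free) / total if total else 0.0
    return total, free, used_pct


# Symbolic name for the errno seen in the FileLog lines.
print(errno.errorcode[2])               # ENOENT
# Symbolic name of the error usually returned when space/inodes run out.
print(errno.errorcode[errno.ENOSPC])    # ENOSPC

# Placeholder path; substitute the vice partition on the fileserver.
print(inode_usage("/"))
```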
In the end, `bos salvage` with "-volume 536874907" fixed everything,
with no known loss or corruption of data. For the record, SalvageLog
contained many lines (just over 1,000) like the first three shown
below:
#---8<-----------------------------------------------------------------
Vnode 16928: version < inode version; fixed (old status)
Vnode 23352: version < inode version; fixed (old status)
Vnode 53568: version < inode version; fixed (old status)
... ending with
totalInodes 1690748
Salvaged vhost.a071 (536874907): 939204 files, 361363522 blocks
#---8<-----------------------------------------------------------------
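To count those SalvageLog entries rather than estimate them, something like the following Python sketch would do. It assumes the line format is exactly as quoted above; `fixed_vnodes` is a hypothetical helper name, and the sample holds only the three lines shown in this post.

```python
import re

# Matches the "fixed" vnode lines quoted from SalvageLog above.
FIXED = re.compile(r"Vnode (\d+): version < inode version; fixed")


def fixed_vnodes(lines):
    """Return the vnode numbers of all 'version < inode version' fixes."""
    vnodes = []
    for line in lines:
        m = FIXED.search(line)
        if m:
            vnodes.append(int(m.group(1)))
    return vnodes


# The three sample lines quoted in this message.
sample_salvage = [
    "Vnode 16928: version < inode version; fixed (old status)",
    "Vnode 23352: version < inode version; fixed (old status)",
    "Vnode 53568: version < inode version; fixed (old status)",
]

print(len(fixed_vnodes(sample_salvage)), fixed_vnodes(sample_salvage))
```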
Does any of the above ring any bells? Any suggestions would be
gratefully received, even if they're along the lines of "This was
fixed in 1.[46].xx, so it'll all be OK after you've upgraded to
something more recent".
Thanks in advance.
Dameon.
--
><> ><> ><> ><> ><> ><> ooOoo <>< <>< <>< <>< <>< <><
Dameon Wagner, Systems Development and Support Team
IT Services, University of Oxford
><> ><> ><> ><> ><> ><> ooOoo <>< <>< <>< <>< <>< <><