[OpenAFS] Volume won't release - needs salvaging.

Richard Brittain Richard.Brittain@dartmouth.edu
Mon, 27 Sep 2010 12:26:59 -0400 (EDT)


Hi,
    I found a discussion of what seems like exactly this problem back in 
2003, but that was several versions of OpenAFS ago, and on a different 
server OS and filesystem.

I basically had a replicated volume that I could not release, which I 
eventually fixed up by copying and renaming, but the process was arduous.
I'm wondering if this is a known problem, and how I might better have 
handled it.  I can't do any more tests on it, but I still have the logs.
If 'vos copy' hadn't worked, I'd still be stuck.  The volume contents 
apparently still looked OK even while it was corrupt -- the user didn't 
report a problem.

OpenAFS 1.4.12, Linux (RHEL5) server, ext3 filesystem for /vicep*

- vos release complains that the volume needs salvage - refuses to release

- bos salvage the volume - minor corruption apparently fixed

- vos release still complains

- repeated bos salvage indicates no more problems.  The data are 
accessible via clients and everything looks fine, although the volume is 
big (130GB, 115,000 files), so we didn't examine much of it

- I _can_ 'vos copy' to a new name, apparently without a problem.  The 
copy can be replicated with no error.

- vos remove the RW, leave the RO (different server)

- vos convertROtoRW -- fails with 'code 5 -   I/O error'

- vos remove volname.readonly  -- fails, volume needs to be salvaged
      (This surprised me - I'm trying to delete it, but it wants to be
       fixed first).

- vos zap volname.readonly - fails, volume needs to be salvaged

- vos zap -force  - OK, volume finally gone, but still in VLDB

- vos syncvldb (I thought that was the direction I needed to go, but it 
changed nothing)

- vos syncserv - got rid of the volume from the VLDB

- vos rename (the copy back to the original name)

- remove and recreate the mount point (not sure if that was needed), and 
eventually my clients are happy and the data are visible again

- vos addsite and release, and I'm back where I should be, 4 hours later.
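
For anyone following along, here is a rough sketch of the steps above that 
actually had an effect, written as a dry-run shell script.  Everything 
named in it is a placeholder and not the real value from this incident: 
the servers (fs1/fs2.example.edu), partition "a", the volume names, the 
numeric RO volume ID, and the mount path.  It assumes admin tokens are 
already in place, and it only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run sketch of the recovery sequence described above.
# All values below are hypothetical placeholders.
VOL=proj.data                    # original volume name
TMPVOL=proj.data.new             # temporary name used for the copy
RWSRV=fs1.example.edu            # server holding the RW volume
RWPART=a
ROSRV=fs2.example.edu            # server holding the stuck RO clone
ROPART=a
ROID=536870913                   # numeric ID of the RO volume (placeholder)
MOUNT=/afs/.example.edu/proj/data   # RW path of the mount point
DRY_RUN=${DRY_RUN:-1}

# Print each command instead of executing it unless DRY_RUN=0.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover_volume() {
    # Salvage the RW volume (in my case this did not unblock the release).
    run bos salvage -server "$RWSRV" -partition "$RWPART" -volume "$VOL"
    # Copy the data to a new volume name.
    run vos copy -id "$VOL" -fromserver "$RWSRV" -frompartition "$RWPART" \
        -toname "$TMPVOL" -toserver "$RWSRV" -topartition "$RWPART"
    # Remove the old RW volume.
    run vos remove -server "$RWSRV" -partition "$RWPART" -id "$VOL"
    # Force-remove the RO that refused 'vos remove' and plain 'vos zap'.
    run vos zap -server "$ROSRV" -partition "$ROPART" -id "$ROID" -force
    # Drop the stale VLDB site entry for the zapped RO.
    run vos syncserv -server "$ROSRV"
    # Give the copy the original name and re-create the mount point.
    run vos rename -oldname "$TMPVOL" -newname "$VOL"
    run fs rmmount -dir "$MOUNT"
    run fs mkmount -dir "$MOUNT" -vol "$VOL"
    # Re-establish replication.
    run vos addsite -server "$ROSRV" -partition "$ROPART" -id "$VOL"
    run vos release -id "$VOL"
}

recover_volume
```

The failed attempts (vos convertROtoRW, vos remove of the .readonly, vos 
zap without -force, vos syncvldb) are left out on purpose.  With DRY_RUN=0 
the script would run destructive commands, so treat it as a checklist 
and not as something to run blindly.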

Richard

-- 
Richard Brittain,  Research Computing Group,
                    Kiewit Computing Services, 6224 Baker/Berry Library
                    Dartmouth College, Hanover NH 03755
Richard.Brittain@dartmouth.edu 6-2085