[OpenAFS] Re: odd problem with RW site after a botched replica

Kim Kimball kim@thekimballs.com
Tue, 30 Oct 2012 08:33:10 -0600


If you have access to a recent RO, the quickest fix may be to vos dump it and restore the RW from that dump.  Note that if only one RO site is currently available, dumping it marks it busy while the dump runs, and with no alternate site the RO will be unavailable to all clients.
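
As a rough sketch of the dump-and-restore sequence above, assuming the RO is addressed as <volume>.readonly and that the existing RW is to be overwritten in full: the volume name, server, and dump path below are placeholders rather than values from this thread (only /vicepb comes from the earlier message), and the script prints the commands instead of running them unless dry_run is turned off. Running them for real also assumes you hold admin tokens.

```python
import shlex
import subprocess


def build_restore_plan(volume, server, partition, dump_file):
    """Return the vos command lines (argv lists) for an RO-to-RW restore.

    Assumes the RO clone is reachable as '<volume>.readonly'.
    """
    # Full dump of the read-only volume to a local file.
    dump = ["vos", "dump", "-id", f"{volume}.readonly", "-file", dump_file]
    # Restore that dump over the existing RW volume.
    restore = [
        "vos", "restore",
        "-server", server,
        "-partition", partition,
        "-name", volume,
        "-file", dump_file,
        "-overwrite", "full",
    ]
    return [dump, restore]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Placeholder names, for illustration only.
    plan = build_restore_plan(
        "proj.example", "afs1.example.com", "/vicepb", "/tmp/proj.example.dump"
    )
    run_plan(plan)  # dry run: prints the two vos commands
```

Review the printed commands before rerunning with dry_run=False, and do a vos examine on the RW afterwards before releasing it.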


> On Mon, 29 Oct 2012 12:41:09 -0700
> Timothy Balcer <timothy@telmate.com> wrote:
> 
> > > > I had made a mistake with the server directive originally, and I
> > > > attempted to correct the error midstream...  ultimately, the RO
> > > > volume seemed to release.
> > >
> > > Can you explain a little more what you mean by this?
> > >
> > 
> > I did an addsite but specified the same server as the RW volume and,
> > foolishly, tried to interrupt the process.  I ended up vos removing
> > the RO volume, but it wouldn't do it, so I did a forced zap. I then
> > did a vos addsite with the proper server directive, and it appeared
> > to go ok, and I was able to release.
> 
> You interrupted... the release, I presume? Not the addsite (an 'addsite'
> is usually very fast)
> 
> An RO can go on the same server/partition as an RW; doing that is
> recommended in almost all scenarios.
> 
> It would be helpful if you knew the error message that prevented you
> from deleting it in the first place, but I assume that is lost. I assume
> the 'proper server directive' is on another server entirely? The vldb
> information you showed only has the one RW entry, though; did the entry
> for the RO for the new server go away?
> 
> > > > However, last night the RW volume went offline, as well as the RO
> > > > volume.
> > >
> > > FileLog or VolserLog should say something around the time it went
> > > offline, which should help say why it went offline.
> > 
> > Unfortunately, it looks like I need to change the logging prefs for
> > openafs on my system, as it has wiped those out already after two
> > restarts.
> 
> Yeah, it'll do that. You can use syslog for logging, which probably
> provides more familiar logging functionality. Otherwise, it is a good
> habit to save logs as soon as something goes wrong.
> 
> > I would also add that a vos examine says the volume does not exist,
> > and shows only the VLDB dump... I am guessing this is because it is
> > offline?  FYI the volume file is present on /vicepb.
> 
> Well, based on what you've shown, the volume is trying to get salvaged,
> but the salvager can't bring the volume back online for some reason. So,
> it's not surprising that nothing can access the volume.
> 
> If you don't have the corresponding FileLog entries for the SalvageLog
> entries you gave, run the salvage again; if the same thing happens, show
> what it says in FileLog.
> 
> -- 
> Andrew Deason
> adeason@sinenomine.net
> 
Kim Kimball
kim@thekimballs.com
970-215-6359

PLEASE NOTE NEW EMAIL ADDRESS:  kim@thekimballs.com