[OpenAFS] Re: odd problem with RW site after a botched replica

Andrew Deason adeason@sinenomine.net
Mon, 29 Oct 2012 17:45:41 -0500


On Mon, 29 Oct 2012 12:41:09 -0700
Timothy Balcer <timothy@telmate.com> wrote:

> > > I had made a mistake with the server directive originally, and I
> > > attempted to correct the error midstream...  ultimately, the RO
> > > volume seemed to release.
> >
> > Can you explain a little more what you mean by this?
> >
> 
> I did an addsite but specified the same server as the RW volume and,
> foolishly, tried to interrupt the process.  I ended up vos removing
> the RO volume, but it wouldn't do it, so I did a forced zap. I then
> did a vos addsite with the proper server directive, and it appeared
> to go ok, and I was able to release.

You interrupted... the release, I presume? Not the addsite (an 'addsite'
is usually very fast).

An RO can go on the same server/partition as an RW; doing that is
recommended in almost all scenarios.

It would be helpful if you knew the error message that prevented you
from deleting it in the first place, but I assume that is lost. I assume
the 'proper server directive' is on another server entirely? The vldb
information you showed only has the one RW entry, though; did the entry
for the RO for the new server go away?

> > > However, last night the RW volume went offline, as well as the RO
> > > volume.
> >
> > FileLog or VolserLog should say something around the time it went
> > offline, which should help say why it went offline.
> 
> Unfortunately, it looks like I need to change the logging prefs for
> openafs on my system, as it has wiped those out already after two
> restarts.

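Yeah, it'll do that. Each server keeps only one previous copy of its log
(renamed with a .old suffix at startup), so two restarts lose it. You
can use syslog for logging instead, which probably provides more
familiar logging functionality. Otherwise, it is a good habit to save
logs as soon as something goes wrong.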
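
As a rough sketch of the "save the logs" habit, something like the
following copies the current and .old logs into a timestamped directory.
The function name and the paths in the usage line are assumptions for
illustration only; the log directory depends on how your OpenAFS was
packaged.

```python
import shutil
import time
from pathlib import Path


def save_afs_logs(log_dir, dest_root,
                  names=("FileLog", "VolserLog", "SalvageLog")):
    """Copy the named logs, plus suffixed copies such as .old,
    into a new timestamped directory under dest_root."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_root) / f"afs-logs-{stamp}"
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for name in names:
        # "FileLog*" matches FileLog and FileLog.old
        for src in sorted(Path(log_dir).glob(name + "*")):
            if src.is_file():
                shutil.copy2(src, dest / src.name)  # keeps timestamps
                saved.append(src.name)
    return dest, saved
```

For example, save_afs_logs("/usr/afs/logs", "/root") on a server that
uses that log path (an assumption; check where your logs actually live).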

> I would add in addition, a vos examine says the volume does not exist,
> and shows only the VLDB dump... I am guessing this is because it is
> offline?  FYI the volume file is present on /vicepb.

Well, based on what you've shown, the volume is trying to get salvaged,
but the salvager can't bring the volume back online for some reason. So,
it's not surprising that nothing can access the volume.

If you don't have the corresponding FileLog entries for the SalvageLog
entries you gave, run the salvage again; if the same thing happens, show
what it says in FileLog.
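
To make that concrete, here is a minimal sketch of the two steps (re-run
the salvage, then pull FileLog) using 'bos salvage' and 'bos getlog'.
The server and volume names in the usage line are placeholders, the
helper names are made up, and you will need whatever authentication your
cell requires (admin tokens, or -localauth on the server itself), which
is not shown here.

```python
import subprocess


def salvage_and_getlog_cmds(server, partition, volume):
    """Build the bos commands to salvage one volume and then
    fetch FileLog from the same server."""
    salvage = ["bos", "salvage", "-server", server,
               "-partition", partition, "-volume", volume]
    getlog = ["bos", "getlog", "-server", server, "-file", "FileLog"]
    return [salvage, getlog]


def run(cmds):
    """Print each command and run it, stopping on the first failure."""
    for cmd in cmds:
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Usage would be something like
run(salvage_and_getlog_cmds("fs1.example.com", "/vicepb", "myvolume")),
with your own server and volume substituted.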

-- 
Andrew Deason
adeason@sinenomine.net