[OpenAFS] Re: odd problem with RW site after a botched replica
Timothy Balcer
timothy@telmate.com
Tue, 30 Oct 2012 12:46:02 -0700
> > I did an addsite but specified the same server as the RW volume and,
> > foolishly, tried to interrupt the process. I ended up vos removing
> > the RO volume, but it wouldn't do it, so I did a forced zap. I then
> > did a vos addsite with the proper server directive, and it appeared
> > to go ok, and I was able to release.
>
> You interrupted... the release, I presume? Not the addsite (an 'addsite'
> is usually very fast)
>
Correct.
>
> An RO can go on the same server/partition as an RW; doing that is
> recommended in almost all scenarios.
>
> It would be helpful if you knew the error message that prevented you
> from deleting it in the first place, but I assume that is lost. I assume
> the 'proper server directive' is on another server entirely? The vldb
> information you showed only has the one RW entry, though; did the entry
> for the RO for the new server go away?
>
Correct. The replica was on another fileserver, but that replica has been
removed and deleted from the VLDB and from the remote fileserver, and the
fileserver processes restarted.

> Yeah, it'll do that. You can use syslog for logging, which probably
> provides more familiar logging functionality. Otherwise, it is a good
> habit to save logs as soon as something goes wrong.
>
Thanks much for that. I'll set that up as a norm.

> Well, based on what you've shown, the volume is trying to get salvaged,
> but the salvager can't bring the volume back online for some reason. So,
> it's not surprising that nothing can access the volume.
>
Right, the salvage completes just fine, but it won't bring the volume back
online. Nor will vos online. But the data and metadata are still there.
>
> If you don't have the corresponding FileLog entries for the SalvageLog
> entries you gave, run the salvage again; if the same thing happens, show
> what it says in FileLog.
>
Doing that now. May I ask: is there a generic procedure for "renewing" a
volume's entry in the VLDB from the data on the partition? That is to say,
can you remove a volume from the VLDB and regenerate its entry from what is
on disk alone?

Thanks again for all your help. :)
>
> --
> Andrew Deason
> adeason@sinenomine.net
>
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
>
--
Timothy Balcer / IT Services
Telmate / San Francisco, CA
Direct / (415) 300-4313
Customer Service / (800) 205-5510