[OpenAFS] Re: odd problem with RW site after a botched replica

Timothy Balcer timothy@telmate.com
Tue, 30 Oct 2012 20:07:57 -0700


On Tue, Oct 30, 2012 at 7:33 AM, Kim Kimball <kim@thekimballs.com> wrote:

> If you have access to a recent RO the quickest fix may be to vos dump it
> and restore the RW from it.  NB that if there is only one RO currently
> available dumping it makes it busy and with no alternate the RO will be
> unavailable to all clients.
>
>
Thanks for the tip. However, in my efforts to get the RW site functioning,
I removed the RO replica.

In other news, the latest salvage has been running for 12 hours... I
straced the busiest pid and it is happily verifying all the links and
contents (open(), close(), pread() ad infinitum), so it's not wedged. In
various places this volume has directories holding just under 32k entries
each (yes, I made SURE the limits were observed ;-) ), so I imagine it will
take a very long time to traverse the entire thing. Interestingly, this is
the fourth salvage, and this time it actually seems to be making progress.
The previous three stopped after a bit over an hour.

I suspect that the resources given to the AFS server were too limited for
the salvage to complete properly. One thing I did differently this time was
increase the server's memory to 8GB, and free shows it tooling merrily
along with plenty of buffers and cache now.

I did THAT because I noticed that the kernel had killed the salvage
operation the first two times due to out-of-memory conditions, something I
had not checked for or expected. So this may actually be only the second
"true" salvage, and it may succeed.

I'll keep you all posted. Nothing in the AFS logs indicated that the
salvager processes had been killed by the OOM killer; the only evidence was
in the kernel logs.
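Since the OOM kills only showed up in the kernel log, a quick scan of that log after each salvage attempt would have caught them. Below is a minimal sketch, assuming a Linux server and the kernel's usual OOM-killer wording ("Kill process <pid> (<comm>)" on older kernels, "Killed process <pid> (<comm>)" on newer ones). The function name find_oom_kills, the "salvage" substring match (meant to cover salvager, salvageserver and dasalvager), and the sample log lines are all hypothetical, not part of OpenAFS.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical pattern for Linux OOM-killer messages. It matches both the
# older "Out of memory: Kill process 1234 (salvager) score ..." wording and
# the "Killed process 1234 (salvager) total-vm:..." line.
OOM_KILL = re.compile(r"Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(lines: Iterable[str], needle: str = "salvage") -> List[Tuple[int, str]]:
    """Return (pid, command) pairs for OOM-killed processes whose command
    name contains `needle`, in first-seen order, one entry per pid."""
    seen = set()
    hits = []
    for line in lines:
        m = OOM_KILL.search(line)
        if not m:
            continue
        pid, comm = int(m.group(1)), m.group(2)
        # Older kernels log the same kill twice ("Kill" then "Killed"),
        # so skip pids we have already reported.
        if needle in comm and pid not in seen:
            seen.add(pid)
            hits.append((pid, comm))
    return hits


# Made-up kernel log lines, for illustration only.
sample = [
    "[12345.678] Out of memory: Kill process 4242 (salvager) score 912 or sacrifice child",
    "[12345.679] Killed process 4242 (salvager) total-vm:7340032kB",
    "[12399.000] Out of memory: Killed process 777 (java) total-vm:2048kB",
]
print(find_oom_kills(sample))  # prints [(4242, 'salvager')]
```

In practice you would feed it the lines of dmesg output or of the distribution's kernel log file (for example /var/log/kern.log on Debian-style systems; the location varies).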

-- 
Timothy Balcer
