[OpenAFS] AFS server hangs after weekly bos restart
Derrick Brashear
shadow@gmail.com
Wed, 10 Dec 2008 08:37:08 -0500
When the server stops responding, what processes are running and what are
they doing at the time?
On Wed, Dec 10, 2008 at 6:16 AM, Eric Chris Garrison <ecgarris@iupui.edu> wrote:
> Hello,
>
> A couple of months ago, I upgraded our OpenAFS servers to 1.4.7. Three
> weeks ago, a problem began where the main metadata server (1st of 3) stops
> responding to AFS requests properly, and within a couple of hours all
> clients become unable to get files, vos commands stop responding, etc. If
> the machine is rebooted, the problem goes away until the next restart. Just
> restarting openafs-server does not fix the problem, however.
>
> Oddly, when I did a manual "bos restart <server> -all" it didn't reproduce
> the problem. I was thinking that this meant the problem wasn't the bos
> restart at all... but when I changed the day on which the bos restart
> happened, the problem changed days with it.
>
> Sorry for the vagueness, but no one has been online to observe this
> starting; we're just doing forensics on the aftermath.
>
> I'd appreciate any suggestions on why this might be happening and things to
> check.
>
> Thank you,
>
> Chris
> --
> Eric Chris Garrison | Principal Mass Storage Specialist
> ecgarris@iupui.edu | Indiana University - Research Storage
--
Derrick