[OpenAFS] 'vos' command does not finish, file service works ok (sort of)

Derrick Brashear shadow@gmail.com
Wed, 23 Jul 2008 16:21:32 -0400


On Wed, Jul 23, 2008 at 4:12 PM, Andreas Hirczy <ahi@itp.tugraz.at> wrote:

> Steve Devine <sd@msu.edu> writes:
>
> > Andreas Hirczy wrote:
> >>
> >> My AFS cell works ok in most scenarios, but since a reboot of one DB-server
> >> last Friday no vos command besides "vos help" finishes - e.g.  "vos exa
> >> root.afs -localauth -verbose" hangs indefinitely and does not produce any
> >> output. Log files are also basically empty. File access works perfectly but I
> >> cannot create or move volumes; no backup of course.
> >
> > Sounds like firewall to me. can you run vos listvldb root.afs -localauth on
> > the db server?
>
> No firewall, but "vos listvldb root.afs -localauth" worked.


vos listvldb talks only to the vlserver.


> And a miracle
> occurred: after 10 hours of observed outage "vos exa ...." for volumes not on
> the blocking fileserver works again.
>

vos examine talks to the volservers. ok, well,



> Very strange: no entries in the log files for 2 hours since last reboot and
> salvage. It did not work then. There are still 74 blocked connections on one
> fileserver, but that could be a different problem.  "man fileserver" seems to
> indicate that this number will never go down again until restart. Unluckily
> "vos listvol" still runs slow - but triggers some logging messages at last:
>
> ==> /var/log/openafs/VolserLog <==
> Wed Jul 23 21:23:28 2008 FSYNC_clientInit temporary failure (will retry)
> Wed Jul 23 21:23:44 2008 FSYNC_clientInit temporary failure (will retry)
> Wed Jul 23 21:24:08 2008 FSYNC_clientInit temporary failure (will retry)
> Wed Jul 23 21:24:40 2008 FSYNC_clientInit temporary failure (will retry)
> Wed Jul 23 21:25:20 2008 FSYNC_clientInit temporary failure (will retry)
>
> ==> /var/log/openafs/BosLog <==
> Wed Jul 23 21:26:08 2008: fs:vol exited on signal 6
>
> ==> /var/log/openafs/VolserLog <==
> FSYNC_clientInit failed (giving up!): Connection refused
> Wed Jul 23 21:26:08 2008
> : Assertion failed! file ../vol/volume.c, line 705.


A dead volserver would of course explain a hang. The volserver will restart
with an fs outage.

Got a corefile?
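
The timestamps in the quoted VolserLog suggest that the volserver retried
FSYNC_clientInit with a growing delay before it gave up and aborted. A small
sketch in Python that computes those gaps from the log lines quoted above; the
helper names (parse_stamp, retry_gaps) are made up for illustration and are not
part of OpenAFS, and the code only reads the log text, it says nothing about
how the volserver schedules its retries internally:

```python
from datetime import datetime

# VolserLog lines copied from the quoted message above.
RETRY_LINES = [
    "Wed Jul 23 21:23:28 2008 FSYNC_clientInit temporary failure (will retry)",
    "Wed Jul 23 21:23:44 2008 FSYNC_clientInit temporary failure (will retry)",
    "Wed Jul 23 21:24:08 2008 FSYNC_clientInit temporary failure (will retry)",
    "Wed Jul 23 21:24:40 2008 FSYNC_clientInit temporary failure (will retry)",
    "Wed Jul 23 21:25:20 2008 FSYNC_clientInit temporary failure (will retry)",
]

# BosLog line recording the volserver abort (signal 6).
ABORT_LINE = "Wed Jul 23 21:26:08 2008: fs:vol exited on signal 6"


def parse_stamp(line):
    """Parse the fixed-width 24-character timestamp at the start of a log line."""
    return datetime.strptime(line[:24], "%a %b %d %H:%M:%S %Y")


def retry_gaps(lines):
    """Return the number of seconds between consecutive log lines."""
    stamps = [parse_stamp(line) for line in lines]
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    gaps = retry_gaps(RETRY_LINES)
    last_gap = retry_gaps([RETRY_LINES[-1], ABORT_LINE])[0]
    # gaps: [16, 24, 32, 40], then 48 seconds until the abort
    print("seconds between retries:", gaps)
    print("seconds from last retry to abort:", last_gap)
```

In this excerpt the delay grows by 8 seconds per attempt, and the abort comes
160 seconds after the first logged failure.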
