[OpenAFS] 'vos' command does not finish, file service works ok (sort of)
Derrick Brashear
shadow@gmail.com
Wed, 23 Jul 2008 16:21:32 -0400
On Wed, Jul 23, 2008 at 4:12 PM, Andreas Hirczy <ahi@itp.tugraz.at> wrote:
> Steve Devine <sd@msu.edu> writes:
>
> > Andreas Hirczy wrote:
> >>
> >> My AFS cell works ok in most scenarios, but since a reboot of one DB-server
> >> last Friday no vos command besides "vos help" finishes - e.g. "vos exa
> >> root.afs -localauth -verbose" hangs indefinitely and does not produce any
> >> output. Log files are also basically empty. File access works perfectly but I
> >> cannot create or move volumes; no backup of course.
> >
> > Sounds like firewall to me. Can you run vos listvldb root.afs -localauth on
> > the db server?
>
> No firewall, but "vos listvldb root.afs -localauth" worked.
That talks to the vlserver only.
> And a miracle
> occurred: after 10 hours of observed outage "vos exa ...." for volumes not on
> the blocking fileserver works again.
>
vos examine talks to the volservers. OK, well,
> Very strange: no entries in the log files for 2 hours since last reboot and
> salvage. It did not work then. There are still 74 blocked connections on one
> fileserver, but that could be a different problem. "man fileserver" seems to
> indicate that this number will never go down again until restart. Unluckily
> "vos listvol" still runs slow - but triggers some logging messages at last:
>
> ==> /var/log/openafs/VolserLog <==
> Wed Jul 23 21:23:28 2008 FSYNC_clientInit temporary failure (will retry)
> Wed Jul 23 21:23:44 2008 FSYNC_clientInit temporary failure (will retry)
> Wed Jul 23 21:24:08 2008 FSYNC_clientInit temporary failure (will retry)
> Wed Jul 23 21:24:40 2008 FSYNC_clientInit temporary failure (will retry)
> Wed Jul 23 21:25:20 2008 FSYNC_clientInit temporary failure (will retry)
>
> ==> /var/log/openafs/BosLog <==
> Wed Jul 23 21:26:08 2008: fs:vol exited on signal 6
>
> ==> /var/log/openafs/VolserLog <==
> FSYNC_clientInit failed (giving up!): Connection refused
> Wed Jul 23 21:26:08 2008
> : Assertion failed! file ../vol/volume.c, line 705.
A dead volserver would of course explain a hang. The volserver will restart
with an fs outage.
Got a corefile?
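
As a rough aid for reading the quoted VolserLog and BosLog excerpt, here is a
small Python sketch that computes the gaps between the FSYNC_clientInit
retries and the final abort. The timestamps are copied from the log lines
quoted above; the helper name gaps_in_seconds is made up for illustration and
is not part of OpenAFS. It only shows how long the volserver kept retrying
before the assertion fired, and says nothing about why the connection was
refused.

```python
from datetime import datetime

# Timestamps copied verbatim from the quoted VolserLog excerpt.
RETRIES = [
    "Wed Jul 23 21:23:28 2008",
    "Wed Jul 23 21:23:44 2008",
    "Wed Jul 23 21:24:08 2008",
    "Wed Jul 23 21:24:40 2008",
    "Wed Jul 23 21:25:20 2008",
]
# Time of the "giving up" / "exited on signal 6" entries.
GAVE_UP = "Wed Jul 23 21:26:08 2008"


def gaps_in_seconds(stamps):
    """Return the gaps, in seconds, between consecutive ctime-style stamps.

    Hypothetical helper; assumes an English/C locale for %a and %b.
    """
    times = [datetime.strptime(s, "%a %b %d %H:%M:%S %Y") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


gaps = gaps_in_seconds(RETRIES + [GAVE_UP])
print(gaps)       # gaps between successive log entries, in seconds
print(sum(gaps))  # seconds from the first logged retry to the abort
```

In this excerpt each gap is 8 seconds longer than the previous one, and the
abort comes 160 seconds after the first logged retry.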