[OpenAFS] Re: ProbeUuid for host failed

Ken Elkabany Ken@Elkabany.com
Tue, 3 Apr 2012 21:55:36 -0700



On Tue, Apr 3, 2012 at 8:42 PM, Andrew Deason <adeason@sinenomine.net> wrote:

> On Tue, 3 Apr 2012 19:04:03 -0700
> Ken Elkabany <Ken@Elkabany.com> wrote:
>
> > 1.6.0pre1 which was packaged with Ubuntu 11.10. Should we make it a
> > priority to upgrade?
>
> Yes, there are many known problems with that.
>

We'll upgrade ASAP and revisit these issues then.

>
> > > Yes; that should not be possible unless the client is within a
> > > certain narrow range of versions. The client could be tied up trying
> > > to clear up that queue of GUCB messages, which is why everything
> > > would appear to freeze for a short time, and you get that ProbeUuid
> > > failure.
> >
> > What are GUCB messages? Why would they pile up, and in which
> > circumstances?
>
> I wasn't really explaining because the technical details aren't really
> important as far as avoiding the issue goes. The solution is: "upgrade"
> (or "downgrade"; generally "don't use that version"). I don't have any
> workarounds nor a lot of knowledge about what triggers it; the code
> making this situation possible is not in any actual release and so I
> didn't expect to see anyone run into issues with it.
>
> But since you _asked_....
>
> I was referring to a GiveUpCallBacks message, which the client sends to
> the fileserver for a file if it's not caching it anymore, to let the
> fileserver know that the client does not need to be contacted if the
> file changes. Normally these are queued and a bunch are sent at once; if
> we queue too many we send what we have at the time of the queueing. In
> older clients this was done while holding certain heavy-duty locks, and
> in certain situations can incur a kind of deadlock between the
> fileserver and clients. One way around that which temporarily existed in
> the code was to allocate new structures dynamically when we ran out of
> the pre-set amount, and to flush the queue periodically like we always
> did. Since the amount of such structures dynamically allocated is
> unbounded, there were concerns that this approach could cause
> situations... well, situations like you're describing. So before 1.6.0
> this was changed to allow certain locks to be dropped when we flushed
> the queue when it became full, similar to the original behavior.
>

Thanks for the explanation!
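
For anyone reading this in the archive, here is a rough Python sketch of the queue-and-flush behavior described above. It is not OpenAFS code: the class name GiveUpQueue, the capacity parameter, and the send callback are all made up for illustration, and the locking that the explanation talks about is not modeled at all.

```python
from typing import Callable, List


class GiveUpQueue:
    """Toy model of batching GiveUpCallBacks notices (hypothetical names)."""

    def __init__(self, capacity: int, send: Callable[[List[str]], None]) -> None:
        self.capacity = capacity          # pre-set number of queue slots
        self.send = send                  # stands in for the RPC to the fileserver
        self.pending: List[str] = []

    def give_up(self, fid: str) -> None:
        # If the queue is already full, send what we have at the time of
        # queueing, then queue the new entry.
        if len(self.pending) >= self.capacity:
            self.flush()
        self.pending.append(fid)

    def flush(self) -> None:
        # Also meant to be called periodically, so entries never wait forever.
        if not self.pending:
            return
        batch, self.pending = self.pending, []
        self.send(batch)
```

The bounded queue is the point of the sketch: because a full queue is flushed rather than grown, the number of pending entries never exceeds the capacity, unlike the temporary approach of allocating new structures without limit.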


> > I traced the ProbeUuid failure to the OpenAFS fileservers using the
> > incorrect IP for certain clients. The clients each have one interface,
> > but are accessible via 2 IP addresses (one external/internet/WAN, one
> > internal/local). The fileservers would use their external IP address,
> > which the firewall would block.
>
> I'm not sure I understand; which IP are the fileservers supposed to use?
> The internal one or the external one? You can configure a client to only
> advertise certain addresses with NetInfo and NetRestrict:
>
> <http://docs.openafs.org/Reference/5/NetInfo.html>
> <http://docs.openafs.org/Reference/5/NetRestrict.html>
>

The fileservers should use the internal IP. The clients have only one
network interface, which is linked to the internal IP. Clearly the address
is being translated at some point, but we're not sure why the packets are
being routed outside the internal network. Most likely, it's specific to us
and Amazon EC2.
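
As a rough illustration of the NetInfo suggestion quoted above: the NetInfo file lists one IP address per line in dotted-decimal form. The Python sketch below builds such contents by keeping only private-range addresses. The helper name render_netinfo and the sample addresses are hypothetical, and treating "private range" as "internal" is an assumption about this kind of setup. It only controls which addresses would be listed, and presumably does nothing about address translation happening elsewhere in the network.

```python
import ipaddress
from typing import Iterable


def render_netinfo(addresses: Iterable[str]) -> str:
    """Return NetInfo-style text: one private-range IPv4 address per line.

    Hypothetical helper; assumes the internal addresses are the ones in
    private ranges, which may not hold for every site.
    """
    lines = []
    for text in addresses:
        addr = ipaddress.ip_address(text)
        if addr.version == 4 and addr.is_private:
            lines.append(str(addr))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Sample addresses only: one internal, one public.
    print(render_netinfo(["10.0.0.5", "8.8.8.8"]), end="")
```

The output would then be reviewed by hand before being placed in the client's NetInfo file; see the reference pages linked above for the exact file location and semantics.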

> --
> Andrew Deason
> adeason@sinenomine.net
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
>
