[OpenAFS] Re: ProbeUuid for host failed

Ken Elkabany Ken@Elkabany.com
Tue, 3 Apr 2012 19:04:03 -0700



On Tue, Apr 3, 2012 at 10:25 AM, Andrew Deason <adeason@sinenomine.net> wrote:

> On Mon, 2 Apr 2012 19:04:19 -0700
> Ken Elkabany <Ken@Elkabany.com> wrote:
>
> > Over time these errors become more and more frequent. The problem is
> > that the client who hits this issue will experience a 5-10s delay in
> > accessing a file, which hurts performance significantly. The clients
> > are 1.6pre1, and the server is 1.4.14
>
> 1.6.0pre1? Or 1.6.1pre1?
>

1.6.0pre1, which was packaged with Ubuntu 11.10. Should we make it a
priority to upgrade?

>
> > Using afsmonitor, I do see that one of the clients hitting this issue
> > (I haven't checked whether all clients have the problem, but many seem
> > to) has 17M callbacks alloced. Could that be suspect?
>
> Yes; that should not be possible unless the client is within a certain
> narrow range of versions. The client could be tied up trying to clear up
> that queue of GUCB messages, which is why everything would appear to
> freeze for a short time, and you get that ProbeUuid failure.
>

What are GUCB messages? Why would they pile up, and under what circumstances?

>  --
> Andrew Deason
> adeason@sinenomine.net
>

I traced the ProbeUuid failure to the OpenAFS fileservers using the
incorrect IP address for certain clients. Each client has one interface but
is reachable via two IP addresses (one external/internet/WAN, one
internal/local). The fileservers would contact a client on its external IP
address, which the firewall blocked. After opening the ports on the external
IP addresses, the ProbeUuid errors disappeared. Has anyone seen this problem
before? The servers are sitting in Amazon EC2, so there's additional
complexity in how the fileserver resolves the client IP address.
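
For illustration only, here is a minimal Python sketch of the situation
described above: a client known by two addresses, one internal and one
external. It sorts a client's addresses into RFC 1918 (internal) and
everything else (external), which is the distinction that decides whether
a fileserver's probe crosses the firewall. The addresses and the helper
names (is_internal, split_addresses) are hypothetical and not part of
OpenAFS; this is not how the fileserver itself picks an address.

```python
import ipaddress

# RFC 1918 private ranges; assumed here to be what "internal/local" means.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_internal(addr: str) -> bool:
    """Return True if addr falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


def split_addresses(addrs):
    """Split a client's addresses into (internal, external) lists."""
    internal = [a for a in addrs if is_internal(a)]
    external = [a for a in addrs if not is_internal(a)]
    return internal, external


if __name__ == "__main__":
    # Hypothetical client with one interface but two reachable addresses.
    client_addrs = ["10.12.34.56", "203.0.113.10"]
    internal, external = split_addresses(client_addrs)
    print("internal:", internal)
    print("external:", external)
```

If the fileserver ends up probing an address from the external list, that
traffic has to be allowed through the firewall, which matches the fix
described above.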
