[OpenAFS] afs: Lost contact with file server xxx.xxx.xxx.x

TIARA System Man sysman@tiara.sinica.edu.tw
Sat, 19 Apr 2008 17:08:58 +0800


hi jeffrey,
you are right! it looks like that.

we have three branch offices. one of them is behind NAT. the other two sites
have not encountered this problem so far; those two sites have real (public) IP
addresses.

(snip)
This was fixed in the Windows cache manager by always retrying RPCs sent on
an existing RX connection that timed out once with a new RX connection.  I
am not sure that a similar change was ever made to the UNIX cache manager.
(snip)
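
As a rough illustration of the retry idea quoted above, here is a minimal Python sketch. It is not OpenAFS or RX code: `RpcTimeout`, `send`, and `new_connection` are hypothetical names made up for this example. It only shows the pattern of retrying a timed-out call once on a fresh connection.

```python
import itertools


class RpcTimeout(Exception):
    """Hypothetical error: no reply arrived on this connection."""


def call_with_fresh_connection_retry(send, new_connection, conn):
    """Try the call on `conn`; if it times out, retry once on a new connection.

    Returns (result, connection_that_worked). A second timeout propagates.
    """
    try:
        return send(conn), conn
    except RpcTimeout:
        fresh = new_connection()
        return send(fresh), fresh


# Toy stand-ins: "conn-1" uses a NAT mapping that has expired.
stale = {"conn-1"}
ids = itertools.count(2)


def new_connection():
    return f"conn-{next(ids)}"


def send(conn):
    if conn in stale:
        raise RpcTimeout(conn)
    return "ok"


result, used = call_with_fresh_connection_retry(send, new_connection, "conn-1")
```

With these stand-ins, the first attempt on the stale connection times out and the retry goes out on a newly created one, so the caller never has to mark the server as down.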

i have added "CACHESIZE=2097152" to the /etc/rc.d/init.d/afs file and restarted
the AFS client. is this the UNIX cache manager? sorry, i'm new to AFS.

one other thing: the NAT server itself also has the "Lost contact" problem.

please give me some hints. thank you.

best, sam

On Fri, Apr 18, 2008 at 11:14 PM, Jeffrey Altman <
jaltman@secure-endpoints.com> wrote:

> TIARA System Man wrote:
>
> > hi guys,
> >
> > can anyone tell me what is wrong with the "Lost contact with file server"
> > issue? the AFS client is not in the same domain as the AFS server. the
> > connection speed is up to 16MB/sec.
> >
>
> Sounds like NAT bouncing to me.  The NAT device keeps timing out the
> port mappings and therefore the RX connections in use with the old mapping
> become invalid but neither side of the RX connection is able to notice.
>
> Client sends to file server.  File server sees message from an existing IP
> address/port value arrive from a new IP address/port value and therefore
> responds to the original IP address/port value in order to prevent hijacking
> attacks.
>
> The NAT blocks the reply sent to the old port value.
>
> The client thinks the file server is not responding and marks the file
> server as down.
>
> The client later probes the down servers with a new RX connection and that
> succeeds, so the server is marked up.
>
> This was fixed in the Windows cache manager by always retrying RPCs sent
> on an existing RX connection that timed out once with a new RX connection.
> I am not sure that a similar change was ever made to the UNIX cache
> manager.
>
> Jeffrey Altman
>
>
>
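
The sequence described in the quoted explanation can be sketched as a toy simulation. This is only a model under stated assumptions, not RX or NAT code: the `Nat` and `FileServer` classes, the 30-second idle timeout, and the port numbers are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Nat:
    """Toy NAT: one public port per client connection, dropped after idle timeout."""
    timeout: float
    next_port: int = 40000
    mappings: dict = field(default_factory=dict)  # conn_id -> (public_port, last_used)

    def outbound(self, conn_id, now):
        entry = self.mappings.get(conn_id)
        if entry is None or now - entry[1] > self.timeout:
            port = self.next_port  # mapping missing or expired: allocate a new port
            self.next_port += 1
        else:
            port = entry[0]
        self.mappings[conn_id] = (port, now)
        return port

    def inbound_allowed(self, port, now):
        # A reply passes only if it targets a port with a live mapping.
        return any(p == port and now - t <= self.timeout
                   for p, t in self.mappings.values())


@dataclass
class FileServer:
    """Toy server: always replies to the port first seen for a connection."""
    peers: dict = field(default_factory=dict)  # conn_id -> original source port

    def receive(self, conn_id, src_port):
        return self.peers.setdefault(conn_id, src_port)


def send_rpc(nat, server, conn_id, now):
    """Return True if the server's reply makes it back through the NAT."""
    src_port = nat.outbound(conn_id, now)
    reply_port = server.receive(conn_id, src_port)
    return nat.inbound_allowed(reply_port, now)


nat, server = Nat(timeout=30.0), FileServer()
busy = send_rpc(nat, server, "conn-A", 0.0)          # True: mapping is fresh
after_idle = send_rpc(nat, server, "conn-A", 100.0)  # False: reply goes to the old port
probe = send_rpc(nat, server, "conn-B", 101.0)       # True: new connection, new mapping
```

In this model the call after the idle period is the one that would show up as "Lost contact with file server", and the later probe on a new connection is what marks the server up again.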


-- 
Sam Tseng
Academia Sinica
Institute of Astronomy and Astrophysics
Tel.: +886-2-33652200 ext 742
Fax: +886-2-23677849
