[OpenAFS] aklog and AFS DB server timeouts

RL rainer.laatsch@t-online.de
Fri, 29 Jan 2021 21:03:21 +0100



On Windows, entering the server entries in the file

   c:\Windows\System32\Drivers\etc\hosts

might help?  Maybe give it a try on the clients?

Regards, R.


On 1/29/21 8:44 PM, Jeffrey E Altman wrote:
> Rainer,
>
> OpenAFS UNIX/Linux clients and server only use the IP addresses in the
> CellServDB file.  The fully qualified domain names are only used by
> OpenAFS Windows clients.
>
> Jeffrey Altman
>
> On 1/29/2021 2:38 PM, RL (rainer.laatsch@t-online.de) wrote:
>> On the relevant clients, are all three servers listed with their full
>> names in /etc/hosts? Otherwise failure is to be expected, since
>>
>>    192.168.*.*
>> is a private address range that never gets resolved with DNS.
>> Regards, R.
>>
>> ------------------------------------------------------------------------------------------------------------
>>
>>
>> On 1/29/21 7:32 PM, A. Lewenberg wrote:
>>> On our buster servers the OpenAFS client (1.8.2) has an issue with
>>> provisioning an AFS token. When I attempt to get an AFS token it very
>>> often takes a long time.
>>>
>>> $ aklog (this can take up to 30 seconds or more)
>>>
>>> After some investigation it looks like aklog is trying the AFS DB
>>> servers listed in /etc/openafs/CellServDB and timing out on some of the
>>> DB servers. Here are the relevant contents of that file:
>>>
>>> >example.com           # My Company
>>> 192.168.1.102                    #afsdb1.example.com
>>> 192.168.1.104                    #afsdb2.example.com
>>> 192.168.1.106                    #afsdb3.example.com
>>>
>>> Running aklog and sniffing the network I see that the client attempts
>>> to contact one of the three afsdb servers. If the one it chooses to
>>> contact first is afsdb2 or afsdb3 the connection does not succeed
>>> until it finally gives up and tries another one. If the second one it
>>> tries is afsdb2 or afsdb3 it gives up and tries the only remaining
>>> one: afsdb1. In other words:
>>>
>>> afsdb3 (fail), afsdb2 (fail), afsdb1 (succeeds)
>>> afsdb2 (fail), afsdb3 (fail), afsdb1 (succeeds)
>>> afsdb3 (fail), afsdb1 (succeeds)
>>> afsdb2 (fail), afsdb1 (succeeds)
>>> afsdb1 (succeeds)
>>>
>>> This sounds like both afsdb2 and afsdb3 are simply not working.
>>> However...
>>>
>>> If I remove afsdb1 and afsdb2 from the CellServDB leaving only afsdb3
>>> it works instantly every time! That is, the following CellServDB works
>>> without delay:
>>>
>>> >ir.example.com           # My Company
>>> 192.168.1.106                    #afsdb3.example.com
>>>
>>> Similarly, if afsdb2 is the only entry in CellServDB, running aklog
>>> works without delay. So it cannot be that afsdb2 and afsdb3 are
>>> completely broken.
>>>
>>> The AFS DB servers are running OpenAFS version 1.6.9.
>>>
>>> What the heck is going on?
>>> _______________________________________________
>>> OpenAFS-info mailing list
>>> OpenAFS-info@openafs.org
>>> https://lists.openafs.org/mailman/listinfo/openafs-info
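
For anyone who wants to check what a UNIX/Linux client has to work with, the
CellServDB layout quoted above can be read with a few lines of Python. This is
only a rough sketch, not anything taken from OpenAFS itself: the function name
parse_cellservdb and the SAMPLE text are made up for illustration, and it
assumes the usual layout of a ">cellname" line followed by one
"address  #hostname" line per DB server. It keeps the address and the trailing
comment apart, which makes it easy to see that the address is the operative
field and the host name is only a comment.

```python
from typing import Dict, List, Tuple


def parse_cellservdb(text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each cell name to a list of (ip_address, hostname_comment) pairs."""
    cells: Dict[str, List[Tuple[str, str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Cell line: the name follows '>', the text after '#' is a description.
            name = line[1:].split("#", 1)[0].strip()
            current = cells.setdefault(name, [])
        elif current is not None:
            # Server line: the address comes first, the host name sits in the comment.
            addr, _, comment = line.partition("#")
            current.append((addr.strip(), comment.strip()))
    return cells


# Hypothetical sample, copied from the entries quoted in this thread.
SAMPLE = """\
>example.com           # My Company
192.168.1.102                    #afsdb1.example.com
192.168.1.104                    #afsdb2.example.com
192.168.1.106                    #afsdb3.example.com
"""

if __name__ == "__main__":
    for cell, servers in parse_cellservdb(SAMPLE).items():
        for ip, host in servers:
            print(cell, ip, host)
```

Pointing it at the real file (for example the /etc/openafs/CellServDB path
mentioned above) instead of SAMPLE should list the three DB server addresses
the client will try.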
