[OpenAFS] aklog and AFS DB server timeouts
A. Lewenberg
deb251@lewenberg.com
Fri, 29 Jan 2021 15:27:13 -0800
On 1/29/2021 11:38 AM, RL wrote:
> On the relevant clients, are all three with full name in /etc/hosts ?
> Else failure is standard as
>
> 192.168.*.*
> is a private thingie that never gets resolved with DNS
> Regards, R.
I am not actually using those RFC 1918 addresses; I just put them in
as examples. In other words, the IP addresses are not the issue.
Network sniffing also shows that name resolution is not the issue: I
am seeing traffic between my server and all three AFS DB servers.
>
> ------------------------------------------------------------------------------------------------------------
>
>
> On 1/29/21 7:32 PM, A. Lewenberg wrote:
>> On our buster servers the OpenAFS client (1.8.2) has an issue with
>> provisioning an AFS token. When I attempt to get an AFS token it very
>> often takes a long time.
>>
>> $ aklog (this can take 30 seconds or more)
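
To put numbers on "a long time", something like the following rough
sketch could be used to time repeated runs. The helper name
time_command and the default of five runs are my own choices for
illustration, not part of OpenAFS; it runs plain aklog with no options.

```python
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times; return a list of (seconds, returncode)."""
    results = []
    for _ in range(runs):
        start = time.monotonic()
        proc = subprocess.run(argv, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
        results.append((time.monotonic() - start, proc.returncode))
    return results


if __name__ == "__main__":
    # Defaults to plain `aklog`; pass another command to try the helper.
    argv = sys.argv[1:] or ["aklog"]
    for seconds, rc in time_command(argv):
        print(f"{seconds:6.2f}s  exit={rc}")
```

Running it a few times with different CellServDB contents should make
the fast and slow cases easy to compare side by side.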
>>
>> After some investigation it looks like aklog is trying the AFS DB
>> servers listed in /etc/openafs/CellServDB and timing out on some of
>> them. Here are the relevant contents of that file:
>>
>> >example.com # My Company
>> 192.168.1.102 #afsdb1.example.com
>> 192.168.1.104 #afsdb2.example.com
>> 192.168.1.106 #afsdb3.example.com
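
For anyone who wants to check such a file from a script, here is a
minimal sketch of a parser for the layout shown above: a ">cell"
line with a "#" description, followed by "IP #hostname" lines. The
names Cell and parse_cellservdb are hypothetical, and the sketch only
handles the simple form quoted here.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    name: str
    comment: str = ""
    servers: list = field(default_factory=list)  # (ip, hostname) pairs


def parse_cellservdb(text):
    """Parse CellServDB-style text into a list of Cell records."""
    cells = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split off the trailing "#..." part, which holds the cell
        # description or the server's host name.
        body, _, note = line.partition("#")
        body, note = body.strip(), note.strip()
        if body.startswith(">"):
            cells.append(Cell(name=body[1:].strip(), comment=note))
        elif cells and body:
            cells[-1].servers.append((body, note))
    return cells


SAMPLE = """\
>example.com # My Company
192.168.1.102 #afsdb1.example.com
192.168.1.104 #afsdb2.example.com
192.168.1.106 #afsdb3.example.com
"""

if __name__ == "__main__":
    for cell in parse_cellservdb(SAMPLE):
        print(cell.name, cell.servers)
```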
>>
>> Running aklog and sniffing the network, I see that the client attempts
>> to contact one of the three afsdb servers. If the one it chooses to
>> contact first is afsdb2 or afsdb3, the connection does not succeed;
>> the client eventually gives up and tries another one. If the second
>> one it tries is also afsdb2 or afsdb3, it gives up again and tries the
>> only remaining one: afsdb1. In other words:
>>
>> afsdb3 (fail), afsdb2 (fail), afsdb1 (succeeds)
>> afsdb2 (fail), afsdb3 (fail), afsdb1 (succeeds)
>> afsdb3 (fail), afsdb1 (succeeds)
>> afsdb2 (fail), afsdb1 (succeeds)
>> afsdb1 (succeeds)
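
The five cases above are exactly what a "try servers in some order,
stop at the first one that answers" model would produce. A small
sketch, assuming (and this is only an assumption about the client)
that any starting order is possible; attempt_sequences and describe
are names made up for this illustration:

```python
from itertools import permutations


def attempt_sequences(servers, slow):
    """Return the distinct try-until-success sequences.

    Assumes the client may try the servers in any order and stops
    at the first one that is not in the `slow` set.
    """
    seen = []
    for order in permutations(servers):
        tried = []
        for server in order:
            ok = server not in slow
            tried.append((server, ok))
            if ok:
                break
        if tried not in seen:
            seen.append(tried)
    return seen


def describe(seq):
    """Format one sequence the same way as the list in the post."""
    return ", ".join(
        f"{name} ({'succeeds' if ok else 'fail'})" for name, ok in seq
    )


if __name__ == "__main__":
    servers = ["afsdb1", "afsdb2", "afsdb3"]
    for seq in attempt_sequences(servers, slow={"afsdb2", "afsdb3"}):
        print(describe(seq))
```

The model only reproduces the observed pattern; it says nothing about
why afsdb2 and afsdb3 fail when the other servers are also listed.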
>>
>> This sounds like both afsdb2 and afsdb3 are simply not working.
>> However...
>>
>> If I remove afsdb1 and afsdb2 from the CellServDB, leaving only
>> afsdb3, it works instantly every time! That is, the following
>> CellServDB works without delay:
>>
>> >ir.example.com # My Company
>> 192.168.1.106 #afsdb3.example.com
>>
>> Similarly, if afsdb2 is the only entry in CellServDB, running aklog
>> works without delay. So it cannot be that afsdb2 and afsdb3 are
>> completely broken.
>>
>> The AFS DB servers are running OpenAFS version 1.6.9.
>>
>> What the heck is going on?
>> _______________________________________________
>> OpenAFS-info mailing list
>> OpenAFS-info@openafs.org
>> https://lists.openafs.org/mailman/listinfo/openafs-info