[OpenAFS] aklog and AFS DB server timeouts

A. Lewenberg deb251@lewenberg.com
Fri, 29 Jan 2021 15:27:13 -0800

On 1/29/2021 11:38 AM, RL wrote:
> On the relevant clients, are all three servers listed with their full names in /etc/hosts?
> Otherwise failure is to be expected, since
>    192.168.*.*
> is a private address range that never gets resolved through DNS.
> Regards, R.

I am not actually using those RFC1918 IP addresses; I only put them in 
as examples. In other words, the IP addresses are not the issue.
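For reference, a CellServDB file lists each cell on a line starting with `>`, followed by one line per database server that gives its IP address and, after `#`, its host name. The Python below is a minimal sketch of reading that layout, assuming only the structure seen in the excerpt quoted below. The `parse_cellservdb` and `Cell` names are hypothetical, and the 192.0.2.x addresses (the RFC 5737 documentation range) stand in for the real addresses, which are not shown here.

```python
# Minimal CellServDB reader (sketch). Assumes the layout
#   >cellname   # description
#   address     # hostname
# The 192.0.2.x addresses are hypothetical placeholders (RFC 5737 range).
from dataclasses import dataclass, field


@dataclass
class Cell:
    name: str
    description: str
    # (address, hostname) pairs, in file order
    servers: list[tuple[str, str]] = field(default_factory=list)


def parse_cellservdb(text: str) -> dict[str, Cell]:
    """Return a mapping from cell name to its parsed entry."""
    cells: dict[str, Cell] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        body, _, comment = line.partition("#")
        body, comment = body.strip(), comment.strip()
        if body.startswith(">"):
            # A new cell header: ">cellname  # description"
            current = Cell(body[1:].strip(), comment)
            cells[current.name] = current
        elif current is not None and body:
            # A server line belonging to the most recent cell header
            current.servers.append((body, comment))
    return cells


SAMPLE = """\
>example.com           # My Company
192.0.2.1              #afsdb1.example.com
192.0.2.2              #afsdb2.example.com
192.0.2.3              #afsdb3.example.com
"""

if __name__ == "__main__":
    for cell in parse_cellservdb(SAMPLE).values():
        print(cell.name, [host for _, host in cell.servers])
```

Running it prints the cell name followed by the three host names in file order.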

Sniffing the network shows that name resolution is not the issue: I 
see traffic between my server and all three AFS DB servers.
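The failure pattern in the original report (quoted below) fits a client that tries the DB servers in a random order and moves on after each timeout. The Python below is a minimal sketch of that model. It assumes, hypothetically, that every ordering of the three servers is equally likely and that afsdb2 and afsdb3 always time out while afsdb1 always answers. It enumerates the orderings, recovers the five attempt sequences listed in the report, and computes the average number of timeouts per aklog run. The `attempts` and `summarize` names are illustrative only.

```python
# Sketch: model a client that tries DB servers in a uniformly random order
# (an assumption, not confirmed OpenAFS behavior) and stops at the first one
# that answers. afsdb2/afsdb3 are assumed to time out; afsdb1 answers.
from itertools import permutations

SERVERS = ["afsdb1", "afsdb2", "afsdb3"]
RESPONDS = {"afsdb1"}  # servers that answer when the full list is configured


def attempts(order):
    """Return the servers contacted, in order, up to and including the first success."""
    tried = []
    for server in order:
        tried.append(server)
        if server in RESPONDS:
            break
    return tuple(tried)


def summarize():
    """Return the distinct attempt sequences and the mean timeouts per run."""
    orders = list(permutations(SERVERS))
    traces = [attempts(order) for order in orders]
    # Every trace ends with a success, so timeouts = len(trace) - 1.
    expected_timeouts = sum(len(t) - 1 for t in traces) / len(traces)
    return sorted(set(traces)), expected_timeouts


if __name__ == "__main__":
    patterns, expected = summarize()
    for p in patterns:
        print(" -> ".join(p))
    print("expected timeouts per aklog run:", expected)
```

Under these assumptions, one run in three is instant, one in three hits a single timeout, and one in three hits two, for an average of one timeout per run. The model only reproduces the retry order. It does not explain why afsdb2 and afsdb3 answer promptly when each is the only server listed.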

> ------------------------------------------------------------------------------------------------------------ 
> On 1/29/21 7:32 PM, A. Lewenberg wrote:
>> On our Debian buster servers the OpenAFS client (1.8.2) has trouble 
>> obtaining an AFS token: when I try to get a token, it very often 
>> takes a long time.
>> $ aklog (this can take up to 30 seconds or more)
>> After some investigation it looks like aklog is trying the AFS DB 
>> servers listed in /etc/openafs/CellServDB and timing out on some of the 
>> DB servers. Here are the relevant contents of that file:
>> >example.com           # My Company
>>                    #afsdb1.example.com
>>                    #afsdb2.example.com
>>                    #afsdb3.example.com
>> Running aklog while sniffing the network, I see that the client first 
>> contacts one of the three afsdb servers. If the one it chooses first 
>> is afsdb2 or afsdb3, the connection never succeeds; the client 
>> eventually gives up and tries another one. If the second one it tries 
>> is the other of afsdb2 and afsdb3, it gives up again and tries the only 
>> remaining one: afsdb1. In other words:
>> afsdb3 (fail), afsdb2 (fail), afsdb1 (succeeds)
>> afsdb2 (fail), afsdb3 (fail), afsdb1 (succeeds)
>> afsdb3 (fail), afsdb1 (succeeds)
>> afsdb2 (fail), afsdb1 (succeeds)
>> afsdb1 (succeeds)
>> This sounds like both afsdb2 and afsdb3 are simply not working. 
>> However...
>> If I remove afsdb1 and afsdb2 from CellServDB, leaving only afsdb3, 
>> it works instantly every time! That is, the following CellServDB works 
>> without delay:
>> >ir.example.com           # My Company
>>                    #afsdb3.example.com
>> Similarly, if afsdb2 is the only entry in CellServDB, running aklog 
>> works without delay. So afsdb2 and afsdb3 cannot be completely 
>> broken.
>> The AFS DB servers are running OpenAFS version 1.6.9.
>> What the heck is going on?
>> _______________________________________________
>> OpenAFS-info mailing list
>> OpenAFS-info@openafs.org
>> https://lists.openafs.org/mailman/listinfo/openafs-info