[OpenAFS] OpenAFS access at login time on MacOS
Richard Feltstykket
richard@unixboxen.net
Sat, 13 May 2023 17:32:45 +0000
On Sat, May 13, 2023 at 12:21:45PM -0400, Jeffrey E Altman wrote:
>On 5/13/2023 11:44 AM, Jeffrey E Altman (jaltman@auristor.com) wrote:
>>On 5/11/2023 6:20 AM, Richard Feltstykket (richard@unixboxen.net) wrote:
>>>Hello Everyone,
>>>
>>>Perhaps it is widely known already, but I just wanted to share a
>>>process that I have worked out to get a Kerberos ticket and an AFS
>>>token at login time on MacOS. It seems to work fine for MacOS
>>>Ventura and Monterey; I have not tested on other versions.
>>Thanks for posting.
>>>
>>>My cell takes FOREVER to log in for some reason, but after aklog
>>>completes in the background, I have a token and can access volumes
>>>in the cell.
Awesome! Thank you for all of these great troubleshooting tips! This is a new cell with Debian-based systems running the OpenAFS server and client, as well as MacOS systems running the AuriStor client. All systems (except for the sole administrative server in the zone) have the slow token acquisition issue, so the cause is definitely somewhere in that list.
>>
>>Negative DNS lookups impose an unnecessary time delay.
>>
>>Assuming the name of your domain example.net is also the name of
>>your cell and Kerberos realm (in upper case), and assuming the
>>following hostnames for your KDC and AFSDB servers:
>>
>> kdc1.example.net
>>
>> afsdb1.example.net
>>
>>create the following DNS entries
>>
>> _kerberos.example.net. IN TXT "EXAMPLE.NET"
>>
>> _kerberos._afs.example.net. IN TXT "EXAMPLE.NET"
>>
>> _kerberos._tcp.example.net. IN SRV 10 0 88 kdc1.example.net.
>>
>> _kerberos._udp.example.net. IN SRV 10 0 88 kdc1.example.net.
>>
>> _kerberos._http.example.net. IN SRV 0 0 0 .
>>
>> _kerberos._kkdcp.example.net. IN SRV 0 0 0 .
>>
>> _afs3-vlserver._udp.example.net. IN SRV 10 0 7003 afsdb1.example.net.
>>
>> _afs3-prserver._udp.example.net. IN SRV 10 0 7002 afsdb1.example.net.
>>
>>If you are using the AFS backup service:
>>
>> _afs3-budbserver._udp.example.net. IN SRV 10 0 7021 afsdb1.example.net.
>>
>>If you are not using the AFS backup service:
>>
>> _afs3-budbserver._udp.example.net. IN SRV 0 0 0 .
>>
>>If there is more than one KDC or AFSDB server, then create one
>>_kerberos._tcp and one _kerberos._udp SRV record for each KDC, and
>>one set of _afs3-* SRV records for each AFSDB server.
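To make the one-record-per-server rule concrete, here is a minimal sketch that emits the zone-file lines listed above for any number of KDCs and AFSDB servers. The helper name `afs_cell_dns_records` is hypothetical, and it assumes, as the message does, that the cell name equals the DNS domain and the realm is its upper-case form; the priorities, weights and ports are copied from the records above.

```python
def afs_cell_dns_records(domain, realm, kdcs, afsdbs, use_backup=False):
    """Return zone-file lines for the records listed above (hypothetical helper)."""
    records = [
        f'_kerberos.{domain}. IN TXT "{realm}"',
        f'_kerberos._afs.{domain}. IN TXT "{realm}"',
    ]
    # One SRV record per KDC for each transport.
    for proto in ("tcp", "udp"):
        for kdc in kdcs:
            records.append(f"_kerberos._{proto}.{domain}. IN SRV 10 0 88 {kdc}.")
    # Target "." tells clients the HTTP and KKDCP proxy services do not exist.
    for svc in ("http", "kkdcp"):
        records.append(f"_kerberos._{svc}.{domain}. IN SRV 0 0 0 .")
    afs_services = [("vlserver", 7003), ("prserver", 7002)]
    if use_backup:
        afs_services.append(("budbserver", 7021))
    # One SRV record per AFSDB server for each database service.
    for svc, port in afs_services:
        for host in afsdbs:
            records.append(f"_afs3-{svc}._udp.{domain}. IN SRV 10 0 {port} {host}.")
    if not use_backup:
        records.append(f"_afs3-budbserver._udp.{domain}. IN SRV 0 0 0 .")
    return records


for line in afs_cell_dns_records("example.net", "EXAMPLE.NET",
                                 ["kdc1.example.net"], ["afsdb1.example.net"]):
    print(line)
```

Passing two KDCs and two AFSDB servers with `use_backup=True` yields one `_kerberos._tcp`/`_udp` pair per KDC and one vlserver, prserver and budbserver record per AFSDB server.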
>>
>>Note that the hostname specified in an SRV record must not be a
>>CNAME; it must have A or AAAA records. For the _afs3-* SRV records
>>of an OpenAFS cell that does not support IPv6, the specified
>>hostname should not have an AAAA record. The AuriStorFS cache
>>managers and Linux kernel afs (kafs) clients will attempt to contact
>>the location servers via IPv6 if an AAAA record is present.
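The two target rules above (no CNAME, no AAAA for an IPv4-only OpenAFS cell) lend themselves to a quick lint. This is a sketch only: `srv_target_problems` is a hypothetical helper, and the zone is modeled as an in-memory dict of `{name: [(rtype, value), ...]}` rather than queried from a live DNS server.

```python
def srv_target_problems(zone, target, ipv6_capable=False):
    """Check an SRV target against a hypothetical in-memory zone {name: [(rtype, value), ...]}."""
    rtypes = {rtype for rtype, _value in zone.get(target, [])}
    problems = []
    if "CNAME" in rtypes:
        problems.append("target is a CNAME")
    if not rtypes & {"A", "AAAA"}:
        problems.append("target has no A or AAAA record")
    # IPv6-capable clients would try the AAAA address first and wait on it.
    if "AAAA" in rtypes and not ipv6_capable:
        problems.append("AAAA record on an IPv4-only server")
    return problems


zone = {
    "afsdb1.example.net": [("A", "192.0.2.23")],
    "afsdb.example.net": [("CNAME", "afsdb1.example.net")],
    "dual.example.net": [("A", "192.0.2.24"), ("AAAA", "2001:db8::24")],
}
for host in zone:
    print(host, srv_target_problems(zone, host))
```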
>>
>>An SRV record whose target hostname is "." indicates that the
>>service is unavailable.
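The "." convention comes from RFC 2782, which defines a target of "." as meaning the service is decidedly not available at that domain. A minimal sketch of how a client might treat such an answer (the `parse_srv` name and the string input format are assumptions for illustration): the lookup produces a definite "no" immediately, instead of the client timing out on a missing record.

```python
from typing import NamedTuple, Optional


class Srv(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(rdata: str) -> Optional[Srv]:
    """Parse 'priority weight port target'; None means the service is declared absent."""
    priority, weight, port, target = rdata.split()
    if target == ".":
        return None  # RFC 2782: service decidedly not available at this domain
    return Srv(int(priority), int(weight), int(port), target.rstrip("."))


answers = ["10 0 7003 afsdb1.example.net.", "0 0 0 ."]
print([parse_srv(a) for a in answers])
```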
>>
>>The AuriStorFS aklog will attempt to acquire both yfs-rxgk tokens
>>and rxkad_k5 tokens. An OpenAFS cell does not support yfs-rxgk, but
>>aklog doesn't know that until the Kerberos realm explicitly reports
>>that there is no yfs-rxgk/_afs.example.net@EXAMPLE.NET service
>>principal. This requires that GSS-KRB5 be able to quickly resolve
>>the Kerberos realm for the name "_afs.example.net". The TXT record
>>specified above for _kerberos._afs.example.net is intended to speed
>>up the resolution of the hostname-to-realm mapping if the client is
>>configured to do so.
>>
>One thing I forgot to mention.
>
>The service principal for yfs-rxgk is
>yfs-rxgk/_afs.example.net@EXAMPLE.NET instead of
>afs/example.net@EXAMPLE.NET, which is used for rxkad_k5. The name
>_afs.example.net is used because of how GSS-API Kerberos v5
>implementations resolve the Kerberos realm of a service whose
>second component is a hostname. GSS-API will fall back to using the
>DNS domain of the hostname as the realm if there is no other
>information available. However, many implementations, including macOS
>and MIT, will try to validate the second component as a DNS
>hostname as part of the lookup process. Therefore they issue DNS A
>and AAAA queries for "_afs.example.net" even though a DNS hostname is
>not permitted to begin with an underscore. In hindsight, specifying
>the service principal in
>https://datatracker.ietf.org/doc/html/draft-wilkinson-afs3-rxgk-afs
>with an underscore-based hostname was a poor idea. That said, DNS
>resolvers and most Kerberos libraries do not validate the query
>string, and most DNS servers will happily answer the
>out-of-specification request if an entry is present. I therefore
>suggest creating DNS A and AAAA records for _afs.example.net to avoid
>the negative lookup. The address doesn't matter since the DNS
>response will not be used to contact any host. Specifying one of the
>location servers is reasonable.
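The difference between the two service principals, and why the yfs-rxgk one trips hostname validation, can be shown in a few lines. This sketch assumes the cell name equals the realm in lower case, as in the message; `afs_service_principals` and `is_valid_hostname` are hypothetical helpers, and the label rule is the RFC 1123 letters/digits/hyphen syntax, which has no room for a leading underscore.

```python
import re

# RFC 1123 host label: letters, digits and hyphens, not starting or ending with a hyphen.
_HOST_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_hostname(name):
    labels = name.rstrip(".").split(".")
    return all(_HOST_LABEL.fullmatch(label) for label in labels)


def afs_service_principals(cell, realm):
    """Service principals named in the message, assuming cell == realm.lower()."""
    return {
        "rxkad_k5": f"afs/{cell}@{realm}",
        "yfs-rxgk": f"yfs-rxgk/_afs.{cell}@{realm}",
    }


principals = afs_service_principals("example.net", "EXAMPLE.NET")
# Second component of the yfs-rxgk principal, i.e. the "hostname" GSS-API sees.
host_component = principals["yfs-rxgk"].split("/", 1)[1].split("@", 1)[0]
print(host_component, is_valid_hostname(host_component))  # _afs.example.net False
```

Because the second component is not a legal hostname, a strict library gets nothing useful back from its A/AAAA queries, which is why the suggested placeholder records help.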
>
>>For rxkad_k5 tokens the resolution of which Kerberos realm to use is
>>performed by enumerating the hostnames of the location servers,
>>performing an A/AAAA DNS query to obtain the IP addresses, then
>>performing a PTR record lookup on the IP addresses. For example
>>
>> afsdb1.example.net A -> 192.0.2.23
>>
>> 192.0.2.23 PTR -> host.example.net
>>
>> _kerberos.host.example.net TXT -> "EXAMPLE.NET"
>>
>> _kerberos.example.net TXT -> "EXAMPLE.NET" (queried if the
>> _kerberos.host.example.net entry is not present)
>>
>>If there is more than one location server address, the one used to
>>resolve the Kerberos realm can appear to be random, because
>>whichever address is first in the list will be used.
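The A -> PTR -> TXT chain described above can be sketched against a mock DNS table. Assumptions, all labeled: the zone is a hypothetical dict keyed by `(name, rtype)`; only the first location server and its first address are consulted, mirroring the "first in the list" behavior; and the TXT lookup walks up one parent domain at a time, which matches the host.example.net/example.net example but may not match every library's search depth.

```python
import ipaddress


def txt_realm(zone, host):
    """Try _kerberos.<host> TXT, then each parent domain (assumed walk-up order)."""
    labels = host.rstrip(".").split(".")
    for i in range(len(labels) - 1):
        answers = zone.get((f"_kerberos.{'.'.join(labels[i:])}", "TXT"))
        if answers:
            return answers[0]
    return None


def rxkad_k5_realm(zone, vl_hosts):
    """A -> PTR -> TXT chain for the first location server only."""
    if not vl_hosts:
        return None
    addrs = zone.get((vl_hosts[0], "A"))
    if not addrs:
        return None
    # e.g. 192.0.2.23 -> 23.2.0.192.in-addr.arpa
    ptr_name = ipaddress.ip_address(addrs[0]).reverse_pointer
    names = zone.get((ptr_name, "PTR"))
    if not names:
        return None
    return txt_realm(zone, names[0])


zone = {
    ("afsdb1.example.net", "A"): ["192.0.2.23"],
    ("23.2.0.192.in-addr.arpa", "PTR"): ["host.example.net"],
    ("_kerberos.example.net", "TXT"): ["EXAMPLE.NET"],
}
print(rxkad_k5_realm(zone, ["afsdb1.example.net"]))  # EXAMPLE.NET
```

Every miss in this chain is a negative DNS answer (or a timeout) on a real network, so each missing record adds latency before the realm is known. The same `txt_realm` walk applied to "_afs.example.net" is what the _kerberos._afs TXT record short-circuits.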
>>
>>Issuing a "kinit user@EXAMPLE.NET" against your realm took a little
>>more than six seconds to perform the DNS lookups for the kdc on a
>>macOS Ventura 13.4 system. It then took approximately 180ms to
>>receive the expected principal unknown response to the AS-REQ. I
>>cannot measure the time to perform the aklog operations because I
>>cannot obtain a TGT to test with.
>>
>>The time for the AuriStorFS v2021.05-28 cache manager on macOS 13.4
>>to run "ls -l /afs/example.net" anonymously was:
>>
>> * 470ms to resolve the location service via DNS (3 RPCs)
>> * 330ms to resolve the location of the "root.cell" volume (2 RPCs +
>>   reachability test)
>> * 850ms for the fileserver response to the first RPC, including the
>>   fileserver->client callback service TellMeAboutYourself queries (3
>>   RPCs + reachability tests)
>> * 600ms to read the contents of the root directory and obtain status
>>   info for each entry (3 RPCs)
>>
>>The ICMP ping rtt from my test system to the location server
>>averages 115ms.
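Putting the measurements above side by side, the anonymous listing took about 2.25 s, or roughly twenty round trips at the quoted 115 ms RTT for eleven RPCs plus reachability tests. That suggests network latency, rather than server processing, dominates. The arithmetic, using only the figures quoted in the message:

```python
# Per-phase times quoted above for the anonymous "ls -l /afs/example.net".
phases_ms = {
    "resolve location service via DNS": 470,
    "locate root.cell volume": 330,
    "first fileserver RPC incl. TellMeAboutYourself": 850,
    "read root directory and entry status": 600,
}
rtt_ms = 115  # average ICMP round-trip time quoted above

total_ms = sum(phases_ms.values())
print(f"total {total_ms} ms = {total_ms / rtt_ms:.1f} round trips")  # total 2250 ms = 19.6 round trips
```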
>>
>>If the vlserver and fileserver connections were authenticated using
>>rxkad or yfs-rxgk, the PING|PING_RESPONSE reachability test for each
>>RX connection would be replaced by a CHALLENGE|RESPONSE exchange.
>>If the cache manager to fileserver connection were authenticated
>>using yfs-rxgk, the fileserver's TellMeAboutYourself query to the
>>cache manager would not be performed.
>>
>>I suspect you can reduce some of the time by adding the DNS records
>>above that are missing from your domain. You can observe the DNS,
>>Kerberos and AFS queries using Wireshark
>>https://www.wireshark.org/download.html. Start a capture and apply
>>the display filter "dns or rx or kerberos or icmp or icmpv6".
>>
>>Feel free to reply privately if you wish to discuss details of your
>>actual network configuration.
>>
>>Jeffrey Altman
>>
Thanks,
Richard