[OpenAFS-devel] [patch] darwin libresolv needs res_init() to adapt to network changes

Adam Megacz megacz@cs.berkeley.edu
Tue, 26 Dec 2006 01:22:39 -0800


Under many circumstances, attempts to aklog to a cell which uses
AFSDB (and is not in CellServDB) on Darwin will fail like this:

   $ aklog -c cell.edu
   aklog: unable to obtain tokens for cell cell.edu (status: 11862790).

I can make this happen reliably on my Intel Mac (Mac OS X 10.4, OpenAFS
1.4.2) with the following procedure:

   1. Disconnect from the network
   2. Start OpenAFS
   3. Attempt to aklog to the cell in question (will fail)
   4. Connect to the network
   5. Attempt to aklog to the cell in question (should pass, does not)

Empirically, I know of at least five people who have encountered
either this exact problem or some problem producing the same error
message under very similar conditions.

Closer investigation reveals that the Darwin libresolv reads
/etc/resolv.conf only when res_init() is invoked (or implicitly, the
first time a resolver function is used within a given process).

  http://developer.apple.com/documentation/Darwin/Reference/Manpages/man3/res_mkquery.3.html
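To illustrate, here is a minimal sketch of the refresh-then-query pattern in C. The helper name lookup_afsdb_fresh is hypothetical and is not part of OpenAFS. It assumes the standard <resolv.h> interfaces res_init() and res_query() and the ns_c_in/ns_t_afsdb constants from <arpa/nameser.h>; on Darwin the program may need to be linked with -lresolv.

```c
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

/*
 * Hypothetical helper (not part of OpenAFS): force the resolver to
 * re-read /etc/resolv.conf, then ask for the cell's AFSDB records.
 * Returns the answer length from res_query(), or -1 on error.
 */
int lookup_afsdb_fresh(const char *cell, unsigned char *answer, int anslen)
{
    /* Reject bad arguments before touching resolver state. */
    if (cell == 0 || answer == 0 || anslen <= 0)
        return -1;

    /* Without this call, a long-lived process keeps whatever resolver
     * settings it loaded on its first lookup. */
    if (res_init() != 0)
        return -1;

    /* ns_t_afsdb (type 18) is the AFSDB record defined in RFC 1183. */
    return res_query(cell, ns_c_in, ns_t_afsdb, answer, anslen);
}
```

A long-running process that calls a helper like this for each lookup sees the nameservers currently listed in /etc/resolv.conf, rather than whatever was there when it made its first query.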

This means that if afsd starts before the network settings are
finalized (for example, by a DHCP client), afsd will be stuck with
stale or empty resolver settings for its entire lifetime and AFSDB
will essentially not work.  This happens particularly often with
wireless connections that require a WEP password stored in a
particular user's keychain (i.e., not the system keychain).  The
network doesn't come up until that user logs in, which happens well
after afsd starts.

The fix, given below, is simply to call res_init() on each pass
through afsd's AFSDB lookup loop, so that libresolv re-reads its
settings for each AFSDB DNS request.  Tracking down the cause of this
problem was a lot harder than fixing it! :)
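Calling res_init() unconditionally on every pass is the simplest approach. If re-reading the file on every request were ever a concern, one possible refinement (not part of this patch, and only a sketch) is to re-initialize only when /etc/resolv.conf appears to have changed. The struct resolv_conf_stamp and the functions resolv_conf_changed and maybe_res_init are hypothetical names; change detection compares st_mtime and st_size from stat(2).

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

/* Hypothetical state: last observed identity of the resolver config file. */
struct resolv_conf_stamp {
    time_t mtime;
    off_t size;
    int valid;              /* 0 until the first successful stat() */
};

/*
 * Returns 1 if `path` looks different from the stamp (or is seen for
 * the first time), 0 if unchanged, -1 if stat() fails. Updates the
 * stamp on change. st_mtime has one-second granularity, so two edits
 * in the same second that leave the size unchanged go unnoticed.
 */
int resolv_conf_changed(const char *path, struct resolv_conf_stamp *stamp)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (stamp->valid && st.st_mtime == stamp->mtime && st.st_size == stamp->size)
        return 0;
    stamp->mtime = st.st_mtime;
    stamp->size = st.st_size;
    stamp->valid = 1;
    return 1;
}

/* Re-initialize the resolver only when the config file appears to have changed. */
int maybe_res_init(const char *path, struct resolv_conf_stamp *stamp)
{
    if (resolv_conf_changed(path, stamp) == 1)
        return res_init();
    return 0;               /* unchanged or missing: keep current state */
}
```

In a loop like afsd's, a static stamp would be passed on every iteration in place of the bare res_init() call; the trade-off is a stat() per request instead of a full re-parse, at the cost of possibly missing very rapid edits.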

I would like to offer this patch for OpenAFS 1.5.x and, if possible,
the 1.4.x branch as well.  I would really appreciate it if this could
go into the stable tree, since it's unlikely to introduce new bugs and
is currently a source of major confusion for new users.

Please let me know what you think, and thanks for considering this.

  - a

cvs diff: Diffing src/afsd
Index: src/afsd/afsd.c
===================================================================
RCS file: /cvs/openafs/src/afsd/afsd.c,v
retrieving revision 1.43.2.18
diff -B -u -b -r1.43.2.18 afsd.c
--- src/afsd/afsd.c     21 Aug 2006 20:39:40 -0000      1.43.2.18
+++ src/afsd/afsd.c     26 Dec 2006 09:10:03 -0000
@@ -1248,6 +1248,11 @@
     acellName[0] = '\0';
 
     while (1) {
       /* On some platforms you only get 4 args to an AFS call */
       int sizeArg = ((sizeof acellName) << 16) | (sizeof kernelMsg);
+
+#ifdef AFS_DARWIN_ENV
+      /* libresolv re-reads /etc/resolv.conf only when res_init() is called */
+      res_init();
+#endif
        code =