[OpenAFS-devel] code to "down" unreachable hosts in src/rx/rx_packet.c is unreachable?

Adam Megacz megacz@cs.berkeley.edu
Sun, 09 Nov 2008 12:52:26 -0800


I noticed the following useful piece of code at
src/rx/rx_packet.c:2431 (CVS HEAD).  I could be mistaken, but it
appears that the body of the if-block is unreachable in user-space
builds on non-AFS_NT40_ENV platforms:

            /* Some systems are nice and tell us right away that we cannot
             * reach this recipient by returning an error code.
             * So, when this happens let's "down" the host NOW so
             * we don't sit around waiting for this host to timeout later.
             */
            if (call &&
#ifdef AFS_NT40_ENV
                code == -1 && WSAGetLastError() == WSAEHOSTUNREACH
#elif defined(AFS_LINUX20_ENV) && defined(KERNEL)
                code == -ENETUNREACH
#elif defined(AFS_DARWIN_ENV) && defined(KERNEL)
                code == EHOSTUNREACH
#else
                0
#endif
                )

The variable "code" is the return value from osi_NetSend() on line
2421, a function which simply relays the return value of
rxi_Sendmsg().  Both the lwp and pthread implementations of
rxi_Sendmsg() appear to return only 0 or -1, never an error code.

If this is not the intended situation, I'm interested in investing a
bit of time in writing up a patch to change it and expand it to
include the !defined(KERNEL) case (for Linux, at minimum).  I would
welcome any advice on how the patch should be written.
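
For illustration only, here is a rough sketch of a user-space send
wrapper that preserves the error code instead of collapsing it to -1.
This is not the actual rxi_Sendmsg(); the names sketch_sendmsg() and
sketch_is_unreachable() are hypothetical, and the negative-errno return
convention is an assumption borrowed from the Linux kernel branch of the
check above.  It also treats any non-negative sendmsg() result as
success and ignores short writes.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical wrapper: returns 0 on success, or a negative errno value
 * (for example -ENETUNREACH) on failure, rather than a bare -1. */
int sketch_sendmsg(int fd, const struct msghdr *msg, int flags)
{
    if (sendmsg(fd, msg, flags) < 0)
        return -errno;   /* keep the reason for the failure */
    return 0;
}

/* Hypothetical predicate a caller could use in place of the literal 0
 * in the #else branch: true when the send failed because the network
 * or the host is unreachable. */
int sketch_is_unreachable(int code)
{
    return code == -ENETUNREACH || code == -EHOSTUNREACH;
}
```

With something along these lines, the #else branch could test the
returned code rather than being compiled out.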

In particular, libnss_afs needs some sort of "fast fail" behavior like
this during client shutdown, in case some process attempts an NSS
request after the network interfaces have been taken down.  Even with
rx_SetRxDeadTime(/*small-value*/), the ubik/rx libraries will still
make several attempts to contact each ptserver rather than immediately
taking advantage of ENETUNREACH.  It seems like the code to do this is
there; it just isn't getting used.
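
The "fast fail" behavior described above could be sketched roughly as
follows.  This is a stand-alone illustration, not ubik or rx code: the
callback type, the function names, and the two fake senders are all
hypothetical, and it assumes the send routine reports failures as
negative errno values.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical send callback: 0 on success, negative errno on failure. */
typedef int (*sketch_send_fn)(void *ctx);

/* Try up to max_attempts sends, but give up at once when the network or
 * host is reported unreachable.  Returns 0 on success, otherwise the
 * last negative errno seen (-EINVAL if no attempt was made).
 * *attempts_out, if non-NULL, receives the number of sends made. */
int sketch_send_with_fast_fail(sketch_send_fn send, void *ctx,
                               int max_attempts, int *attempts_out)
{
    int code = -EINVAL;
    int n = 0;

    while (n < max_attempts) {
        code = send(ctx);
        n++;
        if (code == 0)
            break;                      /* sent successfully */
        if (code == -ENETUNREACH || code == -EHOSTUNREACH)
            break;                      /* retrying cannot help */
    }
    if (attempts_out != NULL)
        *attempts_out = n;
    return code;
}

/* Fake senders, only for exercising the loop above. */
int sketch_fake_unreachable(void *ctx) { (void)ctx; return -ENETUNREACH; }
int sketch_fake_timeout(void *ctx)     { (void)ctx; return -ETIMEDOUT; }
```

With the fake senders, the unreachable case stops after a single
attempt, while the timeout case uses every attempt it is allowed.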

Thanks,

  - a