OpenAFS Master Repository branch, master, updated. openafs-devel-1_9_2-127-g877d751

Gerrit Code Review gerrit@openafs.org
Fri, 7 Feb 2025 13:45:01 -0500


The following commit has been merged in the master branch:
commit 053659cab31bc78e0a80986e970c44adb51c51e0
Author: Andrew Deason <adeason@sinenomine.net>
Date:   Wed Jan 29 22:23:56 2025 -0600

    viced: Exit on InitPR() failure
    
    Since commit 5b6e501945 (make fileserver avoid salvage loop on init
    failure), included in OpenAFS 1.4.12 and 1.5.63, the fileserver no
    longer exits when InitPR() fails specifically with code -1. InitPR() can
    fail with -1 if pr_Initialize() fails for a variety of reasons, or our
    call to pr_GetCPS(ANONYMOUSID) fails for any reason. If either of these
    happens, the fileserver will continue to start up normally, but various
    InitPR()-related things will not be initialized, leading to bizarre
    behavior.
    
    If the pr_Initialize() call fails, then viced_uclient_key will not be
    initialized. The first time a client accesses the fileserver, we'll
    likely segfault during any hpr_* call (such as hpr_GetHostCPS), after
    getThreadClient() returns a bogus ubik_client from a call to
    pthread_getspecific(viced_uclient_key). We'll also log this message
    during startup, but it's easy to miss in the middle of many other
    messages:
    
        Couldn't initialize protection library; code=-1.
    
    If the call to pr_GetCPS(ANONYMOUSID) fails, the 'AnonCPS' and
    'AnonymousID' globals will be unset. This causes unauthenticated client
    connections to get treated like an authenticated user with viceid 0
    (since AnonymousID is 0 instead of the actual anonymous viceid), causing
    many FileLog messages like:
    
        pr_GetCPS failed(267268) for user 0, host 0x[...] ([...]:7001)
    
    And we'll log this message during startup (also easy to miss, and we
    don't actually exit afterwards):
    
        Couldn't get Anonymous CPS, exiting; code=-1.
    
    It is unlikely for InitPR() to fail in such a way, since pr_Initialize()
    failing usually means the local configuration is broken (and other
    things would also fail), and we only reach our call to
    pr_GetCPS(ANONYMOUSID) if our first call to pr_GetCPS(SystemAnyUser)
    succeeds. But these are both possible if something is modifying the
    local configuration while the fileserver is starting up, or if the
    ptserver suddenly becomes unreachable between the two pr_GetCPS calls.
    
    Commit 5b6e501945 mentions that the change was made to allow the
    fileserver to start up if we fail to reach dbservers and we can retry in
    the background. But only the vlserver-related InitVL() has any logic for
    retrying in the background; we can't really process any requests without
    successfully contacting the ptserver, since we need to know the CPS for
    system:anyuser. (For InitVL(), we can still serve requests and register
    ourselves with the vldb later on.)
    
    So, to avoid these unlikely instances of odd behavior, revert the InitPR
    portion of commit 5b6e501945, and make the fileserver exit if InitPR
    fails with any code.
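    
    For illustration only, the change in startup behavior can be modeled
    with the minimal C sketch below. It is not the actual viced.c diff: the
    function names are hypothetical, and it assumes that any nonzero
    InitPR() return value indicates failure.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch, NOT the real viced.c code. It only models the
 * exit decision described in this commit message, assuming that any
 * nonzero InitPR() result means failure.
 */

/* Behavior since commit 5b6e501945: a failing InitPR() makes the
 * fileserver exit, except for the specific code -1, where startup
 * continues with protection-related state left uninitialized. */
bool
exit_on_initpr_failure_old(int code)
{
    return code != 0 && code != -1;
}

/* Behavior after this commit: any failing InitPR() result, including
 * -1, makes the fileserver exit. */
bool
exit_on_initpr_failure_new(int code)
{
    return code != 0;
}
```

    For example, with code -1 the old condition lets startup continue,
    while the new condition makes the fileserver exit; a successful
    InitPR() (code 0) never causes an exit under either condition.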
    
    Change-Id: Ide2b8bed9d30c2a7aebda5df6685654a83c8fe8a
    Reviewed-on: https://gerrit.openafs.org/16222
    Tested-by: BuildBot <buildbot@rampaginggeek.com>
    Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
    Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
    Reviewed-by: Andrew Deason <adeason@sinenomine.net>

 src/viced/viced.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
OpenAFS Master Repository