OpenAFS Master Repository branch, master, updated. openafs-devel-1_9_2-127-g877d751
Gerrit Code Review
gerrit@openafs.org
Fri, 7 Feb 2025 13:45:01 -0500
The following commit has been merged in the master branch:
commit 053659cab31bc78e0a80986e970c44adb51c51e0
Author: Andrew Deason <adeason@sinenomine.net>
Date: Wed Jan 29 22:23:56 2025 -0600
viced: Exit on InitPR() failure
Since commit 5b6e501945 (make fileserver avoid salvage loop on init
failure), included in OpenAFS 1.4.12 and 1.5.63, the fileserver no
longer exits when InitPR() fails with code -1 specifically. InitPR() can
fail with -1 if pr_Initialize() fails for a variety of reasons, or if our
call to pr_GetCPS(ANONYMOUSID) fails for any reason. If either of these
happens, the fileserver will continue to start up normally, but various
pieces of InitPR()-related state will not be initialized, leading to
bizarre behavior.
If the pr_Initialize() call fails, then viced_uclient_key will not be
initialized. The first time a client accesses the fileserver, we'll
likely segfault during any hpr_* call (such as hpr_GetHostCPS), after
getThreadClient() returns a bogus ubik_client from a call to
pthread_getspecific(viced_uclient_key). We'll also log this message
during startup, but it's easy to miss in the middle of many other
messages:
    Couldn't initialize protection library; code=-1.
If the call to pr_GetCPS(ANONYMOUSID) fails, the 'AnonCPS' and
'AnonymousID' globals will be unset. This causes unauthenticated client
connections to get treated like an authenticated user with viceid 0
(since AnonymousID is 0 instead of the actual anonymous viceid), causing
many FileLog messages like:
    pr_GetCPS failed(267268) for user 0, host 0x[...] ([...]:7001)
And we'll log this message during startup (also easy to miss, and we
don't actually exit afterwards):
    Couldn't get Anonymous CPS, exiting; code=-1.
It is unlikely for InitPR() to fail in such a way, since pr_Initialize()
failing usually means the local configuration is broken (and other
things would also fail), and we only reach our call to
pr_GetCPS(ANONYMOUSID) if our first call to pr_GetCPS(SystemAnyUser)
succeeds. But both are possible if something modifies the local
configuration while the fileserver is starting up, or if the ptserver
suddenly becomes unreachable between the two pr_GetCPS calls.
Commit 5b6e501945 says the change was made to allow the fileserver to
start up even if we fail to reach the dbservers, so that we can retry in
the background. But only the vlserver-related InitVL() has any logic for
retrying in the background; we can't really process any requests without
successfully contacting the ptserver, since we need to know the CPS for
system:anyuser. (For InitVL(), we can still serve requests and register
ourselves with the vldb later on.)
So, to avoid these unlikely instances of odd behavior, revert the InitPR
portion of commit 5b6e501945, and make the fileserver exit if InitPR
fails with any code.
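The change in exit behavior can be summarized with a small sketch of the
decision made after InitPR() returns. This is illustrative only: the
function names old_should_exit() and new_should_exit() are hypothetical,
and the actual one-line change in src/viced/viced.c is not reproduced
here.

```c
/*
 * Hypothetical sketch of viced's exit decision after InitPR().
 * These helpers are illustrative only; they are not OpenAFS functions.
 */

/* Since commit 5b6e501945: a return code of exactly -1 was tolerated,
 * so startup continued with InitPR()-related state uninitialized. */
static int
old_should_exit(int code)
{
    return code != 0 && code != -1;
}

/* After this commit: any nonzero return code from InitPR() is fatal. */
static int
new_should_exit(int code)
{
    return code != 0;
}
```

Under this sketch, a -1 from InitPR() (for example, after a failed
pr_Initialize() or pr_GetCPS(ANONYMOUSID) call) now stops startup instead
of leaving viced_uclient_key, AnonCPS, or AnonymousID unset.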
Change-Id: Ide2b8bed9d30c2a7aebda5df6685654a83c8fe8a
Reviewed-on: https://gerrit.openafs.org/16222
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
src/viced/viced.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)