OpenAFS Master Repository branch, master, updated. openafs-devel-1_9_2-84-ge022864
Gerrit Code Review
gerrit@openafs.org
Tue, 21 Jan 2025 15:08:15 -0500
The following commit has been merged in the master branch:
commit e022864c923d4936ed68d01d89eef1df19bbe303
Author: Andrew Deason <adeason@sinenomine.net>
Date: Thu Oct 17 00:13:23 2024 -0500
volser: Avoid 'vos release' failure partial unlock
Currently, if UV_ReleaseVolume() in 'vos release' fails to release the
volume to some sites, we update the vlentry for the volume by calling
VLDB_ReplaceEntry with LOCKREL_TIMESTAMP. This clears the lock timestamp
in the vlentry, but not the lock flags (VLOP_RELEASE). We then fully
unlock the vlentry later during cleanup after the 'rfail' label by
calling ubik_VL_ReleaseLock().
The timestamp-only unlock via VLDB_ReplaceEntry is unusual; all other
code paths in OpenAFS clear a vlentry lock by clearing all lock-related
fields at the same time (LOCKREL_OPCODE | LOCKREL_AFSID |
LOCKREL_TIMESTAMP).
If only the lock timestamp is cleared and specifically the VLOP_RELEASE
flag is set, future calls to VL_SetLock() cause the vlserver to return a
different error code: VL_RERELEASE instead of VL_ENTRYLOCKED. If
UV_ReleaseVolume() sees VL_RERELEASE, we act as if the vlentry was
locked successfully; the intention was probably to allow a subsequent
'vos release' to fix the partially-failed release.
This scheme has a few problems:
If a volume is partially locked, multiple 'vos release' processes could
see the VL_RERELEASE error for the same volume, and all of them would
assume they have locked the vlentry. This would bypass the exclusion
that vlentry locks normally provide, and allows multiple processes to go
through the 'vos release' process for the same volume at the same time,
causing various problems.
Similarly, a second 'vos release' process could see the VL_RERELEASE
error while a first 'vos release' process is still running its cleanup.
The first process would then fully unlock the vlentry while the second
process is running the release, causing the release to run with a fully
unlocked volume. This would allow any 'vos' operation against the volume
to run at the same time as the release, potentially causing various
problems.
Furthermore, it's rare for a vlentry to be left in this partially-locked
state. Since UV_ReleaseVolume() fully unlocks the vlentry during
cleanup, the 'vos release' process would need to die before running the
last ubik_VL_ReleaseLock(), or the last ubik_VL_ReleaseLock() would need
to fail. Even if the VL_RERELEASE scheme worked well, it almost always
is not used, since the vlentry is usually left completely unlocked after
'vos release' runs, even after a failure.
It's possible that the details of this scheme used to work differently,
perhaps avoiding the full unlock after a partial release failure.
However, the current behavior has existed for quite a long time (since
before OpenAFS 1.0), so it seems unlikely that anyone still expects
whatever the original behavior was.
To avoid all of this nonsense, just don't pass LOCKREL_TIMESTAMP during
the failure-related VLDB_ReplaceEntry() call. Instead, the vlentry
remains locked until we fully unlock it during the normal cleanup
processing.
Do the same for the equivalent code path in libadmin's
UV_ReleaseVolume(). After this commit, all callers that pass any
LOCKREL_* flags pass all of them at the same time, so partial unlocks
should no longer be possible.
For now, don't remove the special handling of the VL_RERELEASE error
code after trying to lock a vlentry. Maybe this can be removed in the
future, but for now, keep this to retain existing behavior if there are
existing partially-locked entries in the VLDB.
Thanks for mvitale@sinenomine.net for related work drawing attention to
this.
Change-Id: I6b053f88c923684bd95df3e2022e9cf125e45d04
Reviewed-on: https://gerrit.openafs.org/15867
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
src/libadmin/vos/vsprocs.c | 2 +-
src/volser/vsprocs.c | 4 +---
2 files changed, 2 insertions(+), 4 deletions(-)
--
OpenAFS Master Repository