[OpenAFS-devel] making supergroups the default and removing the option?
Jeffrey Hutzelman
jhutz@cmu.edu
Mon, 20 Aug 2007 19:38:15 -0400
On Sunday, August 19, 2007 10:16:17 PM -0400 Marcus Watts
<mdw@spam.ifs.umich.edu> wrote:
> Running mixed supergroup & non-supergroup db servers probably won't
> destroy your cell, but it will cause weird schizoid behavior.
Hrm. I went through the code looking at this, attempting to find places
where database corruption would result from updates made by a coordinator
which doesn't understand the new database format. The conclusion I came to
is that in most cases, attempting such updates will result in an error with
no ill effects, because things like removing groups and group memberships
actually check the back-references, which for a group-in-group relationship
are kept in a different list unknown to the non-SG-aware server. The
result is that removing a group-in-group relationship or the containing
group in such a relationship will fail with PRNOENT. The same is true for
changing the ID of the containing group.
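To make that failure mode concrete, here is a minimal C sketch of the check a non-SG-aware server performs when removing a member from a group. It assumes a hypothetical, heavily simplified entry layout (`struct entry` with `members`, `memberships` and `supergroups` arrays) rather than the real `struct prentry`, and `SKETCH_PRNOENT` is only a stand-in for PRNOENT. Because the contained group's back-reference lives in `supergroups`, the old-style check finds nothing and returns the error before touching either entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAXLIST 8
#define SKETCH_OK 0
#define SKETCH_PRNOENT 1 /* stand-in for PRNOENT, not its real value */

/* Hypothetical, heavily simplified entry; the real struct prentry differs. */
struct entry {
    int id;
    int members[MAXLIST];     /* groups: IDs of direct members */
    int nmembers;
    int memberships[MAXLIST]; /* users: back-references to containing groups */
    int nmemberships;
    int supergroups[MAXLIST]; /* groups: back-references, SG-aware servers only */
    int nsupergroups;
};

static bool list_contains(const int *list, int n, int id) {
    for (int i = 0; i < n; i++)
        if (list[i] == id)
            return true;
    return false;
}

static void list_remove(int *list, int *n, int id) {
    for (int i = 0; i < *n; i++) {
        if (list[i] == id) {
            list[i] = list[--*n]; /* order is not significant */
            return;
        }
    }
}

/* Remove `member` from `group` the way a non-SG-aware server would:
 * it only knows about the `memberships` back-reference list. */
static int remove_member_old(struct entry *group, struct entry *member) {
    assert(group != NULL && member != NULL);
    /* For a group-in-group relationship the back-reference is in
     * member->supergroups, which this server never looks at. */
    if (!list_contains(member->memberships, member->nmemberships, group->id))
        return SKETCH_PRNOENT; /* nothing has been modified */
    list_remove(member->memberships, &member->nmemberships, group->id);
    list_remove(group->members, &group->nmembers, member->id);
    return SKETCH_OK;
}
```

The same call succeeds for an ordinary user member, whose back-reference is in `memberships`; only the group-in-group case is refused.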
However, removing the _contained_ group is a different story. If a
non-SG-aware server tries to remove a group which is contained within some
other group, but does not itself contain any groups, it will think it has
done so successfully. However, in reality the groups containing it will
not have been updated, resulting in a dangling "pointer" (really an entry
ID) in the containing group. The result is that, even with an SG-aware
server, the containing group can never be removed, and the relationship
between those two group IDs can never be reestablished.
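The dangling-reference case can be modeled the same way. This sketch uses the same hypothetical simplified layout plus a toy `struct db` and `lookup()`, neither of which is a ptserver interface. `delete_old()` follows only the `memberships` back-references, so the containing group keeps the stale ID; a later SG-aware `delete_sg()` of the container then cannot find that member's entry and refuses to proceed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAXLIST 8
#define MAXENTRIES 16
#define SKETCH_OK 0
#define SKETCH_PRNOENT 1 /* stand-in for PRNOENT, not its real value */

/* Hypothetical, heavily simplified entry and database. */
struct entry {
    int id;
    bool in_use;
    int members[MAXLIST];     int nmembers;
    int memberships[MAXLIST]; int nmemberships;
    int supergroups[MAXLIST]; int nsupergroups;
};

struct db {
    struct entry e[MAXENTRIES];
};

static struct entry *lookup(struct db *db, int id) {
    for (int i = 0; i < MAXENTRIES; i++)
        if (db->e[i].in_use && db->e[i].id == id)
            return &db->e[i];
    return NULL;
}

static void list_remove(int *list, int *n, int id) {
    for (int i = 0; i < *n; i++) {
        if (list[i] == id) {
            list[i] = list[--*n];
            return;
        }
    }
}

/* Non-SG-aware delete: only the `memberships` back-references are followed. */
static int delete_old(struct db *db, int id) {
    struct entry *victim = lookup(db, id);
    if (victim == NULL)
        return SKETCH_PRNOENT;
    for (int i = 0; i < victim->nmemberships; i++) {
        struct entry *g = lookup(db, victim->memberships[i]);
        if (g != NULL)
            list_remove(g->members, &g->nmembers, id);
    }
    /* victim->supergroups is never read, so every containing group
     * keeps `id` in its member list: a dangling reference. */
    victim->in_use = false;
    return SKETCH_OK;
}

/* SG-aware group delete: every member must be found so that its
 * back-reference can be removed first. */
static int delete_sg(struct db *db, int id) {
    struct entry *victim = lookup(db, id);
    if (victim == NULL)
        return SKETCH_PRNOENT;
    for (int i = 0; i < victim->nmembers; i++)
        if (lookup(db, victim->members[i]) == NULL)
            return SKETCH_PRNOENT; /* dangling member ID: give up */
    for (int i = 0; i < victim->nmembers; i++) {
        struct entry *m = lookup(db, victim->members[i]);
        list_remove(m->memberships, &m->nmemberships, id);
        list_remove(m->supergroups, &m->nsupergroups, id);
    }
    for (int i = 0; i < victim->nsupergroups; i++) {
        struct entry *c = lookup(db, victim->supergroups[i]);
        if (c != NULL)
            list_remove(c->members, &c->nmembers, id);
    }
    victim->in_use = false;
    return SKETCH_OK;
}
```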
The situation for changing the ID of the contained group is in some ways
not as bad, and in other ways worse. Without supergroups, changing the ID
of a group which is contained in other groups will corrupt the database, the same
as described above. However, the situation can be repaired simply by
changing the ID back. The "worse" part is that even _with_ supergroups,
the behavior is exactly the same: ChangeEntry() has no handling for
changing the ID of a group whose supergroups list is non-empty!
For that matter, there is very little handling of changing the ID of a
group which contains other groups. If ChangeEntry encounters this
situation, it returns a mysterious error code. It would be better if it
returned a meaningful error code, and better still if it actually did the
required work, which is not too hard.
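As a rough illustration of that work, the sketch below renumbers a group by rewriting the ID in both directions: in the member lists of the groups named in its `supergroups` list, and in the back-reference lists of its own members. It uses the same hypothetical simplified model as a toy; the real ChangeEntry() also has to deal with names, owners, hash chains and ubik transactions, none of which are shown, and `SKETCH_PREXIST` is only a stand-in for PREXIST.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAXLIST 8
#define MAXENTRIES 16
#define SKETCH_OK 0
#define SKETCH_PRNOENT 1 /* stand-in for PRNOENT */
#define SKETCH_PREXIST 2 /* stand-in for PREXIST */

/* Hypothetical, heavily simplified entry and database. */
struct entry {
    int id;
    bool in_use;
    int members[MAXLIST];     int nmembers;
    int memberships[MAXLIST]; int nmemberships;
    int supergroups[MAXLIST]; int nsupergroups;
};

struct db {
    struct entry e[MAXENTRIES];
};

static struct entry *lookup(struct db *db, int id) {
    for (int i = 0; i < MAXENTRIES; i++)
        if (db->e[i].in_use && db->e[i].id == id)
            return &db->e[i];
    return NULL;
}

static void list_replace(int *list, int n, int from, int to) {
    for (int i = 0; i < n; i++)
        if (list[i] == from)
            list[i] = to;
}

/* Change a group's ID and fix up every reference to it. */
static int change_group_id(struct db *db, int old_id, int new_id) {
    struct entry *g = lookup(db, old_id);
    if (g == NULL)
        return SKETCH_PRNOENT;
    if (lookup(db, new_id) != NULL)
        return SKETCH_PREXIST;
    /* Groups containing this one list it as a member. */
    for (int i = 0; i < g->nsupergroups; i++) {
        struct entry *c = lookup(db, g->supergroups[i]);
        if (c != NULL)
            list_replace(c->members, c->nmembers, old_id, new_id);
    }
    /* Members (users or groups) hold a back-reference to this group. */
    for (int i = 0; i < g->nmembers; i++) {
        struct entry *m = lookup(db, g->members[i]);
        if (m == NULL)
            continue;
        list_replace(m->memberships, m->nmemberships, old_id, new_id);
        list_replace(m->supergroups, m->nsupergroups, old_id, new_id);
    }
    g->id = new_id;
    return SKETCH_OK;
}
```

Both directions are handled in one pass, so the "contained group" and "containing group" cases described above need no separate code paths in this model.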
> You should probably have a copy of db_verify & pt_util when you make this
> change. pt_util is probably built and installed by default. You may have
> to build db_verify specially.
The PRDB verifier is installed in the server sbin directory as prdb_check.
> This may all seem like supergroups are complicated & hard to deploy.
Nope, not at all. What's complicated is recovering from what happens if
you run a mixture of incompatible database servers, which has never been a
good idea. We've been fortunate in OpenAFS in that the database version
formats have not changed in quite a long time; however, that will not be
true forever.
As you mentioned, there are already improvements on the horizon which will
require changes to both the PRDB and VLDB. In both cases, full backward
compatibility is impossible, but we'd like to make it at least as good as
the supergroups change; specifically, old servers will be able to safely
read any entry and modify entries which don't use new features, but won't
be able to use new features and may be unable to modify entries which use
them.
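One possible way to get that "safe to read, may refuse to modify" rule is for each entry to record which extensions it uses, and for a server to refuse to modify any entry using an extension it doesn't know. The flag names and `struct entry_hdr` below are hypothetical; this is not the actual PRDB or VLDB on-disk format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-entry extension bits; not real PRDB/VLDB flags. */
#define FEAT_SUPERGROUPS UINT32_C(0x1)
#define FEAT_FUTURE_EXT  UINT32_C(0x2) /* some later extension */

struct entry_hdr {
    uint32_t features; /* extensions this entry actually uses */
};

/* Reading is always allowed: the base layout is unchanged and a server
 * ignores extension data it doesn't understand. Modifying is allowed
 * only if every extension the entry uses is supported by this server. */
static bool may_modify(const struct entry_hdr *e, uint32_t supported) {
    assert(e != NULL);
    return (e->features & ~supported) == 0;
}
```

With this rule an entry that never uses a new feature stays writable by every server, which is the property described above for the supergroups change.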
Hopefully, this will be the last time a version change introduces
uncertainty or "danger". We should be able to make it so that a server can
detect the presence of extensions it doesn't support and fail gracefully,
rather than corrupting the database. And, we should be able to make it
possible to update one server at a time, while not bringing new features
into use until all servers, or at least all voting servers, are up to date.
In fact, I just had some thoughts along those lines, which I need to flesh
out a bit more.
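A minimal sketch of deferring new features until the servers are upgraded, assuming each database server advertises a mask of the extensions it supports and only their intersection is put into use. The `struct peer` record and its mask are hypothetical; nothing here describes how ubik would actually exchange this information.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one database server in the cell. */
struct peer {
    bool voting;
    uint32_t supported; /* extensions this server's software understands */
};

/* Return the extensions that may be put into use: those supported by
 * every voting server, or by every server when voting_only is false. */
static uint32_t usable_features(const struct peer *peers, size_t n,
                                bool voting_only) {
    assert(peers != NULL || n == 0);
    uint32_t mask = UINT32_MAX;
    for (size_t i = 0; i < n; i++)
        if (peers[i].voting || !voting_only)
            mask &= peers[i].supported;
    return mask;
}
```

Upgrading servers one at a time then changes nothing visible until the last relevant server is upgraded, at which point the new bit appears in the intersection.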
> Or, here's a less obvious incompatible db change: when do you want ubik to
> not use rxkad/des between servers?
Eh; that's not nearly as hard. The mechanisms don't have to interoperate
with each other; you just have to be able to configure which mechanisms are
supported and which ones to use to talk to a ubik peer. That's something
we can drop into ubik config improvements pretty easily (another thing I
should spend some of my CFT on).
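A minimal sketch of that kind of configuration, assuming a hypothetical per-peer record listing the mechanisms a peer accepts plus a local preference order. Apart from rxkad/DES, the mechanism name is invented, and the record is not an existing ubik config format. The server picks the first preferred mechanism both sides support, so no mechanism ever has to interoperate with another.

```c
#include <assert.h>
#include <stddef.h>

/* Mechanisms as bit flags; MECH_NEW_HYPOTHETICAL is a placeholder. */
enum mech {
    MECH_NONE = 0,
    MECH_RXKAD_DES = 1 << 0,
    MECH_NEW_HYPOTHETICAL = 1 << 1
};

/* Hypothetical per-peer configuration entry. */
struct peer_cfg {
    const char *host;
    unsigned supported;      /* mechanisms this peer is configured to accept */
    const enum mech *prefer; /* our preference order when talking to it */
    size_t nprefer;
};

/* Pick the first preferred mechanism supported both locally and by the
 * peer; MECH_NONE means no common mechanism is configured. */
static enum mech choose_mech(const struct peer_cfg *p, unsigned local_supported) {
    assert(p != NULL);
    for (size_t i = 0; i < p->nprefer; i++)
        if ((p->supported & local_supported & (unsigned)p->prefer[i]) != 0)
            return p->prefer[i];
    return MECH_NONE;
}
```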
-- Jeff