[OpenAFS-devel] making supergroups the default and removing the option?

Jeffrey Hutzelman jhutz@cmu.edu
Mon, 20 Aug 2007 19:38:15 -0400


On Sunday, August 19, 2007 10:16:17 PM -0400 Marcus Watts 
<mdw@spam.ifs.umich.edu> wrote:

> Running mixed supergroup & non-supergroup db servers probably won't
> destroy your cell, but it will cause weird schizoid behavior.

Hrm.  I went through the code looking at this, attempting to find places 
where database corruption would result from updates made by a coordinator 
which doesn't understand the new database format.  The conclusion I came to 
is that in most cases, attempting such updates will result in an error with 
no ill effects, because things like removing groups and group memberships 
actually check the back-references, which for a group-in-group relationship 
are kept in a different list unknown to the non-SG-aware server.  The 
result is that removing a group-in-group relationship or the containing 
group in such a relationship will fail with PRNOENT.  The same is true for 
changing the ID of the containing group.
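
To make that failure mode concrete, here is a toy Python model. It is not 
OpenAFS code: the Entry class, its field names, and remove_member() are all 
hypothetical, and PRNOENT is just a string standing in for the ptserver 
error code.  The sketch assumes the old server looks for the back-reference 
in the only list it knows about, and it checks before changing anything; a 
real server could equally get the "no ill effects" result by aborting the 
transaction.

```python
# Toy model only: field and function names are hypothetical, not OpenAFS code.
from dataclasses import dataclass, field

PRNOENT = "PRNOENT"  # stand-in for the ptserver "no such entry" error code


@dataclass
class Entry:
    entry_id: int
    entries: set = field(default_factory=set)      # list every server knows about
    supergroups: set = field(default_factory=set)  # extra list only SG-aware servers read


def remove_member(db, member_id, group_id, sg_aware):
    """Remove member_id from group_id; return None on success or an error code."""
    member, group = db[member_id], db[group_id]
    is_group = member_id < 0  # AFS group IDs are negative
    # Pick the list that should hold the back-reference to the containing group.
    backrefs = member.supergroups if (is_group and sg_aware) else member.entries
    if group_id not in backrefs:
        return PRNOENT  # back-reference not found: refuse before changing anything
    backrefs.discard(group_id)
    group.entries.discard(member_id)
    return None


def make_db():
    """Build a two-group database in which group -10 contains group -20."""
    outer, inner = Entry(-10), Entry(-20)
    outer.entries.add(inner.entry_id)      # forward reference: outer contains inner
    inner.supergroups.add(outer.entry_id)  # back-reference, kept in the SG-only list
    return {outer.entry_id: outer, inner.entry_id: inner}
```

Calling remove_member(make_db(), -20, -10, sg_aware=False) returns PRNOENT 
and leaves the database untouched, while the same call with sg_aware=True 
removes the relationship.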

However, removing the _contained_ group is a different story.  If a 
non-SG-aware server tries to remove a group which is contained within some 
other group, but does not itself contain any groups, it will think it has 
done so successfully.  However, in reality the groups containing it will 
not have been updated, resulting in a dangling "pointer" (really an entry 
ID) in the containing group.  The result is that, even with an SG-aware 
server, the containing group can never be removed, and the relationship 
between those two group IDs cannot ever be reestablished.
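
The dangling reference can be illustrated with another toy Python model. 
Again, this is only a sketch under assumptions: the dict layout and the 
delete_group() and dangling_refs() helpers are hypothetical and do not 
mirror the real PRDB structures.

```python
# Toy model only: the dict layout and function names are hypothetical.

def new_group():
    return {"members": set(), "supergroups": set()}


def delete_group(db, gid, sg_aware):
    """Delete group gid, cleaning up only the references this server knows about."""
    entry = db[gid]
    # The example group has no members, so the member walk that every
    # server performs is a no-op and is omitted from this sketch.
    if sg_aware:
        for outer in entry["supergroups"]:
            # Only an SG-aware server knows to visit the containing groups.
            db[outer]["members"].discard(gid)
    del db[gid]


def dangling_refs(db):
    """Return (group, missing_id) pairs whose member ID has no entry."""
    return sorted(
        (gid, member)
        for gid, entry in db.items()
        for member in entry["members"]
        if member not in db
    )


def make_db():
    """Build a database in which group -10 contains the empty group -20."""
    db = {-10: new_group(), -20: new_group()}
    db[-10]["members"].add(-20)      # -10 contains -20
    db[-20]["supergroups"].add(-10)  # back-reference in the SG-only list
    return db
```

After delete_group(db, -20, sg_aware=False), dangling_refs(db) reports that 
group -10 still lists the deleted ID -20; with sg_aware=True it reports 
nothing.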

The situation for changing the ID of the contained group is in some ways 
not as bad, and in other ways worse.  Without supergroups, changing the ID 
of a group which is contained in other groups will corrupt the database, the 
same as described above.  However, the situation can be repaired simply by 
changing the ID back.  The "worse" part is that even _with_ supergroups, 
the behavior is exactly the same -- ChangeEntry() has no handling for 
changing the ID of a group whose supergroups list is non-empty!

For that matter, there is very little handling of changing the ID of a 
group which contains other groups.  If ChangeEntry encounters this 
situation, it returns a mysterious error code.  It would be better if it 
returned a real code, and better still if it actually just did the required 
work, which is not too hard.


> You should probably have a copy of db_verify & pt_util when you make this
> change.  pt_util is probably built and installed by default.  You may have
> to build db_verify specially.

The PRDB verifier is installed in the server sbin directory as prdb_check.



> This may all seem like supergroups are complicated & hard to deploy.

Nope, not at all.  What's complicated is recovering from what happens if 
you run a mixture of incompatible database servers, which has never been a 
good idea.  We've been fortunate in OpenAFS in that the database version 
formats have not changed in quite a long time; however, that will not be 
true forever.

As you mentioned, there are already improvements on the horizon which will 
require changes to both the PRDB and VLDB.  In both cases, full backward 
compatibility is impossible, but we'd like to make it at least as good as 
the supergroups change; specifically, old servers will be able to safely 
read any entry and modify entries which don't use new features, but won't 
be able to use new features and may be unable to modify entries which use 
them.

Hopefully, this will be the last time a version change introduces 
uncertainty or "danger".  We should be able to make it so that a server can 
detect the presence of extensions it doesn't support and fail gracefully, 
rather than corrupting the database.  And, we should be able to make it 
possible to update one server at a time, while not bringing new features 
into use until all servers, or at least all voting servers, are up to date. 
In fact, I just had some thoughts along those lines, which I need to flesh 
out a bit more.
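
One possible shape for this, sketched in Python, is a set of feature bits. 
This is purely illustrative and not necessarily the design alluded to 
above: the flag names and values, check_writable(), and usable_features() 
are all hypothetical.  The assumptions are that each entry records which 
extensions it uses, that each server knows which extensions it supports, 
and that a new feature is switched on only once every voting server 
supports it.

```python
# Hypothetical sketch: none of these names or flag values exist in OpenAFS.
from functools import reduce

FEATURE_SUPERGROUPS = 0x1
FEATURE_FUTURE = 0x2  # placeholder for some later extension


class UnsupportedFeature(Exception):
    """Raised instead of modifying an entry the server does not fully understand."""


def check_writable(entry_features, server_supports):
    """Refuse to modify an entry that uses feature bits this server lacks."""
    unknown = entry_features & ~server_supports
    if unknown:
        raise UnsupportedFeature(f"entry uses unsupported feature bits {unknown:#x}")


def usable_features(voter_support):
    """Return the feature bits supported by every voting server (bitwise AND)."""
    if not voter_support:
        return 0
    return reduce(lambda a, b: a & b, voter_support)
```

With voters supporting 0x3 and 0x1, usable_features() yields 0x1, so the 
newer feature stays off until the last voter is upgraded; an old server 
asked to modify an entry carrying FEATURE_FUTURE raises UnsupportedFeature 
rather than writing.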

> Or,
> here's a less obvious incompatible db change: when do you want ubik to
> not use rxkad/des between servers?

Eh; that's not nearly as hard.  The mechanisms don't have to interoperate 
with each other; you just have to be able to configure which mechanisms are 
supported and which ones to use to talk to a ubik peer.  That's something 
we can drop into ubik config improvements pretty easily (another thing I 
should spend some of my CFT on).
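
As a rough illustration of that kind of configuration, here is a minimal 
Python sketch.  Everything in it is an assumption: "newmech" is a 
placeholder mechanism name, the host names are examples, and neither the 
tables nor mechanism_for() correspond to real ubik or OpenAFS options.

```python
# Hypothetical configuration sketch; "newmech" is a placeholder name, and
# none of these structures correspond to real ubik or OpenAFS options.
LOCAL_MECHANISMS = {"rxkad", "newmech"}  # mechanisms this server can speak

PEER_MECHANISM = {  # mechanism to use when talking to each ubik peer
    "db1.example.org": "rxkad",
    "db2.example.org": "newmech",
}


def mechanism_for(peer, local=LOCAL_MECHANISMS, per_peer=PEER_MECHANISM):
    """Return the configured mechanism for a peer, or fail loudly."""
    mech = per_peer.get(peer)
    if mech is None:
        raise LookupError(f"no mechanism configured for peer {peer}")
    if mech not in local:
        raise ValueError(f"mechanism {mech} for peer {peer} is not supported locally")
    return mech
```

The point of the sketch is that the two mechanisms never need to talk to 
each other: each peer is reached with exactly one configured mechanism, and 
a mismatch is a configuration error rather than a negotiation.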


-- Jeff