[OpenAFS-devel] prdb format extension for extended authentication names

Benjamin Kaduk kaduk@MIT.EDU
Fri, 28 Jun 2013 14:08:38 -0400 (EDT)


On Wed, 26 Jun 2013, Jeffrey Hutzelman wrote:

> On Wed, 2013-06-26 at 12:58 -0400, Benjamin Kaduk wrote:
>> On Wed, 26 Jun 2013, Simon Wilkinson wrote:
>>
>>>
>>> On 25 Jun 2013, at 00:57, Benjamin Kaduk wrote:
>>
>> DBVERSION to 1?  We would need to resolve the details of how an update
>> would occur, if we don't want to require simultaneous update of the
>> software on all dbservers, of course.
>
> Yeah, this is a real problem.  The PRDB is a distributed database so I
> can run three of them and not have to take user-visible outages.  Being
> forced to do so to take an update would not be OK.

As a cell admin, I am certainly not excited about full downtime for an 
update.  It does seem orthogonal to the actual on-disk format of the new 
database version, though.

> However, I think we can go back to what we were discussing earlier.  To
> bump to a new database version, you must first upgrade all servers to
> something new enough to support that version.  Then you take an action
> (presumably, make an RPC) that bumps the version.  The following

Yes, I think it should work.  Of course, now we have the problem of coming 
up with a name for the RPC...

> restrictions apply...
>
> 1) A server (coordinator) will refuse to upgrade the database version
> to a version it doesn't support.  So, you can't upgrade to a db version
> unless you have at least one server that supports that version.
>
> 2) A server will refuse to function if the database version is not one
> that it supports.  This is already true -- Initdb() fails with PRDBBAD
> if the database version is not recognized.  Note that the server will
> continue to _run_, which means you can have enough servers to maintain a
> quorum, even if some of them cannot provide service.

I had missed that the server would continue to run.
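
To make sure I have the two restrictions straight, here is a rough sketch
in C.  Every name in it is made up for illustration (it is not existing
ptserver code), and the error values are placeholders rather than the real
PRDBBAD and friends.  It assumes "supports a version" just means "version
number is not larger than the newest one this binary knows".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: newest database version this binary understands. */
#define SKETCH_MAX_DBVERSION 1u

/* Placeholder error codes, not the real values from the pt error table. */
enum sketch_err { SKETCH_OK = 0, SKETCH_DBBAD = 1, SKETCH_UNSUPPORTED = 2 };

struct sketch_server {
    uint32_t db_version;  /* version recorded in the database header */
    bool in_quorum;       /* server still takes part in elections */
};

/* Restriction 2: refuse database service for an unknown version.
 * Note that in_quorum is left alone, so the server keeps running. */
static enum sketch_err sketch_check_db(const struct sketch_server *s)
{
    assert(s != NULL);
    return s->db_version <= SKETCH_MAX_DBVERSION ? SKETCH_OK : SKETCH_DBBAD;
}

/* Restriction 1: the coordinator will not bump the database to a
 * version that it does not itself support. */
static enum sketch_err sketch_bump_version(struct sketch_server *s,
                                           uint32_t target)
{
    assert(s != NULL);
    if (target > SKETCH_MAX_DBVERSION)
        return SKETCH_UNSUPPORTED;
    s->db_version = target;
    return SKETCH_OK;
}
```

The point of keeping the two checks separate is that an old server seeing
a too-new version fails only the first one; nothing there touches its
quorum membership.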

>>>> I guess we could steal a couple of words in the header to indicate
>>>> (respectively) "feature supported/enabled by the running quorum" and
>>>> "feature in use in this database".
>>>
>>> The problem is that you don't know the feature set supported by the
>>> running quorum. Only the master can write to the database - so even if
>>> that updates the database with its feature set every time it is
>>> restarted, the slaves get no say. One of the challenges with Ubik is
>>> that there is currently no mechanism to do configuration negotiation
>>> during a quorum election. So there's no way to notice during that
>>> election that a slave's configuration is non-standard.
>>
>> Perhaps I am confused, but I was expecting that:
>> (1) something (perhaps administrator action) causes the master dbserver to
>> decide that it should try to enable a feature flag.
>> (2) something (maybe the administrator, maybe the master dbserver itself)
>> calls an RPC against each server in the quorum, individually, to query
>> support for that feature.
>> (3) If all servers in the quorum report success, the master dbserver makes
>> a write indicating that the feature is enabled
>> (4) subsequent slave dbservers attempting to join that do not support the
>> indicated feature notice that the flag is set and decline to join the
>> quorum.
>
> You're not confused.  That's exactly what I expected -- the feature
> flags describe features that are used in the database, not features that
> some server supports.  In other words, they work just like the feature
> flags in a filesystem, except that you have to be more careful about
> turning one on, since there are multiple servers using the same database
> at once.

I think we may be talking slightly past each other; I had two bits for 
each flag (in different words) -- one is written at step (3), and the 
second is written when that feature is actually *used* (e.g., when the 
first extended name or hash table is added, or the first supergroup entry, 
etc.).  I'm not sure whether you're thinking of the first or the second 
one, or lumping them together.  (The second one is only really needed if 
we want to have the ability to live-rollback a version upgrade if the new 
feature is not actually used, which was mentioned earlier in the thread.)
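
Concretely, the two-word scheme I have in mind is something like the
following sketch.  The struct, the flag names, and the function names are
all hypothetical; nothing like this exists in the prdb header today, and
the bit assignments are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; the assignments are arbitrary. */
#define SKETCH_FEAT_EXTNAMES    (1u << 0)
#define SKETCH_FEAT_SUPERGROUPS (1u << 1)

/* Two words "stolen" from the header, one bit per feature in each. */
struct sketch_header {
    uint32_t feat_enabled;  /* written at step (3) */
    uint32_t feat_in_use;   /* written on first actual use */
};

/* Step (3): the master records that the feature may be used. */
static void sketch_enable(struct sketch_header *h, uint32_t feat)
{
    assert(h != NULL);
    h->feat_enabled |= feat;
}

/* First real use (e.g. first extended name added).  Refused unless
 * the feature was enabled beforehand. */
static bool sketch_mark_used(struct sketch_header *h, uint32_t feat)
{
    assert(h != NULL);
    if ((h->feat_enabled & feat) != feat)
        return false;
    h->feat_in_use |= feat;
    return true;
}

/* Live rollback: only possible while the feature is enabled but no
 * data depending on it has been written. */
static bool sketch_rollback(struct sketch_header *h, uint32_t feat)
{
    assert(h != NULL);
    if ((h->feat_in_use & feat) != 0)
        return false;
    h->feat_enabled &= ~feat;
    return true;
}
```

If we decide we do not care about live rollback, the second word (and
sketch_mark_used()) goes away and this collapses to a single flag word.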

> I was expecting (1) to be done manually, per feature, and (2) to be
> handled automatically by the server doing the upgrade.  Note that what
> actually happens in (4) is what I described above -- the server _does_
> join the quorum, but declines to process any RPCs that require the
> database.

I was also expecting (2) to be handled automatically, but it occurred to me
that perhaps an administrator might want to run the check themselves and
overrule the system, so I added the other option.  It might be a bad idea
if it would leave a slave running which doesn't know about the new format;
I'm not sure I traced through Initdb() and {Write,Read}Preamble properly
to understand the "keeps running for quorum purposes" behavior.  It
*looks* like Initdb() is called before any write or read and would prevent
that slave from giving useful responses, but I don't know what the
failover behavior of clients would be in that case.
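
For what it's worth, the check in (2) plus the "administrator overrules"
option could be as small as the sketch below.  The reply structure and
function names are invented for illustration; there is no such RPC yet.
Treating an unreachable server as a blocker is my assumption, not
something we have agreed on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical answer from one dbserver to a "what do you support?"
 * query made in step (2). */
struct sketch_reply {
    bool reachable;      /* did the server answer at all */
    uint32_t supported;  /* bitmask of features it understands */
};

/* Count the servers that stand in the way of enabling feat: those
 * that did not answer (assumption) or that lack the feature. */
static size_t sketch_count_blockers(const struct sketch_reply *r, size_t n,
                                    uint32_t feat)
{
    size_t blockers = 0;

    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!r[i].reachable || (r[i].supported & feat) != feat)
            blockers++;
    }
    return blockers;
}

/* Gate for step (3): the master writes the flag only when nobody
 * blocks, unless the administrator explicitly overrules the check. */
static bool sketch_may_enable(const struct sketch_reply *r, size_t n,
                              uint32_t feat, bool admin_override)
{
    return admin_override || sketch_count_blockers(r, n, feat) == 0;
}
```

The override path is exactly the case I am worried about above, since it
can leave a slave in the quorum that does not understand the format.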

>> Some people in this thread were advocating that, going forward, running
>> dbserver software always know about supergroups (as opposed to being able
>> to compile away support for them).  This is a behavior change, and I am
>> not thinking about what its details should be or how to implement it,
>> right now.
>
> That was Simon, and he's right.  We've effectively already revved the
> database version without actually bumping the version number.  You can
> decline to support supergroup semantics, but the database fields have
> been used, and they record relationships.  So, it's not safe for a
> database with supergroups data to be modified by a server that doesn't
> have that data.  The solution is that, going forward, we should remove
> the option to build a dbserver that does not understand this format.

And then provide a flag to indicate whether it is allowed to be used?
We would need to be clever to avoid breaking people who currently have it 
enabled while not force-enabling it on all sites.  (Part of why I didn't 
want to think about it yet.)

-Ben