[OpenAFS-devel] ubik administration RPCs

Andrew Deason adeason@sinenomine.net
Thu, 14 Nov 2013 14:08:18 -0600


I just submitted a gerrit change to introduce a new ubik RPC:
<http://gerrit.openafs.org/10461>. The DISK_ service is possibly not the
best place to introduce such an RPC, but it doesn't look like there's an
existing place for it that makes much sense to me. So I thought I'd open
up some discussion about this kind of thing on the list. My
understanding is that this is not a conversation appropriate for
afs3-stds, since this deals with RPCs and such internal to OpenAFS, and
we don't guarantee interop since we're not dealing with clients.

Some background: the motivation for this and the current proposed
ubik_cp tool functionality is the ability to 'read' and 'write' a full
ubik database without needing to stop/shutdown dbservers or worry about
consistency, etc. So we can 'read' a database while the server is
online, and we can perform some analysis on it or transmit it to another
cell or whatever. And, with something like gerrit 10461, we can
'restore' a database if we want to copy a database into the running
machine.

Gerrit 9700 achieved the 'read' functionality by sort-of hijacking an
existing ubik RPC that's used for sending the database data in bulk to
sites that don't have the database. That may not be a great way to do it
long-term, but it does work for existing sites with unmodified
dbservers, which is why it was useful.

In the general case, there is no way to do the same thing for a
'write' operation, which is why gerrit 10461 exists. However, the new
DISK_RestoreDB RPC is quite different from all of the other DISK_ RPCs
that exist right now. The other DISK_ procs are intended to be used in
an automated way between dbservers to handle transactions and database
modifications inside a transaction, etc. DISK_RestoreDB (as it exists in
gerrit) instead operates at a higher level, creating a transaction
itself, and it's not meant for the other dbserver sites to be running it
as part of normal operation. Instead, it's to be run by an administrator
or by some automated process outside of ubik.
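
To make the difference in level concrete, here is a rough toy model in
Python. It is not ubik code and does not reflect the patch in gerrit;
ToyDB and all of its method names are made up for illustration. The
begin/write/commit/abort methods stand in for the kind of low-level
steps a coordinator drives one call at a time, while restore() takes a
whole database image and opens and finishes its own transaction.

```python
class ToyDB:
    """In-memory stand-in for a ubik-style database (hypothetical)."""

    def __init__(self, data=b""):
        self.data = bytearray(data)
        self.version = 1
        self._txn = None  # working copy while a write transaction is open

    # Low-level steps: the caller is responsible for sequencing these.
    def begin(self):
        if self._txn is not None:
            raise RuntimeError("write transaction already open")
        self._txn = bytearray(self.data)

    def write(self, offset, chunk):
        if self._txn is None:
            raise RuntimeError("no open transaction")
        end = offset + len(chunk)
        if end > len(self._txn):
            # Grow the working copy with zero bytes up to the write's end.
            self._txn.extend(b"\0" * (end - len(self._txn)))
        self._txn[offset:end] = chunk

    def commit(self):
        if self._txn is None:
            raise RuntimeError("no open transaction")
        self.data, self._txn = self._txn, None
        self.version += 1

    def abort(self):
        # Discard the working copy; self.data is left untouched.
        self._txn = None

    # High-level operation: the caller only supplies the new image.
    def restore(self, image):
        self.begin()
        try:
            del self._txn[:]       # drop the old contents entirely
            self.write(0, image)
            self.commit()
        except Exception:
            self.abort()
            raise
```

The point of the sketch is only that a caller of restore() never sees a
transaction, whereas a caller of the low-level methods has to manage
one itself.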

So, to me this sounds like a new service; it's not similar to anything
in VOTE_ or DISK_ (except maybe the debug RPCs), but it needs to run in
the ubik layer, not e.g. the vlserver or ptserver layer, since it
applies to anything using a ubik db. The RPCs in this service would
allow you to consistently read the db, restore the db, maybe enter and
exit a kind of 'read-only' mode... stuff like that.
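
As a very rough illustration of what such a service might offer, here
is another toy model in Python. Everything in it is an assumption:
ToyAdminService, read_db, restore_db and set_readonly are hypothetical
names, not proposed RPC names, and whether a restore should be allowed
while in 'read-only' mode is an open question (the sketch allows it).

```python
import threading


class ToyAdminService:
    """Hypothetical administrative interface over an in-memory db."""

    def __init__(self, data=b""):
        self._lock = threading.Lock()
        self._data = bytes(data)
        self._version = 1
        self._readonly = False

    def read_db(self):
        # Version and contents are taken under one lock, so they
        # always describe the same state of the database.
        with self._lock:
            return self._version, self._data

    def restore_db(self, image):
        # Replace the whole database in one step and bump the version.
        # Assumption: allowed even in read-only mode.
        with self._lock:
            self._data = bytes(image)
            self._version += 1
            return self._version

    def set_readonly(self, enabled):
        with self._lock:
            self._readonly = bool(enabled)

    def write(self, offset, chunk):
        # Stand-in for an ordinary application write (e.g. from a
        # vlserver or ptserver); refused while in read-only mode.
        with self._lock:
            if self._readonly:
                raise PermissionError("database is in read-only mode")
            buf = bytearray(self._data)
            end = offset + len(chunk)
            if end > len(buf):
                buf.extend(b"\0" * (end - len(buf)))
            buf[offset:end] = chunk
            self._data = bytes(buf)
            self._version += 1
            return self._version
```

An administrator-facing tool would then only need read_db() to take a
consistent copy and restore_db() to put one back.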

If such a new service existed, we could then maybe have a new command
suite called 'ubik' (or ubikctl, ubikcmd, etc) that generally calls the
RPCs in that service. I would then use that command for the
functionality I'm currently including in ubik_cp, and drop ubik_cp.

Any thoughts?

-- 
Andrew Deason
adeason@sinenomine.net