[OpenAFS-devel] rxkad.keytab rotation

Tue, 22 Oct 2013 12:41:13 -0500

With the rxkad-k5 hubbub, some sites are starting to pay a little more
attention to OpenAFS security. I've been asked about rotating cell keys
regularly, and it appears that this isn't possible to do robustly
(without some hacky workarounds). In order to provide a way to do this,
we need to introduce some kind of new config file or directive, so I am
soliciting opinions on how to handle this going forward.

The problem is this. Say someone wants to change the cell key and
distribute it amongst the cell in rxkad.keytab. So we do something like
this:

 1. We have an existing rxkad.keytab with keying material for kvno 2.
 Generate a new rxkad.keytab with keying material for kvnos 2 and 3.
 (This is without contacting the KDC; the Kerberos tools for doing this
 also aren't great, but this is possible, so for some that's "good
 enough".)

 2. Distribute the new rxkad.keytab to each fileserver and dbserver in
 the cell. Touch CellServDB, so they will re-read the keys and accept
 connections for kvnos 2 or 3.

 3. Change the key in the KDC, so the kvno is bumped to 3, and the
 keying material matches what we have in kvno 3.

 4. At some point, restart the servers so they use kvno 3 for outgoing
 connections.

 5. After waiting for kvno 2 tokens to expire, remove kvno 2 from
 rxkad.keytab, distribute it, and touch CellServDB.

So, looks like that would work, right? And for most situations, that
would probably work without issue. However, during step 2, if any of the
servers are restarted for any reason after the new rxkad.keytab is
deployed (deliberately restarted, segfault, machine loses power, etc),
they will come back up using kvno 3 for outgoing connections. But
rxkad.keytab may not have been deployed everywhere yet, so a restarted
server may use kvno 3 for an outgoing connection to a server that
doesn't have the key for kvno 3 yet, and the connection fails.
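
To make the failure concrete, here is a toy model in Python. This is
only a sketch, not OpenAFS code: a keytab is just a dict of kvno to key
bytes, and pick_outgoing_kvno and can_connect are names I made up for
illustration. It assumes the "most recent" key is simply the highest
kvno in the keytab.

```python
# Toy model: a keytab is a dict mapping kvno -> key bytes.
# All names here are hypothetical; this is not OpenAFS code.

def pick_outgoing_kvno(keytab):
    """Model of the current behavior: use the highest kvno we have."""
    return max(keytab)

def can_connect(sender_keytab, receiver_keytab):
    """The receiver must hold the same key under the kvno the sender picked."""
    kvno = pick_outgoing_kvno(sender_keytab)
    return receiver_keytab.get(kvno) == sender_keytab[kvno]

old = {2: b"key-two"}                   # keytab before step 2
new = {2: b"key-two", 3: b"key-three"}  # keytab being distributed in step 2

# A server with the new keytab restarts and talks to one still on the old one.
print(can_connect(new, old))  # False: the receiver has no kvno 3
# The other direction, and the fully deployed case, are fine.
print(can_connect(old, new))  # True
print(can_connect(new, new))  # True
```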

The problem here is arguably that the administrator has no way of
specifying which key to use for outgoing connections (we always try to
use the "most recent" one, though iirc that request is not always
honored by the krb5 library, e.g. some Heimdal versions). Or, the
problem is arguably that we use the same keytab for incoming and
outgoing connections.

One way to get around this is the "hacky workaround" I mentioned above.
The way I was thinking of is that if you temporarily copy kvno 2 to,
say, kvno 50, the server will use kvno 50 for outgoing connections even
if we restart in the middle of deploying rxkad.keytab. This is a way of
sort-of saying "use this key for outgoing connections", but this
arguably is an abuse of the kvno field, and of course lying about the
key's kvno can get confusing really quickly.

So anyway, there are a few ways to solve this more robustly. I'll give them
identifiers as I sometimes do so they can be referred to unambiguously.

 A) One way is that we have some configuration file or configuration
 directive somewhere that just says which key to use for outgoing
 connections. If the file/directive does not exist, we fall back to the
 current "best key" behavior. This could be a directive in the master
 branch config file; of course, that limits this to versions that have
 the config file code, but that may be desirable. Or of course we can
 just have another weirdly-named file in /usr/afs/etc, like KeyVno or
 something. This should probably be more than just a kvno, though, since
 we have at least 3 different sources of keys (KeyFile, rxkad.keytab,
 and KeyFileExt).

 B) Another way to solve this is to have separate files for incoming and
 outgoing connections. That is, KeyFile/rxkad.keytab/KeyFileExt are only
 used for decoding incoming connections, and we have an optional
 separate file (say, "KeyFileOutgoing"). Maybe for that we could use the
 KeyFileExt format, and we just choose the highest-numbered key.

 C) Or, we could embed this information in our KeyFile-ish file. That
 is, modify KeyFileExt to either have a field saying "use kvno X", or
 have a per-key field that says "use this key". I don't think the
 current KeyFileExt file format has any way of including information
 like this, so we'd need to change the format (and the filename). The
 administrator would need to use some openafs commands to view or alter
 this information.
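
Whichever of these stores the setting, the selection logic the server
would need looks roughly the same. Here is a rough sketch in Python,
not OpenAFS code. The pin is a (source, kvno) pair, since a bare kvno
is ambiguous across key sources. The function name, the dict layout,
and treating "best key" as "highest kvno across all sources" are all
my assumptions for illustration.

```python
# Hypothetical sketch of "use the pinned key if configured, else best key".
# sources maps a key source name to {kvno: key bytes}.

def pick_outgoing_key(sources, pin=None):
    """Return (source, kvno, key) to use for outgoing connections.

    pin is an optional (source, kvno) pair set by the administrator.
    """
    if pin is not None:
        source, kvno = pin
        try:
            return source, kvno, sources[source][kvno]
        except KeyError:
            # Fail loudly instead of silently using some other key.
            raise LookupError("pinned key %s/%d not found" % (source, kvno))
    # No pin: simplified stand-in for the current "best key" behavior,
    # assumed here to be the highest kvno across all sources.
    candidates = [(kvno, source)
                  for source, keys in sources.items()
                  for kvno in keys]
    kvno, source = max(candidates)
    return source, kvno, sources[source][kvno]

sources = {"rxkad.keytab": {2: b"key-two", 3: b"key-three"}}
print(pick_outgoing_key(sources))                       # falls back to kvno 3
print(pick_outgoing_key(sources, ("rxkad.keytab", 2)))  # pinned to kvno 2
```

Raising an error when the pinned key is missing is a choice I made for
the sketch; falling back to "best key" in that case would bring back
the problem we are trying to avoid.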

With any of these, the idea is that the administrator can control what
key is used for outgoing connections in the server. So, with the above 5
step process, we force the server to use the old kvno 2 key until we are
sure the new kvno 3 key is distributed everywhere, and only after that
point do we allow the servers to use the kvno 3 key for outgoing
connections. Even if a server is restarted in the middle of that
procedure, it should be impossible to use the wrong key.
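
As a sanity check on that claim, here is a small Python simulation.
Again just a sketch under assumptions: three servers, keytabs modeled
as dicts, every server assumed to have restarted (the worst case), and
"pinned" standing in for whichever of A, B, or C supplies the outgoing
kvno. It walks through every mix of old and new keytabs and counts the
states in which some server pair cannot talk.

```python
from itertools import product

# Toy model; all names are hypothetical, not OpenAFS code.
OLD = {2: b"key-two"}
NEW = {2: b"key-two", 3: b"key-three"}

def outgoing_kvno(keytab, pinned=None):
    """Highest kvno unless the administrator pinned one."""
    return max(keytab) if pinned is None else pinned

def all_pairs_ok(keytabs, pinned=None):
    """True if every server can reach every server in this state."""
    for sender in keytabs:
        kvno = outgoing_kvno(sender, pinned)
        for receiver in keytabs:
            if receiver.get(kvno) != sender[kvno]:
                return False
    return True

# Every possible partial deployment across three servers (8 states).
states = list(product([OLD, NEW], repeat=3))
unpinned_failures = sum(not all_pairs_ok(s) for s in states)
pinned_failures = sum(not all_pairs_ok(s, pinned=2) for s in states)
print(unpinned_failures, pinned_failures)  # 6 0
```

Without a pin, all six mixed states can fail; with outgoing connections
pinned to kvno 2, none do, since every server has kvno 2 throughout.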

While writing this email, I've been thinking that B may be easiest, at
least code-wise, since for creating localauth-y client connections we
already need a code path for "try file X, if that doesn't exist, try
Y". So, using a file like KeyFileOutgoing just means
that we add another possible file to try in front of all of the others.
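
Roughly, the lookup for B could look like this Python sketch. It is
not the actual OpenAFS code path: KeyFileOutgoing is only the name
floated above, the order of the existing files in the list is my
assumption, and find_outgoing_key_file is a made-up helper.

```python
import os
import tempfile

# KeyFileOutgoing is the proposed (hypothetical) file; the order of the
# remaining names is an assumption, not taken from the OpenAFS source.
CANDIDATES = ["KeyFileOutgoing", "rxkad.keytab", "KeyFileExt", "KeyFile"]

def find_outgoing_key_file(confdir, candidates=CANDIDATES):
    """Return the first candidate file that exists in confdir, else None."""
    for name in candidates:
        path = os.path.join(confdir, name)
        if os.path.exists(path):
            return path
    return None

# Demo in a scratch directory standing in for /usr/afs/etc.
with tempfile.TemporaryDirectory() as confdir:
    open(os.path.join(confdir, "rxkad.keytab"), "wb").close()
    before = os.path.basename(find_outgoing_key_file(confdir))
    open(os.path.join(confdir, "KeyFileOutgoing"), "wb").close()
    after = os.path.basename(find_outgoing_key_file(confdir))

print(before)  # rxkad.keytab
print(after)   # KeyFileOutgoing
```

If KeyFileOutgoing does not exist, nothing changes for existing cells,
which is the fallback behavior described for the other options too.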

-- 
Andrew Deason
adeason@sinenomine.net