[OpenAFS-devel] Re: rxkad.keytab rotation

Fri, 8 Nov 2013 11:32:42 -0500 (EST)

On Wed, 23 Oct 2013, Andrew Deason wrote:

> On Wed, 23 Oct 2013 10:12:44 -0400
> Chaskiel Grundman <cg2v@andrew.cmu.edu> wrote:
>
>> This seems rather more heavyweight (at least from the perspective of
>> implementing it in openafs), since it would involve inserting some
>> sort of loop around client rpcs in every server and any client that
>> uses localauth,
>
> Yeah, it also struck me as way more complex from a coding perspective
> than any of the other options, though from a user's perspective it's
> clearly much simpler.
>
> I'm not sure if we need to check the results and possibly reinit for
> every single RPC, though. I _think_ we can issue some low-impact "probe"
> RPC during startup to verify the connection is okay (I think all of the
> relevant services have an existing RPC that is suitable), and negotiate
> the key to use using that. That is a lot easier, though obviously it
> doesn't handle any such errors if they are encountered after startup.

I may be a little confused, but we're only talking about the places in the 
codebase which are currently calling afsconf_ClientAuth(), right? 
(Ignoring 1.4.)  My understanding was that once credentials are printed 
with those routines, they are used for all (*) subsequent outgoing 
connections until the process is restarted.  As such, any current 
key-rotation procedure is going to require restarting all server processes 
along the way, in order to completely remove the old key.

This would seem to indicate that using a "probe" RPC during startup (right 
after getting the printed credentials) would be consistent with the 
current state of affairs, and that checking every RPC would not be 
necessary.

Now, we would need to use more than one RPC, in order to check all the 
(db)servers we might end up wanting to talk to, but that's still probably 
not too bad, given that we have a cap on the number of dbservers.
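
To make the "more than one RPC" part concrete, here is a rough, 
self-contained sketch of probing each dbserver once with a given key.  
Nothing in it is real OpenAFS code: MAX_DBSERVERS, server_probe_fn, 
count_rejections() and the fake_cell simulation are names I made up for 
illustration, and the cap value is an assumption.

```c
#include <stddef.h>

/* Assumed cap for this sketch only; not taken from the source tree. */
#define MAX_DBSERVERS 20

/*
 * Hypothetical probe callback: issue one low-impact RPC to a single
 * dbserver using the key with the given kvno.  Returns 0 if the server
 * accepted the key, nonzero if it rejected it.
 */
typedef int (*server_probe_fn)(unsigned int server_addr, int kvno, void *rock);

/*
 * Probe every server once.  Returns how many servers rejected the key,
 * or -1 if the server list is longer than the cap.
 */
int
count_rejections(const unsigned int *servers, size_t nservers, int kvno,
                 server_probe_fn probe, void *rock)
{
    size_t i;
    int rejected = 0;

    if (nservers > MAX_DBSERVERS)
        return -1;

    for (i = 0; i < nservers; i++) {
        if (probe(servers[i], kvno, rock) != 0)
            rejected++;
    }
    return rejected;
}

/* Simulated cell: one lagging server does not know the newest kvno yet. */
struct fake_cell {
    unsigned int lagging_addr;
    int new_kvno;
};

int
fake_server_probe(unsigned int server_addr, int kvno, void *rock)
{
    const struct fake_cell *cell = rock;

    if (server_addr == cell->lagging_addr && kvno == cell->new_kvno)
        return 1;   /* rejected: this server has not picked up the new key */
    return 0;       /* accepted */
}
```

The cost is bounded by the cap, so even a startup path that probes 
every dbserver stays cheap.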

Looking at the viced, for example, vl_Initialize() calls ClientAuth and 
shortly thereafter loops over the vlservers and calls rx_NewConnection on 
them to pass to ubik_ClientInit.  We could probably throw a probe RPC in 
there and fall back to the previous key if we get the "bad key" error.
This is a layer where we can conveniently log, so we should be sure to do 
so if we fall back to an old key.
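
Roughly, the fallback logic I have in mind looks like the sketch below.  
It is not actual viced code: the result codes, probe_fn, 
choose_outgoing_kvno() and fake_probe() are all hypothetical stand-ins, 
and the real thing would wrap whatever probe RPC the service already 
has.  Keys are assumed to be listed newest first, with non-negative 
kvnos.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical result codes for this sketch; not real rx/rxkad values. */
enum probe_result { PROBE_OK = 0, PROBE_BAD_KEY = 1, PROBE_OTHER = 2 };

/* Hypothetical callback: try one low-impact RPC with the given kvno. */
typedef enum probe_result (*probe_fn)(int kvno, void *rock);

/*
 * Try each key in order (newest first) and return the kvno of the first
 * one the remote side accepts, or -1 if none can be used.  Only a
 * "bad key" result triggers fallback; any other failure stops the
 * search, since an older key would not help.
 */
int
choose_outgoing_kvno(const int *kvnos, size_t nkeys, probe_fn probe,
                     void *rock, FILE *log)
{
    size_t i;

    for (i = 0; i < nkeys; i++) {
        enum probe_result res = probe(kvnos[i], rock);

        if (res == PROBE_OK) {
            if (i > 0)
                fprintf(log, "fell back to old key, kvno %d\n", kvnos[i]);
            else
                fprintf(log, "using newest key, kvno %d\n", kvnos[i]);
            return kvnos[i];
        }
        if (res != PROBE_BAD_KEY) {
            fprintf(log, "probe with kvno %d failed for another reason\n",
                    kvnos[i]);
            return -1;
        }
    }
    fprintf(log, "no key was accepted by the remote side\n");
    return -1;
}

/* Simulated remote side: accepts exactly one kvno, passed in via rock. */
enum probe_result
fake_probe(int kvno, void *rock)
{
    int accepted = *(int *)rock;

    return (kvno == accepted) ? PROBE_OK : PROBE_BAD_KEY;
}
```

The log line on the fallback path is the important part, since that is 
what tells the administrator which processes still depend on the old 
key.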

> I think that allows for a "robust" rotation procedure. It gets a little
> more difficult when you want to revoke the "old" key, since it's
> difficult to tell what key we're using for outgoing connections. If we
> have the 'new' and 'old' keys in our rxkad.keytab, but we fell back to
> the 'old' key for outgoing conns for any reason, obviously we'll fail
> once the 'old' key is revoked on the remote side, and we need to
> reconnect for any reason.

I'm fine with logging when we use an old key (or even just logging what 
key we are using) and making the administrator restart server processes 
using the old key, before removing the old key.

> That's either solved by checking every RPC return code as you mentioned,
> or we could just log which key we're using for outgoing connections, and
> have "proper procedure" say to have the administrator sanity check that
> the log says the expected kvno.
>
> A kind of "middle ground" may be to check all RPC return codes in
> servers (where we already have a bit better error handling than just
> "print out an error message and exit"), but for command-line utilities
> with -localauth, we could just use some "probe" RPC during
> initialization.

I guess I still don't see why checking all RPC return codes is necessary 
-- shouldn't probe RPCs at startup be sufficient?

I think I will have some time to help implement such a solution, if 
desired.

-Ben

(*) I seem to recall a couple of places where various ubik recovery 
scenarios could lead to refreshing credentials, but I think these are 
rare.