[OpenAFS-devel] Re: rxkad.keytab rotation

Jeffrey Hutzelman jhutz@cmu.edu
Mon, 11 Nov 2013 17:35:19 -0500


On Fri, 2013-11-08 at 11:32 -0500, Benjamin Kaduk wrote:

> Looking at the viced, for example, vl_Initialize() calls ClientAuth and 
> shortly thereafter loops over the vlservers and calls rx_NewConnection on 
> them to pass to ubik_ClientInit.  We could probably throw a probe RPC in 
> there and fall back to the previous key if we get the "bad key" error.
> This is a layer where we can conveniently log, so we should be sure to do 
> so if we fall back to an old key.

... and suddenly restarting a fileserver takes several minutes instead
of a couple of seconds, if a dbserver happens to be down.  That's the
difference between an essentially invisible outage and a visible one.
Multiply this by the number of servers, since you have to probe all of
them in order to know whether you can use the new key (or at least,
which key to use for which connections).  This isn't a price/risk that
comes into play only when upgrading, either -- it happens any time you
restart a fileserver.  Of course, it's going to be worse at the times
when the fallback capability is most important, such as when I've been
able to upgrade all of my servers except the one that's broken waiting
on a part that won't be in until next week.
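
To put rough numbers on that cost, here is a minimal sketch of the arithmetic, assuming the probes are issued one after another at startup. It is a model, not OpenAFS code: the function name, the 50 ms round trip, and the 60 s dead-server timeout are all made-up values for illustration, and real Rx timeouts may differ.

```python
def startup_probe_delay_ms(servers_up, probe_rtt_ms, dead_timeout_ms, keys_to_try=2):
    """Estimate milliseconds spent probing dbservers sequentially at startup.

    servers_up      -- one boolean per dbserver (True = reachable)
    probe_rtt_ms    -- hypothetical round trip for a probe a live server answers
    dead_timeout_ms -- hypothetical wait before giving up on a dead server
    keys_to_try     -- a live server may reject the new key and need a second
                       probe with the old one, so the worst case is one probe
                       per key
    """
    total = 0
    for up in servers_up:
        if up:
            # Worst case: every candidate key is probed before one is accepted.
            total += probe_rtt_ms * keys_to_try
        else:
            # A dead server never answers, so the full timeout is paid.
            total += dead_timeout_ms
    return total


# Three dbservers, one of them down (illustrative values only).
delay = startup_probe_delay_ms([True, True, False], probe_rtt_ms=50, dead_timeout_ms=60000)
```

With these assumed values the live servers contribute 200 ms and the single dead one contributes 60000 ms, so the timeout dominates the restart time, and it is paid on every restart for as long as that server stays down.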

Now, what about volservers?  A volserver has to be able to do RPCs to
any other volserver.  It doesn't even know what servers _exist_ when it
starts up, so it has to do them as it discovers them.
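
As a rough illustration of what probing "as it discovers them" would involve, the sketch below keeps a per-peer record of which key worked and probes lazily on first contact. Everything here is hypothetical: `PeerKeyCache`, `BadKeyError`, and the `probe` callback are stand-ins invented for the example, not existing OpenAFS interfaces.

```python
class BadKeyError(Exception):
    """Stand-in for the "bad key" error a peer would return (hypothetical)."""


class PeerKeyCache:
    """Pick a key per peer the first time that peer is contacted."""

    def __init__(self, keys, probe):
        # keys: candidate keys, newest first.
        # probe(peer, key): hypothetical probe RPC; raises BadKeyError
        # when the peer rejects the key.
        self.keys = list(keys)
        self.probe = probe
        self.chosen = {}      # peer -> key that was accepted
        self.fallbacks = []   # (peer, key) pairs worth logging

    def key_for(self, peer):
        if peer in self.chosen:
            return self.chosen[peer]
        for index, key in enumerate(self.keys):
            try:
                self.probe(peer, key)
            except BadKeyError:
                continue
            if index > 0:
                # An older key was needed; record it so it can be logged.
                self.fallbacks.append((peer, key))
            self.chosen[peer] = key
            return key
        raise BadKeyError(f"no usable key for {peer}")
```

Even in this simplified form, the first RPC to each newly discovered peer pays the probe cost, and the server has to carry per-peer key state for as long as it runs.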


I also really dislike the notion that rekeying is such an exceptional
situation that it requires an administrator to manually keep track of
things and restart servers in a particular order relative to when the
key has been changed and to which servers it has been distributed.  That
makes it hard to deploy a mechanism that rekeys automatically.


If we're going to solve this problem at all, let's figure out how to
solve it right, please.


-- Jeff