[OpenAFS] Re: afs/cell transition procedure

Andrew Deason adeason@sinenomine.net
Fri, 6 Sep 2013 10:54:15 -0500


On Fri, 6 Sep 2013 10:41:50 -0400
Kendrick Hernandez <kendrick.hernandez@umbc.edu> wrote:

> and I was able to generate a new disabled afs/cell principal with
> strong encryption, extract it to the rxkad.keytab file and distribute
> it to our file servers, and do the restarts.

Can you provide the exact commands you used to generate the disabled
principal? What KDC are you using? (just for thoroughness)

You didn't modify or remove the old KeyFile during any of this, correct?

> After this I've noticed the following message repeated in the FileLog
> for our servers:
> 
> VL_RegisterAddrs rpc failed; will retry periodically (code=19270407, err=0)

To be clear, this doesn't involve the KDC at all. That code is
RXKADBADTICKET ("security object was passed a bad ticket"), which
usually indicates a mismatch of the keying data between that fileserver
and the dbservers. I'm not saying you somehow distributed different keys
or something (there's probably something else happening), but that's
what the servers think happened.
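
For reference, the code in that log line sits in the rxkad error range,
which starts at 19270400. Here is a rough Python sketch of a lookup; it
lists only a handful of entries from the rxkad error table, and the
describe_rxkad name is just something I made up for illustration. On a
machine with OpenAFS installed, translate_et should do this properly.

```python
# Partial lookup of rxkad error codes; only a few entries are listed.
RXKAD_BASE = 19270400

RXKAD_ERRORS = {
    5: ("RXKADNOAUTH", "caller not authorized"),
    7: ("RXKADBADTICKET", "security object was passed a bad ticket"),
    8: ("RXKADUNKNOWNKEY", "ticket contained unknown key version number"),
    9: ("RXKADEXPIRED", "authentication expired"),
}


def describe_rxkad(code):
    """Return (name, message) for a listed rxkad code, or None."""
    return RXKAD_ERRORS.get(code - RXKAD_BASE)


print(describe_rxkad(19270407))
```

Note that this is the same message 'vos release' prints further down.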

> When I went to enable the new afs/cell principal and disable the old
> one, I was able to log in to a server and get an afs/cell service
> ticket, tokens, and access my afs volume. I could also do the same for
> my afs "admin" principal, but when I went to perform a "vos release"
> operation, I got an error about

I'm not clear about what is happening at this point. Does the above
VL_RegisterAddrs message keep on appearing?

Does "access my afs volume" mean that you were able to do things that
required authenticated access? (e.g. writing to your volume, or in
general accessing files or directories that are not readable to
system:anyuser)

> Could not lock the VLDB entry for the volume XXXXXXXX.
> rxk: security object was passed a bad ticket
> Error in vos release command.
> rxk: security object was passed a bad ticket

This again suggests that the keying material on the dbservers
(specifically the vlserver) is different from the other servers.

In this situation it would be useful to check whether an authenticated
'vos status' command (or similar fileserver-only 'vos' command) works
against a fileserver. If that works, but authenticated connections to
the vldb do not, then something's wrong with the vlserver keying
material.

It may be helpful to list the contents of the rxkad.keytab on each
server (with MIT ktutil 'list -e'), as well as the contents of the
KeyFile (via 'asetkey list' or 'bos listkeys'). You should be able to
see a mismatch pretty obviously yourself, but if you want, post the
information to the list, with any actual key data removed. Do NOT share
the actual keys; remember that 'asetkey list' does show the actual keys,
so you must scrub the output before sharing it.
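
If it helps with the scrubbing, something like the following Python
sketch could be a starting point. It assumes the key data shows up as a
run of 16 or more hex digits, which may not match what your tools
actually print; the sample line is invented, not real output. Read over
the result yourself before posting it anywhere.

```python
import re

# Assumption: key material appears as a run of 16 or more hex digits.
# Other formats (e.g. octal escapes) would need their own pattern.
HEX_KEY = re.compile(r"\b[0-9a-fA-F]{16,}\b")


def scrub(text):
    """Replace anything that looks like hex key data with a marker."""
    return HEX_KEY.sub("[REDACTED]", text)


# Hypothetical sample line; not real output and not a real key.
sample = "kvno    3: key is: 0123456789abcdef"
print(scrub(sample))
```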

Specifically I'm just curious about the kvnos in play, and maybe the
enctypes (for the keys in rxkad.keytab).
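
Once you have the kvno and enctype of each key written down per server,
the comparison itself is mechanical. A small Python sketch, assuming you
have already collected the pairs by hand (the host names and values
below are made up, and find_mismatches is a hypothetical helper):

```python
def find_mismatches(keys_by_server):
    """Map each server to the (kvno, enctype) pairs it is missing,
    relative to the union of pairs seen on all servers."""
    all_keys = set().union(*keys_by_server.values())
    return {
        server: sorted(all_keys - keys)
        for server, keys in keys_by_server.items()
        if all_keys - keys
    }


# Hypothetical inventory: one fileserver has the new key, one dbserver
# does not.
inventory = {
    "fs1": {(4, "aes256-cts-hmac-sha1-96")},
    "db1": set(),
}
print(find_mismatches(inventory))
```

Any server that shows up in the result is missing a key that some other
server has, which is the kind of mismatch I'd look at first.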

> This leads me to believe that our servers are still using the old
> principal.

It suggests to me that your dbserver processes specifically may not be
using the new rxkad.keytab for accepting connections. If you can
authenticate to the fileserver with strong crypto, but not to the vldb,
then that would be explained by the dbservers not having new keys.

> Do I need to restart the afs fileserver processes after enabling the
> new afs/cell principal?

No; you just need to restart after deploying the rxkad.keytab file. And
even if you don't restart, things don't break (as long as you have the
KeyFile around); it just means you're still using the DES long-term
keys, so you still have a security problem. It may be helpful to explain
a little bit about how the server keys are used/updated:

You don't need to restart a server process for it to accept incoming
connections that use the new keying material. That is, if a client, or
another server, tries to contact us with new keys, we don't need to be
restarted. We just need the new keys in rxkad.keytab, and for
CellServDB to be touched (i.e. its modification time updated), which is
what prompts the server to reread its configuration.

What we need to restart for is to use the new keys to create outgoing
connections. This is what the servers use to communicate with each
other. 'Why' is beyond the scope of this paragraph, but fileservers do
talk to other fileservers, and fileservers talk to dbservers, and
dbservers talk to each other. Whenever they do that, they need some keys
to make an authenticated connection. If you just update rxkad.keytab,
they will not recreate connections with the new keys; they only load the
keys for creating connections once at startup. There are some exceptions
to that, but for the purposes of this migration, you can treat that as
true.
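
To make the incoming/outgoing distinction concrete, here is a toy model
in Python. It is only an illustration of the behavior described above,
not how the OpenAFS code is structured: the ToyServer class and the kvno
values are invented, a plain set stands in for the key files on disk,
and the CellServDB touch is left out entirely.

```python
class ToyServer:
    """Toy model only; not how OpenAFS is actually structured."""

    def __init__(self, keytab):
        # Shared, mutable set standing in for the keys on disk.
        self.keytab = keytab
        # The key for outgoing connections is picked once, at startup.
        self.outgoing_kvno = max(keytab)

    def accepts(self, kvno):
        # Incoming: consult the current key set on every check.
        return kvno in self.keytab

    def connect_kvno(self):
        # Outgoing: keep using whatever was loaded at startup.
        return self.outgoing_kvno


keytab = {3}                  # hypothetical old kvno
server = ToyServer(keytab)
keytab.add(4)                 # deploy a new key, no restart
print(server.accepts(4), server.connect_kvno())   # True 3

server = ToyServer(keytab)    # "restart" the server
print(server.connect_kvno())                      # 4
```

Before the "restart" the server accepts the new key but still makes
outgoing connections with the old one; afterwards it uses the new one.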

Does that help? It doesn't explain what your issue is, but I hope it at
least clarifies what is supposed to occur.

-- 
Andrew Deason
adeason@sinenomine.net