[OpenAFS] Re: Fwd: Re: afs/cell transition procedure

Andrew Deason adeason@sinenomine.net
Mon, 9 Sep 2013 09:37:27 -0500


On Mon, 9 Sep 2013 07:10:05 -0400
Kendrick Hernandez <kendrick.hernandez@umbc.edu> wrote:

> > It suggests to me that your dbserver processes specifically may not be
> > using the new rxkad.keytab for accepting connections. If you can
> > authenticate to the fileserver with strong crypto, but not to the vldb,
> > then that would be explained by the dbservers not having new keys.
>
> Ah, okay. I've also noticed that one of our db servers does not appear
> to be synchronizing with the other two. Going back to your previous
> suggestion of attempting "vos status", I re-enabled the new afs/cell
> principal and was able to 'vos status' several of our fileservers. I
> then tried some 'vos listvldb' operations which failed with the "rxk:
> security object was passed a bad ticket" error. On a hunch I shut off
> the server processes for the db server that's not syncing, and this
> time the vos operations worked. Very strange.

Okay, well, if you can narrow it down to a specific machine, that of
course helps :) Can you find any differences between that machine and
the others? Are all 3 dbservers running the exact same binaries? Any
difference in Solaris patch levels or anything? Were you by chance
seeing the same error code anywhere in the dbserver logs? (You may not,
even if that error is occurring; some parts of the dbservers in 1.4 do
not handle certain types of errors well, but I'm not sure if that's
relevant here.)
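
One rough way to compare the binaries is to checksum the same file on
each dbserver and see whether the results agree. This is only a sketch:
the host names db1, db2 and db3 and the helper name same_checksum are
placeholders I made up, and it assumes you can ssh to each machine.

```shell
# same_checksum reads `cksum` output lines (checksum, size, filename) on
# stdin and succeeds only if every line has the same checksum and size.
# The function name is hypothetical, not an OpenAFS tool.
same_checksum() {
    awk '{ seen[$1 " " $2] = 1 }
         END { n = 0; for (k in seen) n++; exit (n == 1 ? 0 : 1) }'
}

# Example use (db1, db2, db3 are placeholder host names):
#   for h in db1 db2 db3; do ssh "$h" cksum /usr/afs/bin/vlserver; done \
#       | same_checksum || echo "vlserver binaries differ"
```

Matching checksums only tell you the binaries are identical; they say
nothing about patch levels or the contents of rxkad.keytab.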

Another thing to check is whether the server processes are built with
krb5 support. That requirement is new with the rxkad.keytab change; in
the past you did not need it, and 1.4 does not turn it on by default, so
it is easy to accidentally miss. You need to explicitly turn it on for 1.4.

A quick check is to run 'ldd /usr/afs/bin/vlserver' (or any other server
binary), and see if it references libkrb5 (and mech_krb5 on Solaris,
iirc). You can also try just 'truss'ing the process to see if it's
actually looking at rxkad.keytab.
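
The ldd check could be wrapped up roughly like this, so you can run it
over several server binaries. It is a sketch only: links_krb5 and
check_binary are names I made up, and it just looks for the two library
names mentioned above in whatever ldd prints.

```shell
# links_krb5 reads `ldd` output on stdin and succeeds if any line
# mentions libkrb5 or mech_krb5. (Hypothetical helper name.)
links_krb5() {
    while IFS= read -r line; do
        case $line in
            *libkrb5*|*mech_krb5*) return 0 ;;
        esac
    done
    return 1
}

# check_binary runs ldd on one file and prints a one-line verdict.
# (Hypothetical helper name.)
check_binary() {
    if ldd "$1" 2>/dev/null | links_krb5; then
        echo "$1: krb5 libraries referenced"
    else
        echo "$1: no krb5 libraries referenced"
    fi
}

# Example use:
#   check_binary /usr/afs/bin/vlserver
```

A "no krb5 libraries referenced" result for a dynamically linked binary
would suggest it was built without krb5 support; for a statically linked
binary ldd tells you nothing either way.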

However, the 'normal' error in situations like the above should be an
"unknown key" error, not a "bad ticket" error like you're seeing. The
"bad ticket" error is supposed to indicate that the key data is actually
bad / different; I'm not sure how we could get a "bad ticket" error in
those situations, but I'm just trying to cover different possibilities.

> So if when I restarted the servers, the keys in rxkad.keytab were
> disabled (meaning DISALLOW_ALL_TIX set), would they continue to use
> the old key in KeyFile for outgoing connections?

No; the servers don't contact the KDC at all, so they are not (directly)
affected by the flags set on the principal. The servers will use the
keys in rxkad.keytab for server<->server communication as soon as you
restart the servers (assuming rxkad.keytab is present).

Turning off DISALLOW_ALL_TIX just means that clients will start to use
the new principal, so until that flag is unset, servers presumably will
not see any incoming connections that use rxkad.keytab credentials,
except from other servers.
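
If you want to confirm which state the principal is in before expecting
client traffic with the new key, something like the following may help.
It is a sketch that assumes an MIT-style kadmin whose getprinc output
has an "Attributes:" line; the principal name afs/example.org and the
helper name tix_disallowed are placeholders.

```shell
# tix_disallowed reads `getprinc` output on stdin and succeeds if the
# Attributes line lists DISALLOW_ALL_TIX. (Hypothetical helper name;
# assumes MIT-style kadmin output.)
tix_disallowed() {
    while IFS= read -r line; do
        case $line in
            Attributes:*DISALLOW_ALL_TIX*) return 0 ;;
        esac
    done
    return 1
}

# Example use (afs/example.org is a placeholder principal):
#   kadmin -q "getprinc afs/example.org" | tix_disallowed \
#       && echo "flag still set: clients cannot get tickets for it yet"
#
# With MIT kadmin the flag is cleared with:
#   kadmin -q "modprinc +allow_tix afs/example.org"
```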

-- 
Andrew Deason
adeason@sinenomine.net