[OpenAFS] Problems adding a new server encryption key.
Renata Maria Dart <renata@SLAC.Stanford.EDU>
Tue, 16 Sep 2003 10:11:04 -0700 (PDT)
Hi, yesterday we attempted to update our AFS server encryption
keys. We have done this procedure a dozen or so times under
Transarc AFS with minimal problems. Yesterday was our first time
trying it under OpenAFS and things did not proceed as expected.
All of our database servers (3) and fileservers (8) run Solaris 9 with
OpenAFS 1.2.9. We always keep 2 server keys in our KeyFile and we
change them every 6-8 months. During the update process we retire
the key that was current 6 months ago. According to the Transarc
documentation of this procedure, the first steps, which involve
updating the KeyFile, should not require a restart of any server
processes. Only when you update the afs entry in the auth database
is a restart of the db servers needed. Yesterday I did the following:
1. I used bos removekey to remove the lowest-numbered key from the KeyFile
on the db server that runs the upserver for /usr/afs/etc, leaving only
one key. The one key left in the KeyFile matched the afs entry
in the auth database. I watched with bos listkeys as this change
propagated to our other servers. No problems yet.
2. I generated a new random key and used bos addkey to add it to the
KeyFile on that same server. As soon as I did this, messages like:
    Lost contact with file server 134.79.17.xx in cell slac.stanford.edu
    (all multi-homed ip addresses down for the server)
began appearing in our SYSLOG output. I watched as each of our
fileservers in turn stopped serving files, as each one got the new copy
of the KeyFile.
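
For readers following along, the two steps above can be sketched as bos
command lines. This is a dry-run illustration only: it builds the argument
lists and prints them, and runs nothing. The host name and key version
numbers are hypothetical placeholders, not the values used at SLAC, and the
helper names are made up for this sketch. It assumes the key is typed at the
prompt that bos addkey gives when -key is omitted.

```python
# Dry-run sketch: builds the bos command lines for steps 1 and 2
# without executing anything. Host name and kvno values are hypothetical.

def bos_command(operation, server, kvno=None):
    """Return the argv list for one bos key operation on one server."""
    argv = ["bos", operation, "-server", server]
    if kvno is not None:
        argv += ["-kvno", str(kvno)]
    return argv


def key_rotation_steps(server, old_kvno, new_kvno):
    """Return, in order, the commands for retiring one key and adding another."""
    return [
        bos_command("removekey", server, old_kvno),  # step 1: retire the oldest key
        bos_command("listkeys", server),             # check what is left
        bos_command("addkey", server, new_kvno),     # step 2: add the new key
        bos_command("listkeys", server),             # check the new key is listed
    ]


if __name__ == "__main__":
    # Hypothetical upserver machine and key version numbers.
    for argv in key_rotation_steps("afsdb1.example.org", old_kvno=3, new_kvno=5):
        print(" ".join(argv))
```

Printing the commands rather than running them keeps the sketch safe to try
on a machine that has no AFS installation at all.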
While clients could no longer see files in AFS, I could successfully
talk to the db server processes with commands like vos listvldb,
kas exam, and pts exam, and the fileservers would respond to
commands like vos listvol and vos partinfo. I could also klog
and get a new token.
At this point we spent some time speculating about what had happened and
how to fix it. We were concerned that the new key might have corrupted
the KeyFile, despite the fact that bos listkeys always produced normal
output that matched on all of the servers. So we backed out the new
key, using bos removekey, leaving only the one "current" key which
matched the afs entry in the auth database. That didn't help. What
finally fixed the situation was to restart all of the db servers,
and then restart all of the fileservers.
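
The "bos listkeys output matched on all of the servers" check can be done
mechanically. The sketch below assumes the usual text form of bos listkeys
output, with lines such as "key 3 has cksum 972037177"; the sample text,
server names, and checksum values are invented for illustration. It only
shows whether the listings agree, which by itself does not settle whether
the KeyFile is intact.

```python
import re

# Assumed line format of 'bos listkeys' output: "key <kvno> has cksum <number>"
KEY_LINE = re.compile(r"^key\s+(\d+)\s+has\s+cksum\s+(\d+)$")


def parse_listkeys(output):
    """Map key version number -> checksum from 'bos listkeys' text output."""
    keys = {}
    for line in output.splitlines():
        match = KEY_LINE.match(line.strip())
        if match:
            keys[int(match.group(1))] = int(match.group(2))
    return keys


def mismatched_servers(outputs, reference):
    """Return the servers whose key listing differs from the reference server's."""
    expected = parse_listkeys(outputs[reference])
    return sorted(
        server for server, text in outputs.items()
        if parse_listkeys(text) != expected
    )


if __name__ == "__main__":
    # Invented sample listings; real ones would be captured from each server.
    good = "key 4 has cksum 972037177\nAll done.\n"
    stale = "key 3 has cksum 2825165022\nkey 4 has cksum 972037177\nAll done.\n"
    outputs = {"db1": good, "fs1": good, "fs2": stale}
    print(mismatched_servers(outputs, reference="db1"))
```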
My questions are:
1. Is the Transarc procedure for updating server keys supposed to
work under OpenAFS? Or is a restart of the db and fileservers
now needed after a new key is added to the KeyFile? After the
incident described above we went through the archives and found
mail from Derrick Brashear in response to Frederick Gilbert:
http://www.mail-archive.com/openafs-info@openafs.org/msg07515.html
in which a "stuck fileserver" situation is described, but in that
case it was after the bos addkey AND kas setpasswd had both been
done. In our case, I never got to the kas setpasswd step.
And Nathan Neulinger asked some questions about this process in:
https://lists.openafs.org/pipermail/openafs-devel/2001-January/000480.html
I couldn't find a response to Nathan and I couldn't find an open
bug that might be related to Frederick Gilbert's mail.
2. If a restart is now necessary, is there some subset of processes
that would suffice, rather than restarting all of our servers with
bos restart -bosserver, which is what I did?
3. If we now need to restart the servers after a bos addkey, can you
tell us why?
4. Could the KeyFile have been corrupted and still present a normal
response with bos listkeys?
Thanks for your help,
-Renata
Renata Dart | renata@SLAC.Stanford.edu
Stanford Linear Accelerator Center |
2575 Sand Hill Road, MS 97 | (650) 926-2848 (office)
Stanford, California 94025 | (650) 926-3329 (fax)