[OpenAFS] Changes for Mosaic's AFS cell...
Thu, 06 Apr 2006 02:06:08 -0400
On Wednesday, April 05, 2006 11:54:55 PM -0500 "Christopher D. Clausen" wrote:
> Rodney M Dyer <email@example.com> wrote:
>> Does it matter whether the cell servers are upgraded
>> first? Obviously not, since our existing test server already works.
>> I've never upgraded a cell server myself, and the person who last
>> upgraded our cell servers has "left the building".
> Other than having servers that support whatever features you need, no,
> it shouldn't matter. I had a problem with "foreign users" from other
> Kerberos realms not working b/c my servers were too old. If you are not
> intending to use any new functionality you should be fine upgrading
> either the clients or the servers in any order.
Unless you're upgrading from something really ancient (pre-OpenAFS), that's
correct.
> I would recommend doing server upgrades as close together as possible,
> though, to minimize problems that may occur.
FUD. Mixed-server-version cells work just fine. There are situations in
which you want to be careful about mixing database server versions, such as
when the database format changes, but that hasn't happened in a long time
(again, since before OpenAFS). We expect such format changes in the future
(perhaps around the 2.0.x timeframe), but I'd be very surprised and upset
if they came in the middle of the 1.4.x branch, and what design work has
been done in this area so far has paid at least some attention to backward
compatibility.
>> Our current
>> back-end systems guy just wanted some indication about the sequence
>> of events in which things should take place. Because of issues with
>> the UBIK quorum, if no accounts, or volumes are being added, removed,
>> or replicated during an upgrade, is the sequence of cell server
>> upgrades important? I mean our cell is fairly small so can we just
>> upgrade each one without worry right?
> I've updated to various dev builds and rc versions in random order, one
> AFS server at a time, on the three AFS DB servers for the acm.uiuc.edu
> cell without issues (at least not issues related to upgrade order.)
> Just vos move all volumes off, shut it down, do whatever system upgrades
> at the same time, and restart with the newer version.
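The quoted procedure might look like this in practice. This is only a
sketch: the hostnames ("oldfs", "newfs"), partition "a", and the volume
name are hypothetical, and it assumes you have AFS admin tokens.

```shell
# See which volumes live on the server to be upgraded:
vos listvol oldfs -partition a

# Move each volume to another fileserver (repeat per volume):
vos move home.rdyer oldfs a newfs a

# Stop the AFS server processes cleanly:
bos shutdown oldfs -wait

# ...upgrade the OS and OpenAFS binaries here...

# Bring the processes back up under the new version:
bos startup oldfs
```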
Yeah, for fileservers, that's pretty much it. If you're willing to
tolerate a service outage while you upgrade things, you can even skip the
move-the-volumes-off step. Personally, I prefer not to have service
outages.
Note that while ordinary users never perform operations requiring VLDB
quorum, they do make changes in the PRDB, so you may want to minimize
outages of that service.
> I will say that 1.4.1-rc10 appears to be running just fine on sun4x_510
> after crashes with previous versions forced upgrades.
At this point there are a number of serious known problems in 1.4.x
fileservers older than that. I would not recommend upgrading to 1.4.0.
>> 2. We need to shut down an older cell server and bring up a new one
>> in another building.
>> For issue 2, we have set the vlserver prefs on each client so that the
>> clients won't select the cell server we want to move to another
>> building (or it will be last in the pref list). Can we just shut
>> down the old cell server and bring up another (in another building)
>> without much worry about UBIK issues? This is somewhat similar to
>> issue 1.
> I have done this as well without any issues.
> I assume you have two other AFS DB servers to maintain quorum?
> Is the IP address of the new server going to match the old server?
Given that it's in another building, I'm going to assume not.
Assuming the steady state is three servers, the safest route is probably to
add the new server to the CellServDB on all three existing servers, restart
them to pick up the new configuration, and then set up the new server. Once
the new server is happy, shut down the old server and _then_ remove it from
the CellServDB on the remaining ones, again restarting them one at a time
to pick up the change.
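A sketch of that sequence using bos, which can edit the server-side
CellServDB for you. All hostnames here are hypothetical, and the commands
assume AFS admin tokens.

```shell
# Step 1: add the new DB server to every existing server's CellServDB.
for srv in db1.example.edu db2.example.edu olddb.example.edu; do
    bos addhost $srv newdb.example.edu
done

# Step 2: restart the database processes one server at a time so
# quorum survives throughout:
for srv in db1.example.edu db2.example.edu olddb.example.edu; do
    bos restart $srv -instance vlserver ptserver
done

# Step 3: install and start the new server.  Once it is stable,
# retire the old one and remove it from the remaining servers:
bos shutdown olddb.example.edu -wait
for srv in db1.example.edu db2.example.edu newdb.example.edu; do
    bos removehost $srv olddb.example.edu
    bos restart $srv -instance vlserver ptserver
done
```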
You can run into all sorts of problems if your servers do not all have the
same set of hosts in the (server-side) CellServDB, as the quorum mechanism
depends on this information being consistent across servers. However, if
the only inconsistency is that some of the machines know about one extra
server _and_ that server is never up while the configurations are
inconsistent, then the worst that can happen is that you lose quorum a
little more easily.
>> 3. We'd like to turn off the old KAS from Transarc and rely totally
>> on Kerb 5 (finally). We are already using Kerb 5 everywhere and none
>> of our AFS clients use KAS anymore, but we've never actually disabled
>> it.
Well, you could just turn the kaserver off and see what happens.
Or you could look at network traffic first and see whether your kaservers
are getting any requests.
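Watching for kaserver traffic is straightforward, since the kaserver
listens on UDP port 7004. (The interface name below is hypothetical;
adjust for your server.)

```shell
# Run on or near the DB server; any packets seen here mean
# something on your network is still talking to the kaserver.
tcpdump -n -i eth0 udp port 7004
```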
>> 4. We'd like to try real K5 AFS service tickets without using the 5
>> to 4 daemon.
>> For issue 4, I am under the impression (from my conversation at the
>> last BPW) that we can disable our 5 to 4 daemon that AKLOG uses and
>> AKLOG will just take the K5 encrypted part and just stuff it into the
>> AFS cred manager. The only thing we need to do is update our key
>> files on the file servers right? Can AKLOG do what it needs to do
>> without having access to a 5 to 4 daemon?
> If you have an aklog that uses pure krb5, yes, it should just work
> without a krb524d running.
Yup; that's going to depend entirely on what aklog you have.
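A quick way to test, with hypothetical user, cell, and realm names:

```shell
# Get a krb5 TGT, then ask aklog for an AFS token with debugging on:
kinit rdyer@EXAMPLE.EDU
aklog -d -c your.cell.name
# The -d output shows whether aklog used the krb5 ticket directly or
# fell back to the krb524 translation service.

# If it worked, an AFS token for the cell should now be listed:
tokens
```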
> AFAIK, you shouldn't need to update your AFS key files, but it's possible
> that mine are new enough not to need to be refreshed to a new enc_type.
AFS key files can store only single-DES keys. The thing you have to do to
make pure V5 tickets work is upgrade the server _software_. I believe
1.2.10 is the oldest version that will work, but don't hold me to that (and
don't run 1.2.10 after January 10, 2004 if you have more than one dbserver).
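To see what you're working with (realm, cell, and server names below are
hypothetical; the kadmin command assumes an MIT KDC):

```shell
# On the KDC: check the enctypes on the AFS service principal.
kadmin.local -q "getprinc afs/your.cell.name@EXAMPLE.EDU"

# On an AFS server: list the key versions in the KeyFile.  The KeyFile
# format holds only single-DES keys, regardless of what the KDC has.
bos listkeys server1.example.edu
```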