[OpenAFS] Changes for Mosaic's AFS cell...

Jeffrey Hutzelman jhutz@cmu.edu
Thu, 06 Apr 2006 02:35:32 -0400


On Thursday, April 06, 2006 02:19:11 AM -0400 Marcus Watts <mdw@umich.edu> 
wrote:

> For the file servers, it's usually possible to upgrade fileserver
> software without moving volumes.  I haven't found many production
> sites willing to risk this with real data.  There have been and
> probably will be upgrades where this is not true.  For instance
> the current namei fileserver has a 3 bit integer field which can't
> be easily enlarged without breaking the on-disk format.

True, but I really hope when the change comes it will be in the form of a 
completely different, less-bogus format for namei volumes, along with some 
mechanism to allow both "old" and "new" volumes to coexist on the same 
server.  I know various people have thought about these issues, but I don't 
know what such a mechanism would look like or when it would come along. 
Again, I would be rather disappointed if it appeared in the middle of the 
1.4.x branch.


> * The first rule is that for a given server (ptserver, vlserver,
>   buserver), all the servers that are up should be running exactly the
>   same version.  There have been various changes going from transarc to
>   modern openafs in the ubik inter-server protocol; often these don't
>   happen between versions, but unless you know for certain that your old
>   & new version can interoperate, you should not assume that they will.
>   { many versions of openafs may be compatible.  Still best to ask first.
> }

For this purpose, I believe I can safely assert that to date, there have 
been neither database format nor wire protocol changes in OpenAFS which 
would make this a problem.  There definitely have been these kinds of 
changes in AFS releases done before that, and I expect there will be 
similar changes in the future.  For now, a 1.2.<recent> to 1.4.x transition 
should be quite safe.



> Client side.
>
> 	By doing the server side in two steps as above,
> 	you have a window in the middle where you can find all
> 	clients, update cellservdb, and either reboot or do
> 	"fs newcell".  No rush.  Take a week.  Take a month.
> 	When do you have to be out of your old space?
>
> 	Note that if the old or new server is the "sync" site,
> 	the clients that don't have the sync site won't be able
> 	to make changes.  This can be finessed too.  Easiest
> 	way would be to start by turning the old server off
> 	long enough for a new sync site to be elected.

Another way is to mark the server that is not known to all clients as 
nonvoting, so it won't ever become sync site.  Note that in the transition 
sequences we both gave, it is assumed that all servers are voting. 
Transitioning a nonvoting server to voting or vice versa should be handled 
in the same way as adding or removing it, respectively -- nonvoting servers 
are treated differently by the quorum code, so having a server listed as 
voting on some machines and nonvoting on others is exactly as dangerous as 
having it listed at all on some machines and not others.
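
To illustrate that last point, here is a minimal sketch of a consistency check 
across the database servers' views of the cell.  It is not anything OpenAFS 
ships: the function name find_disagreements, the dict-based representation 
(hostname mapped to True for voting, False for nonvoting), and the host names 
are all hypothetical, and collecting each server's actual list is left out.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: one server's view of the cell,
# mapping hostname -> True if that host is listed as voting.
View = Dict[str, bool]


def find_disagreements(views: Dict[str, View]) -> List[Tuple[str, str]]:
    """Return (host, problem) pairs for hosts the views disagree about."""
    problems: List[Tuple[str, str]] = []
    all_hosts = sorted({host for view in views.values() for host in view})
    for host in all_hosts:
        # None means the host is missing from that server's list.
        states: Dict[str, Optional[bool]] = {
            name: view.get(host) for name, view in views.items()
        }
        distinct = set(states.values())
        if len(distinct) <= 1:
            continue  # every server agrees about this host
        if None in distinct:
            problems.append((host, "listed on some servers but not others"))
        else:
            problems.append((host, "voting on some servers, nonvoting on others"))
    return problems


# Made-up example: db3 is nonvoting according to db1, voting according to db2.
example = {
    "db1": {"db1": True, "db2": True, "db3": False},
    "db2": {"db1": True, "db2": True, "db3": True},
}
for host, problem in find_disagreements(example):
    print(host + ": " + problem)
```

An empty result only says the lists agree with each other; it says nothing 
about whether the agreed-upon list is the one you intended.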





> Any existing distribution of openafs can only use plain des-cbc-crc keys,
> just like always.  { there is reason to hope though - I've seen
> a version of ptserver run with aes.  :-) }

Actually, it's been able to deal with des-cbc-md4 and des-cbc-md5 for some 
time now.  Of course, fcrypt is still fcrypt...



> I think what I said above is still useful so I'm going to post
> it anyway.

Good.


> I have a patch for really ancient versions of transarc
> afs if you want to run them after Jan 10 2004.  For a bit we ended
> up running hand-patched binaries.  As I recall, the aix compiler had
> scrambled the immediate constant to two widely separated instructions
> so it was a bit interesting to figure out what to patch.

You, sir, are insane. :-)

-- Jeff