[OpenAFS-devel] Mixed server versions within a cell (longish...)

Kevin openafsd@gnosys.biz
Thu, 10 Mar 2005 20:55:41 -0500


Hi List-

I just finished adding a new server computer to my cell, which had
previously been running a single server that performed all server
functions.  Following the documentation's guidance for cells with fewer
than four server computers, I set up the new server in the database
server role.

The cell uses MIT Kerberos 5 as an authentication source using Ken
Hornstein's migration kit.  For Kerberos, the original server runs
v1.3.6 and the new database server runs v1.4.  For OAFS, the original
server runs v1.2.11 on kernel 2.4.27 and the new server runs 1.3.79 on
kernel 2.6.10.  I was a little concerned about mixing OAFS server
versions within a cell and asked this list whether doing so was
advisable; I received only a single reply, which indicated that it
wasn't a major problem.

I've completed the add-server process, created a new volume on the new
server, and cloned and replicated that volume to the original server.  I
don't seem to have encountered any major problems thus far, but I've
been rereading the conceptual sections of the docs and came across this
passage:

Excerpted from the Administration Guide, The Four Roles..., Binary
Distribution Machines
========================
...For consistent system performance, however, all server machines
_must_ run the same version (build level) of a process. For instructions
for checking a binary's build level, see Displaying A Binary File's
Build Level...
========================

The emphasis above is mine: now that I've set up the new server and
actually have mixed server versions within the cell, this statement has
me a little concerned about data integrity.

Additionally, following the link to "Displaying A Binary File's Build
Level", I read the following:
========================
Displaying A Binary File's Build Level
For the most consistent performance on a server machine, and cell-wide,
it is best for all server processes to come from the same AFS
distribution. Every AFS binary includes an ASCII string that specifies
its version, or build level. To display it, use the strings and grep
commands, which are included in most UNIX distributions.

To display an AFS binary's build level
     1. Change to the directory that houses the binary file. If you are
        not sure where the binary resides, issue the which command.
           % which binary_file
           /bin_dir_path/binary_file
           % cd bin_dir_path
        
     2. Issue the strings command to extract all ASCII strings from the
        binary file. Pipe the output to the grep command to locate the
        relevant line. 
           % strings ./binary_file | grep Base
        
        The output reports the AFS build level in a format like the
        following: 
        
           @(#)Base configuration afsversion  build_level
        
        For example, the following string indicates the binary is from
        AFS 3.6 build 3.0: 
        
           @(#)Base configuration afs3.6 3.0
========================

I don't get these results with any of the binaries in either OAFS server
version.
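For what it's worth, the docs' recipe greps for "Base", which matches the
IBM AFS 3.6 string format in their example; OpenAFS binaries may simply
not contain that word.  A more general approach is to search for the
SCCS-style "@(#)" marker that the example string begins with.  Below is a
minimal sketch, assuming the OpenAFS binaries embed such a marker (the
exact text after it is not guaranteed to follow the "Base configuration"
layout).  The function name afs_build_level is hypothetical, and it uses
tr plus grep as a rough stand-in for strings so it also runs where
binutils isn't installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenAFS): print every "@(#)"-marked
# ASCII string embedded in each file given as an argument, prefixed with
# the file name.  Assumes the binaries carry an SCCS-style "@(#)" marker.
afs_build_level() {
    for f in "$@"; do
        # Rough equivalent of `strings "$f" | grep '@(#)'`: turn every
        # non-printable byte into a newline so each run of printable
        # characters lands on its own line, then keep the marked lines.
        LC_ALL=C tr -c '[:print:]' '\n' < "$f" |
            grep -F '@(#)' |
            while IFS= read -r line; do
                printf '%s: %s\n' "$f" "$line"
            done
    done
}
```

Running something like afs_build_level /usr/afs/bin/* on each server
would then list whatever marked strings each binary carries, which could
be compared from machine to machine.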

Would someone knowledgeable about the code involved in mixing server
versions within a cell comment on these issues?  (I realize that is a
lot of code, which is why I'm hoping to hear from someone who
understands much or all of it.)

To distill my questions/concerns:

1) Should I be concerned about data loss or data integrity problems when
using mixed server versions within a cell or is this just a consistency
problem?  If the latter, exactly what sort of inconsistencies am I
likely to be facing?

2) Since I'm not getting build-level information from these binaries by
following the guidance in the docs, is there another way to do so?  With
a freshly installed version of OAFS, I know that these binaries come
from 1.3.79 and I remember that my original server used 1.2.11, but I
can see myself forgetting in a bigger installation, and it would be nice
to be able to examine any particular binary and determine which build
level it came from.

3) I have noticed one minor annoyance: when copying the i386_linux26
binaries from the local disk of the new server machine to the
i386_linux26 volume (hosted by the new server, replicated to the
original server, and mounted in the /afs tree), the process took a great
deal of time (> 10 minutes for the 24 binaries in /usr/afs/bin).
Interestingly, releasing the volume (which, if I understood my reading
correctly, copies all of the volume's data over the network from the new
server to the old server) took only about 15 seconds.  That the
replication is quick isn't terribly surprising, since it's not a lot of
data, but it is very surprising that copying from the local disk into
this new volume in the /afs tree (a volume hosted by the new server
itself) takes so long.  Is this the kind of consistency issue the
documentation is referring to here?  If not, does anyone have any
thoughts on what's causing the big delay?

Thanks for any thoughts on these questions, and if I haven't said so
enough already, thank you to all those who have made and continue to
maintain OAFS.  I've been using it for 1-2 years now and I'm just
super-impressed with it in every respect.  I'm still learning it and
admiring it more and more as I do so.


-- 
-Kevin
http://www.gnosys.us