[OpenAFS] OpenAFS dup partition to different servers

Todd M. Lewis Todd_Lewis@unc.edu
Thu, 24 Mar 2005 08:35:32 -0500


Andrew Velikoredchanin <andrew@rodtext.ru> wrote:

> This means I can not update files on replicated volumes?

That's correct.

> I need 
> generally to add new files and remove old files on these volumes - no need 
> to change files.

It doesn't matter. The replication is not at the file level, it's at the 
volume level.  When you replicate a volume, a snapshot is taken of the whole 
volume, and that gets sent out to all the servers that host one of the 
read-only replicas. (There was some talk about only sending out deltas if the 
volume had been replicated before, but I don't think that's been implemented.)

Frankly, replication may not seem very important to you if you only have one 
or two servers and not many users.  We have thousands of users and dozens of 
servers.  When we install a software package in our cell, we create a volume 
just for that package, get it all configured and built and installed so it 
works properly in AFS, then we replicate it to at least four servers. When a 
client goes to use that package, it might use any one of those four servers, 
and if one happens to be down (which almost never happens -- our servers are 
typically up 200 to 300 days at a time) the client quietly "fails over" to 
another server, and the user typically never knows something went wrong.
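
For the record, the mechanics are just a handful of vos commands. A rough 
sketch of creating and replicating a package volume might look like this (the 
server names, partition, and volume name here are made up for illustration -- 
substitute your own):

> $ vos create afs1 /vicepa pkg.ne
> $ fs mkmount /afs/.isis.unc.edu/pkg/ne-136 pkg.ne
> $ vos addsite afs1 /vicepa pkg.ne   # r-o site on the same server as the r-w
> $ vos addsite afs2 /vicepa pkg.ne
> $ vos addsite afs3 /vicepa pkg.ne
> $ vos addsite afs4 /vicepa pkg.ne
> $ vos release pkg.ne                # snapshot goes out to all the r-o sites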

The real beauty comes in when we go to update or reconfigure a package, or 
change some of its default settings.  We can play with the read-write copy all 
we want, break it a few times and fix it, all without impacting users who are 
using the read-only replicas because those haven't changed.  Once we get it 
like we want it, we do the "vos release <volume.name> -verbose" thing and 
everybody picks up the changes.
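
In practice the update cycle looks something like this (again, the volume name 
is made up; the /afs/.cellname path is the usual convention for reaching the 
read-write volume instead of the read-only replicas):

> $ cd /afs/.isis.unc.edu/pkg/ne-136   # the "dot" path hits the r-w volume
> $ # ...install, test, break, fix, test again...
> $ vos release pkg.ne -verbose        # r-o replicas pick up the changes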

That's what replication gets you. It doesn't do anything about solving the 
"how do I write to files when the server hosting those r-w files goes down" 
problem.

>> It seems like everybody wants multiple read-write volumes when they 
>> first encounter AFS, but it doesn't work that way.  If it did, then 
>> the servers hosting the r-w volumes would have to be in constant 
>> contact with each other, and performance would be terrible. You really 
>> don't want that. You don't even want to want that.
> 
> 
> :(
> Maybe you know - which distributed filesystems support rw replication?

As far as I know, none. Not any. It's a really hard problem. Think about the 
failure modes and what the servers would have to do -- reliably -- in the 
background to make that work.  It's the "reliably" part that makes it almost 
impossible. The reason you want rw replicas is because you're worried that 
something will fail. But if something's failing, it's likely to keep the 
servers from coordinating updates anyway. Think about it: for multiple r-w to 
work, a whole lot more has to work than simply keeping one rw server working; 
multiple servers have to keep working as well as the interconnects between 
them. And what about when one client is updating a file on one server, and 
another client is updating _the same file_ through another server? The 
coordination problems would be enormous. It just isn't worth the complexity. 
You (as a network/file system administrator) are better off having a robust 
single server that either works (and therefore doesn't lose users' data) or 
fails outright (and doesn't lose users' data).  The added complexity of having 
multiple servers trying to coordinate disparate writes greatly increases the 
risk of users losing data.

>> Replication is a great way to spread out access (and thus load) to 
>> static parts of your tree among multiple servers, but that's all it does.
> 
> 
> OK. I understand. Replication in OpenAFS is used for load balancing.

Right. But as handy as replication is, there are other things that make AFS 
cool. Users can create their own groups, add whomever they want to them, and 
set directory ACLs to do just the right level of protection. You, as the 
administrator, can get out of the managing-groups-for-users business. (You 
still might manage some groups for administrative purposes, but that's different.)
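
To give you the flavor of it, here's roughly what a user -- call her jdoe, 
sharing a directory with asmith; both names made up -- would do with the 
standard pts and fs commands:

> $ pts creategroup jdoe:project      # user-made groups carry the owner's prefix
> $ pts adduser asmith jdoe:project
> $ fs setacl ~/project jdoe:project rlidwk  # read, lookup, insert, delete, write, lock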

Also -- this is the coolest thing in AFS from my perspective (as one who has 
to deploy packages for multiple architectures) -- the "@sys" macro allows you 
to have architecture-specific things show up in the directory tree in the same 
place regardless of what type of system you are on. For example, in our cell 
(isis.unc.edu), we have a bunch of packages installed in our 
/afs/isis.unc.edu/pkg directory (or just /afs/isis/pkg for short).  Take as a 
typical package "ne" -- a little text editor that I maintain locally. It's 
available for 9 different architectures in our cell. But no matter which type 
of system you login to, the full path is "/afs/isis.unc.edu/pkg/ne/bin/ne". 
How? Well, first, "/afs/isis/pkg/ne" is a symbolic link to 
"/afs/isis/pkg/ne-136". BTW, older versions -- ne-119 and ne-133 -- are still 
around if anybody's interested. Omitting the version number on any of our 
packages gets you the default (usually the newest) version. Inside the ne-136 
package, we have this structure:

> $ cd /afs/isis/pkg/ne
> $ ls -al
> drwxrwxrwx  2048 Sep 16  2004 .
> drwxr-xr-x 30720 Mar 22 10:09 ..
> lrwxr-xr-x    17 May  6  1998 bin -> .install/@sys/bin
> lrwxr-xr-x    11 Sep 16  2004 build -> .build/@sys
> drwxr-xr-x  2048 Sep 16  2004 .build
> lrwxr-xr-x    15 May  6  1998 common -> .install/common
> lrwxr-xr-x    11 Sep 16  2004 dist -> .build/dist
> lrwxr-xr-x    19 May  6  1998 doc -> .install/common/doc
> lrwxr-xr-x    17 May  6  1998 etc -> .install/@sys/etc
> lrwxr-xr-x    21 May  6  1998 include -> .install/@sys/include
> drwxr-xr-x  2048 Sep 16  2004 .info
> lrwxr-xr-x    13 May  6  1998 install -> .install/@sys
> drwxr-xr-x  2048 Nov 15 11:31 .install
> lrwxr-xr-x    17 May  6  1998 lib -> .install/@sys/lib
> lrwxr-xr-x    21 May 20  1998 libexec -> .install/@sys/libexec
> lrwxr-xr-x    19 May  6  1998 man -> .install/common/man
> lrwxr-xr-x    18 May  6  1998 sbin -> .install/@sys/sbin
> lrwxr-xr-x    21 May  6  1998 share -> .install/common/share
> lrwxr-xr-x    10 Sep 16  2004 src -> .build/src

See that "bin" entry? It's a symbolic link to ".install/@sys/bin".  The cache 
manager (I think) translates that "@sys" to one of "amd64_linux24", 
"i386_linux24", "ppc_darwin_70", "rs_aix51", "rs_aix52", "sgi_65", "sun4x_57", 
"sun4x_58", or "sun4x_59", depending on the type of architecture I'm on. 
There's a tree for each architecture under the ".install" directory.  Behold:

> $ ls -l /afs/isis/pkg/ne/.install/*/bin/ne
> -rwxr-xr-x 281535 Sep 17  2004 /afs/isis/pkg/ne/.install/amd64_linux24/bin/ne
> -rwxr-xr-x 281535 Sep 17  2004 /afs/isis/pkg/ne/.install/i386_linux24/bin/ne
> -rwxr-xr-x 290340 Sep 17  2004 /afs/isis/pkg/ne/.install/ppc_darwin_70/bin/ne
> -rwxr-xr-x 466526 Sep 28 10:09 /afs/isis/pkg/ne/.install/rs_aix51/bin/ne
> -rwxr-xr-x 725233 Sep 28 10:20 /afs/isis/pkg/ne/.install/rs_aix52/bin/ne
> -rwxr-xr-x 427208 Sep 17  2004 /afs/isis/pkg/ne/.install/sgi_65/bin/ne
> -rwxr-xr-x 345156 Sep 17  2004 /afs/isis/pkg/ne/.install/sun4x_57/bin/ne
> -rwxr-xr-x 347132 Sep 17  2004 /afs/isis/pkg/ne/.install/sun4x_58/bin/ne
> -rwxr-xr-x 350688 Sep 17  2004 /afs/isis/pkg/ne/.install/sun4x_59/bin/ne
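
(If you're curious what "@sys" expands to on a given client, just ask the 
cache manager; the output below is only an example -- your client reports its 
own sysname:

> $ fs sysname
> Current sysname is 'i386_linux24'

)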

So, by clever use of the "@sys" macro in the file system, you can abstract 
away architecture dependencies.  We do something similar for the ".build" tree 
where we build all the different flavors of a package from a single copy of 
the source. (Note also that, even though you can't tell from looking, ".build" 
is a mount point for a volume we don't replicate. We only replicate the upper 
level stuff with the binaries and libs, not the build tree, since nobody 
needs that stuff but me and there's no point in duplicating all those files.)
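
You can see the mount point if you ask for it explicitly, though. Something 
like this -- the volume name here is a guess at a plausible naming convention, 
not our actual one:

> $ fs lsmount /afs/isis/pkg/ne-136/.build
> '/afs/isis/pkg/ne-136/.build' is a mount point for volume '#pkg.ne.build'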

Those are some of the things that make AFS really cool to work with. 
Replication is good to have for static parts of the file system, but in 
day-to-day use, users aren't even aware of it. ACLs and architecture independence, 
however, are really cool.

Good luck with your AFS adventure. And don't be shy about asking questions on 
the list. That's what it's there for.  Cheers,
-- 
     +--------------------------------------------------------------+
    / Todd_Lewis@unc.edu  919-962-5273  http://www.unc.edu/~utoddl /
   /              The man who fell into an upholstery             /
  /                  machine is fully recovered.                 /
+--------------------------------------------------------------+