[OpenAFS] OpenAFS dup partition to different servers
Todd M. Lewis
Todd_Lewis@unc.edu
Thu, 24 Mar 2005 08:35:32 -0500
Andrew Velikoredchanin <andrew@rodtext.ru> wrote:
> This means I cannot update files on replicated volumes?
That's correct.
> I generally need to add new files and remove old files on these
> volumes - no need to change files.
It doesn't matter. The replication is not at the file level, it's at the
volume level. When you replicate a volume, a snapshot is taken of the whole
volume, and that gets sent out to all the servers that host one of the
read-only replicas. (There was some talk about only sending out deltas if the
volume had been replicated before, but I don't think that's been implemented.)
Frankly, replication may not seem very important to you if you only have one
or two servers and not many users. We have thousands of users and dozens of
servers. When we install a software package in our cell, we create a volume
just for that package, get it all configured and built and installed so it
works properly in AFS, then we replicate it to at least four servers. When a
client goes to use that package, it might use any one of those four servers,
and if one happens to be down (which almost never happens -- our servers are
typically up 200 to 300 days at a time), the client using one of those servers
for that package quietly "fails over" to another server, and the user
typically never knows something went wrong.
The real beauty comes in when we go to update or reconfigure a package, or
change some of its default settings. We can play with the read-write copy all
we want, break it a few times and fix it, all without impacting users who are
using the read-only replicas because those haven't changed. Once we get it
like we want it, we do the "vos release <volume.name> -verbose" thing and
everybody picks up the changes.
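As a sketch of that workflow (the volume name "pkg.ne" and the server and
partition names here are made up for illustration), it looks something like:

> $ vos addsite afs2 /vicepa pkg.ne
> $ vos addsite afs3 /vicepa pkg.ne
> $ vos release pkg.ne -verbose
> $ vos examine pkg.ne

"vos addsite" tells the volume location database which server/partition pairs
should hold read-only copies, "vos release" pushes the snapshot out to them,
and "vos examine" shows where the read-write and read-only sites live.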
That's what replication gets you. It doesn't do anything about solving the
"how do I write to files when the server hosting those r-w files goes down"
problem.
>> It seems like everybody wants multiple read-write volumes when they
>> first encounter AFS, but it doesn't work that way. If it did, then
>> the servers hosting the r-w volumes would have to be in constant
>> contact with each other, and performance would be terrible. You really
>> don't want that. You don't even want to want that.
>
>
> :(
> Maybe you know - which distributed filesystems support read-write replication?
As far as I know, none. Not any. It's a really hard problem. Think about the
failure modes and what the servers would have to do -- reliably -- in the
background to make that work. It's the "reliably" part that makes it almost
impossible. The reason you want rw replicas is because you're worried that
something will fail. But if something's failing, it's likely to keep the
servers from coordinating updates anyway. Think about it: for multiple r-w to
work, a whole lot more has to work than simply keeping one rw server working;
multiple servers have to keep working as well as the interconnects between
them. And what about when one client is updating a file on one server, and
another client is updating _the same file_ through another server? The
coordination problems would be enormous. It just isn't worth the complexity.
You (as a network/file system administrator) are better off having a robust
single server that either works (and therefore doesn't lose users' data) or
fails outright (and doesn't lose users' data). The added complexity of having
multiple servers trying to coordinate disparate writes greatly increases the
risk of users losing data.
>> Replication is a great way to spread out access (and thus load) to
>> static parts of your tree among multiple servers, but that's all it does.
>
>
> OK. I understand. Replication in OpenAFS is used for load balancing.
Right. But as handy as replication is, there are other things that make AFS
cool. Users can create their own groups, add whomever they want to them, and
set directory ACLs to do just the right level of protection. You, as the
administrator, can get out of the managing-groups-for-users business. (You
still might manage some groups for administrative purposes, but that's different.)
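To give a feel for what that looks like in practice (the user, group, and
directory names below are invented), a user might do:

> $ pts creategroup andrew:friends
> $ pts adduser -user todd -group andrew:friends
> $ fs setacl -dir ~/shared -acl andrew:friends rl

"pts creategroup" makes a group owned by the user (no administrator needed),
and "fs setacl" grants everyone in that group read and lookup ("rl") rights
on the directory.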
Also -- this is the coolest thing in AFS from my perspective (as one who has
to deploy packages for multiple architectures) -- the "@sys" macro allows you
to have architecture-specific things show up in the directory tree in the same
place regardless of what type of system you are on. For example, in our cell
(isis.unc.edu), we have a bunch of packages installed in our
/afs/isis.unc.edu/pkg directory (or just /afs/isis/pkg for short). Take as a
typical package "ne" -- a little text editor that I maintain locally. It's
available for 9 different architectures in our cell. But no matter which type
of system you login to, the full path is "/afs/isis.unc.edu/pkg/ne/bin/ne".
How? Well, first, "/afs/isis/pkg/ne" is a symbolic link to
"/afs/isis/pkg/ne-136". BTW, older versions -- ne-119 and ne-133 -- are still
around if anybody's interested. Without the version number on any of our
packages, you get the default (usually the newest) version. Inside the ne-136
package, we have this structure:
> $ cd /afs/isis/pkg/ne
> $ ls -al
> drwxrwxrwx 2048 Sep 16 2004 .
> drwxr-xr-x 30720 Mar 22 10:09 ..
> lrwxr-xr-x 17 May 6 1998 bin -> .install/@sys/bin
> lrwxr-xr-x 11 Sep 16 2004 build -> .build/@sys
> drwxr-xr-x 2048 Sep 16 2004 .build
> lrwxr-xr-x 15 May 6 1998 common -> .install/common
> lrwxr-xr-x 11 Sep 16 2004 dist -> .build/dist
> lrwxr-xr-x 19 May 6 1998 doc -> .install/common/doc
> lrwxr-xr-x 17 May 6 1998 etc -> .install/@sys/etc
> lrwxr-xr-x 21 May 6 1998 include -> .install/@sys/include
> drwxr-xr-x 2048 Sep 16 2004 .info
> lrwxr-xr-x 13 May 6 1998 install -> .install/@sys
> drwxr-xr-x 2048 Nov 15 11:31 .install
> lrwxr-xr-x 17 May 6 1998 lib -> .install/@sys/lib
> lrwxr-xr-x 21 May 20 1998 libexec -> .install/@sys/libexec
> lrwxr-xr-x 19 May 6 1998 man -> .install/common/man
> lrwxr-xr-x 18 May 6 1998 sbin -> .install/@sys/sbin
> lrwxr-xr-x 21 May 6 1998 share -> .install/common/share
> lrwxr-xr-x 10 Sep 16 2004 src -> .build/src
See that "bin" entry? It's a symbolic link to ".install/@sys/bin". The cache
manager (I think) translates that "@sys" to one of "amd64_linux24",
"i386_linux24", "ppc_darwin_70", "rs_aix51", "rs_aix52", "sgi_65", "sun4x_57",
"sun4x_58", or "sun4x_59", depending on the type of architecture I'm on.
There's a tree for each architecture under the ".install" directory. Behold:
> $ ls -l /afs/isis/pkg/ne/.install/*/bin/ne
> -rwxr-xr-x 281535 Sep 17 2004 /afs/isis/pkg/ne/.install/amd64_linux24/bin/ne
> -rwxr-xr-x 281535 Sep 17 2004 /afs/isis/pkg/ne/.install/i386_linux24/bin/ne
> -rwxr-xr-x 290340 Sep 17 2004 /afs/isis/pkg/ne/.install/ppc_darwin_70/bin/ne
> -rwxr-xr-x 466526 Sep 28 10:09 /afs/isis/pkg/ne/.install/rs_aix51/bin/ne
> -rwxr-xr-x 725233 Sep 28 10:20 /afs/isis/pkg/ne/.install/rs_aix52/bin/ne
> -rwxr-xr-x 427208 Sep 17 2004 /afs/isis/pkg/ne/.install/sgi_65/bin/ne
> -rwxr-xr-x 345156 Sep 17 2004 /afs/isis/pkg/ne/.install/sun4x_57/bin/ne
> -rwxr-xr-x 347132 Sep 17 2004 /afs/isis/pkg/ne/.install/sun4x_58/bin/ne
> -rwxr-xr-x 350688 Sep 17 2004 /afs/isis/pkg/ne/.install/sun4x_59/bin/ne
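If you're curious what "@sys" expands to on a particular client, the cache
manager will tell you (the exact output wording can vary by AFS version):

> $ fs sysname
> Current sysname is 'i386_linux24'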
So, by clever use of the "@sys" macro in the file system, you can abstract
away architecture dependencies. We do something similar for the ".build" tree
where we build all the different flavors of a package from a single copy of
the source. (Note also that, although you can't tell by looking, ".build" is a
mount point for a volume we don't replicate. We only replicate the upper-level
stuff with the binaries and libs, but not the build tree, since nobody needs
that stuff but me and there's no point in duplicating all those files.)
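For the curious, a mount point like that is made with "fs mkmount" after the
volume exists; roughly like this (the volume and server names are illustrative):

> $ vos create afs1 /vicepa pkg.ne.build
> $ fs mkmount /afs/isis/pkg/ne-136/.build pkg.ne.build
> $ fs lsmount /afs/isis/pkg/ne-136/.build
> '/afs/isis/pkg/ne-136/.build' is a mount point for volume '#pkg.ne.build'

"fs lsmount" confirms the directory is really a mount point rather than an
ordinary directory.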
Those are some of the things that make AFS really cool to work with.
Replication is good to have for static parts of the file system, but in
day-to-day use users aren't even aware of it. ACLs and architecture independence,
however, are really cool.
Good luck with your AFS adventure. And don't be shy about asking questions on
the list. That's what it's there for. Cheers,
--
+--------------------------------------------------------------+
/ Todd_Lewis@unc.edu 919-962-5273 http://www.unc.edu/~utoddl /
/ The man who fell into an upholstery /
/ machine is fully recovered. /
+--------------------------------------------------------------+