[OpenAFS] a noobs question and problems on a new cell

Christopher D. Clausen cclausen@acm.org
Wed, 9 May 2007 08:49:04 -0500

Adnoh <adnoh@users.sourceforge.net> wrote:
> Christopher D. Clausen wrote:
>>> i don't want to have all the volumes in our headquarters. so every
>>> time a user opens his word-doc or similar it would be completely
>>> transferred over our VPN - and I can hear the people crying "our
>>> fileservers are too slow !" so separate fileservers in every
>>> district would be a good choice, I think - wouldn't they ?
>> That is an option.  There are of course problems with doing either.
>> Remember that the AFS clients themselves cache read-only data.  So if
>> most of your data is only being read and not written back that
>> often, it might make sense to have only centrally located AFS
>> servers.
> that's right - but my problem at the moment is that we have only
> windows-workstations. And I didn't figure out how
> I could customize the MSI-installation in that way, so I don't need to
> travel to all our districts and configure that client.
> so I would like one afs "client" per district - the fileserver which
> is already there (a linux gentoo machine) - some kind of
> afs->samba-gateway

While some people have done afs->samba gateways, I personally don't 
think that is a good idea.  You have all the problems of samba and AFS 
combined and you miss a lot of the best features of either.

>> By default, the AFS client prefers to use readonly volumes, so if you
>> create a replica of a volume, the data will immediately become
>> readonly. You can however manually force the mount point to be RW
>> (-rw option to fs mkm) and this way you can have an RW volume in
>> each local district and still be able to clone the data to other
>> servers using vos release.  All volume writes must go directly to
>> the RW volume.  The AFS client does not detect when you want to make
>> a write and find the proper RW volume. You can modify the code to
>> make it behave that way, but there are
>> reasons for not doing that.
> I tried that this way and didn't get it:
> a volume called software (~1 Gig)
> in our headquarter the rw-volume on the afs server.
> in a district the (nightly) ro-snapshot of that volume.
> mounted into afs like:
> /afs/domain/.software (-rw)
> /afs/domain/software (ro)
> so if I understand that right i should now be able to access the data
> under /afs/domain/.software on both sides.
> in the headquarter it should use always the rw-instance and in the
> district it should use the rw-instance (over vpn) on a write,
> and on a read it should prefer the local ro-instance. but that
> doesn't work for me.
> every time I accessed some software in the district it was transferred
> completely over the vpn from our headquarters.
> did I misunderstand something or have I done something wrong !?

What commands did you use to set this up?  And physically where are the
servers that you used to do it?  It should be possible to do something
like what you want, but users will need to understand the difference
between the paths and open the appropriate folder for either writing or
reading.  You can't have just writes go to the RW volume.
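For comparison, a setup like the one described would typically be built
with commands along these lines.  This is only a sketch: the server
names and the /vicepa partition are placeholders I made up, and only the
volume name and the /afs/domain paths come from your description.  The
script just prints the commands unless RUN=1 is set, so it is safe to
read through without a cell at hand.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
# Set RUN=1 to really run them (needs admin tokens in a real cell).
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

CELL_ROOT=/afs/domain                     # path used in the thread
VOL=software                              # volume name used in the thread
HQ_SERVER=hq-fs.example.com               # placeholder: server holding the RW volume
DISTRICT_SERVER=district-fs.example.com   # placeholder: district file server
PART=/vicepa                              # placeholder: AFS partition

# Define RO sites (one next to the RW, one in the district), then push
# the current RW contents out to them.
replicate_volume() {
    run vos addsite "$HQ_SERVER" "$PART" "$VOL"
    run vos addsite "$DISTRICT_SERVER" "$PART" "$VOL"
    run vos release "$VOL"
}

# Regular mount point (client prefers an RO site) plus the "dot path"
# that is forced to the RW volume.  Creating mount points needs write
# access to the parent directory; if the parent volume is replicated,
# that means going through its RW path and releasing it afterwards.
mount_volume() {
    run fs mkmount -dir "$CELL_ROOT/$VOL" -vol "$VOL"
    run fs mkmount -dir "$CELL_ROOT/.$VOL" -vol "$VOL" -rw
}

replicate_volume
mount_volume
```

With this layout, reads through /afs/domain/software can be served from
an RO site, while everything under /afs/domain/.software goes to the RW
volume, reads included.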

> the idea of this behaviour (take the local ro if available and just
> get what you still need over vpn) was the coolest feature of the afs
> - i thought. and it is the main reason why I was looking at the whole afs
> thing - and not something like nfs.

You might need to use fs setserverprefs to have the clients on each side 
use the correct server.  Also, note that the AFS client will NOT switch 
between using the RO and RW automatically (well, if the RO goes down, 
the RW will be used, but that isn't likely what you want to happen in 
this case.)  If you are using the "dot path" all reads and writes will 
be to the RW volume.
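As a rough illustration of the server preference part, something like
the following could be run on a client in a district.  The host names
and rank values are made up for the example; lower ranks are preferred.
As above, the script only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

LOCAL_SERVER=district-fs.example.com   # placeholder: file server in the district
REMOTE_SERVER=hq-fs.example.com        # placeholder: file server at headquarters

prefer_local_server() {
    # Lower rank wins, so the local server gets the small number.
    run fs setserverprefs -servers "$LOCAL_SERVER" 1000 "$REMOTE_SERVER" 40000
    # Show the ranks the client is now using.
    run fs getserverprefs
    # List the file servers that house the volume behind this path.
    run fs whereis /afs/domain/software
}

prefer_local_server
```

Note that this only affects which RO site a client picks.  It does not
make the client send writes somewhere else.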

Generally, it's a "best practice" to have an RO clone on the same server
as the RW as well.  Not sure if you did that or not.

>> However, you might simply be better off using a more common network
>> filesystem like NFS or samba and using something like rsync to backup
>> the data nightly.  You mentioned a VPN.  Since the network link is
>> already encrypted, you don't require filesystem encryption?  Or do
>> you?
> I'm not sure about the encryption thing. the vpn is a line from a large
> provider in germany. so I think the line is secure, but I'm a little
> bit paranoid ;-)

AFS has built-in encryption.  It's not the best, but it's better than
nothing.  Since you already have a secured VPN, that is not an issue for 
you though.

>> It seems as though you are trying to use AFS like NFS or samba,
>> creating a single
>> large share point and allowing everyone to write in it.  This is not
>> the best way to use AFS, although it mostly works.  Replicating
>> single large volumes can take a long time, especially over slow
>> links.
> yes and no. we have our samba-fileservers in every district completely
> separated from each other.
> so if user a from district a wants to give a file to user b from
> district b for working on it - he uses email. when
> user b has his work completed on that file he uses that way to get
> the file back to user a - and if someone in district
> a has altered the file in that time - they have a problem...
> so yes, i would like one big namespace - something like
> /afs/domain/data/it
>                       /controlling
>                       /bookkeeping
> and so on - so every user in an organisation unit can access his data
> from whichever district he is in at the moment and easily share it with
> someone else who is maybe not in the same district.
> i thought this is something afs wants to give me.

AFS can do what you want, but the performance over the WAN links is
likely going to be poor.  And since the RW volume can only be on a single
server, someone is going to be stuck with the slow connection.

> Can you describe a "district office" in more detail?  How many users?
> ->This differs - lets say 10 districts, 5 with ~100 users, 60 Gig of
> data and a "data-change" of 100MB / Day
> and the other 5 with the half of the above.

If your data change rate is only 100 MB per day, it should be okay to
just use a client in each district.   Yes, opening and closing files will
be slow, but try to use large client caches to minimize the impact.
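To put rough numbers on that, here is a back-of-the-envelope
calculation using the figures from your mail (100 MB per day, links of
512 kbit/s to 2 Mbit/s).  It assumes 1 MB = 10^6 bytes and that "512k"
means 512 kbit/s, and it ignores protocol and VPN overhead, so real
transfers will take somewhat longer.

```shell
#!/bin/sh
# transfer_seconds SIZE_MB LINK_KBIT
# Whole seconds needed to move SIZE_MB megabytes over a LINK_KBIT kbit/s
# link: MB * 8,000,000 bits / (kbit/s * 1,000) = MB * 8000 / kbit.
transfer_seconds() {
    echo $(( $1 * 8 * 1000 / $2 ))
}

# One day's worth of changes on the slowest and the fastest link.
for link in 512 2000; do
    echo "100 MB at ${link} kbit/s: $(transfer_seconds 100 "$link") s"
done

# The ~1 GB software volume fetched in full over the slowest link.
echo "1000 MB at 512 kbit/s: $(transfer_seconds 1000 512) s"
```

So a full day of changes is roughly 26 minutes of transfer on the
slowest link and under 7 minutes on the fastest.  Pulling the whole 1 GB
software volume over a 512 kbit/s link takes more than four hours, which
is why a large client cache matters.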

> Is there technical staff there to diagnose problems with an AFS
> server, if they occur?  Are the offices always connected to the
> network?  What type of connection do they have?  Bandwidth?  Latency?
> ->no - the only technical staff is in our headquarters. we have a vpn
> from a large provider which has an offline-time of maybe 10 Min / Year
> at all - so it is very good. The bandwidth differs - from 512k -
> 2Mbit. they are connected 24h / day.

You likely do not want to try to have AFS servers in each remote
location if you do not have the staff there to fix problems.

> Do you use Kerberos 5 currently within your organization?  A single
> realm? Or a realm per district?
> ->We use a windows 2003 ADS for authentications of the windows
> workstations and the samba-servers.

Ah, ok.

Have you looked into using Microsoft's Dfs?  It might provide the 
namespace that you want, but not require you to completely switch your 
infrastructure to use AFS.

> Do you have any off-site backup or disaster recovery requirements?
> ->I would like to have a backup on the local usb-hdd in each district
> and a centraliced backup in our headquarter with a fullbackup/week and
> diff-backup/day.

Okay.  It's pretty easy to clone or copy volumes with AFS.  The exact
details would depend upon lots of factors and should probably be
addressed in a separate thread.
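Just to give a flavour of what that can look like, here is a minimal
sketch of a weekly full dump plus a daily incremental dump using vos.
The dump directory and the example date are invented, and a real setup
would need scheduling, rotation and restore testing on top.  The script
only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

VOL=software            # volume name used in the thread
DUMP_DIR=/backup/afs    # placeholder: where dump files are written

# Full dump: refresh the .backup clone, then dump everything (-time 0).
full_dump() {
    run vos backup "$VOL"
    run vos dump -id "$VOL.backup" -time 0 -file "$DUMP_DIR/$VOL.full.dump"
}

# Incremental dump: only files changed since the given date.
# $1 is the date of the previous dump, in mm/dd/yyyy form.
incremental_dump() {
    run vos backup "$VOL"
    run vos dump -id "$VOL.backup" -time "$1" -file "$DUMP_DIR/$VOL.incr.dump"
}

full_dump
incremental_dump "05/06/2007"   # example date only
```

Dumping from the .backup clone rather than the RW volume keeps the RW
volume available to users while the dump runs.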

> Any specific features that the project MUST do?  Any features that the
> project SHOULD do?  Anything else that
> would be nice to do?
> ->  yes - what I have mentioned above ;-) - the "global"
> namespace would be nice. maybe it is
> interesting to tell you that we want to migrate the workstations to
> linux in the next 2-3 years.

You can do a similar "global" type namespace using Dfs and Windows AD. 
I strongly suggest you look at it first, especially for a mostly Windows 

> How much data are we talking about here?  Total and at each district?
> What is the "change rate" of your data?  How much
> data is modified per day or per week as a percentage of the total
> data? ->mentioned above - all together, maybe ~ 500 Gig at the moment
> - but I don't know how much duplicate data is around there - you know
> that "i need my files in every district, my local hdd and for best on
> my usb again" ;-)

Yeah, getting accurate disk space counts across lots of different 
machines isn't easy.


If you want some specific help with trying out AFS, it might be worth 
asking the good folks on the #openafs IRC channel on the Freenode 
network.  For instance, the RO vs RW stuff isn't easy to fully grasp at