[OpenAFS] Some advice please

Todd M. Lewis Todd_Lewis@unc.edu
Tue, 14 Nov 2006 09:46:45 -0500


Matt Hampton wrote:
> Hi
> 
> I have probably missed this whilst looking through the FAQ so forgive me
> if I am asking stupid questions.

These are good questions, but I don't think you'll like these answers.

> I am looking to migrate to a DFS to provide increased resilience to our
> business and to allow us to scale as required.

Beware: Distributed != Redundant;  Scalability != Resilience. For the 
problems that AFS addresses well, those of us who use it don't want to 
give it up. But it's not a silver bullet.

> We provide distributed email filtering (i.e. anti-spam etc) and need to
> start providing archiving.
> 
> We also host email accounts (in Maildir format) which we would also like
> to move to a more distributed architecture.
> 
> So my questions are as follows:
> 
> 1. Can I configure AFS to always store a file on at least two servers.

No. And, maybe. Probably not the way you want.

AFS stores volumes on servers, and these volumes contain the files. Some 
volumes are read-only, while others are read-write.  Read-only volumes 
can live on multiple servers, and thus you get the benefits of redundant 
servers for those volumes (and thus, for those files). Read-write 
volumes live on one server at a time. They are easy to move from one 
server to another, but the holy grail of distributed file systems -- 
magically storing live read-write files in multiple places -- is not 
something AFS was designed to achieve. If you want something like RAID 5 
bumped up to the server level, AFS is not your answer.
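
To make the read-only case concrete, here is a rough sketch of the command 
sequence for replicating a volume, assuming the standard "vos addsite" and 
"vos release" subcommands. The volume, server, and partition names are made 
up for illustration, and the helper only builds the command lines; it does 
not run them.

```python
# Sketch only: builds vos command lines, does not execute anything.
# "archive", fs1/fs2.example.com and /vicepa are hypothetical names.

def replicate_readonly_cmds(volume, sites):
    """Return argv lists that add read-only sites for `volume`, then release it.

    `sites` is a list of (server, partition) pairs.
    """
    cmds = [
        ["vos", "addsite", "-server", server, "-partition", partition, "-id", volume]
        for server, partition in sites
    ]
    # "vos release" copies the current read-write contents to every read-only site.
    cmds.append(["vos", "release", "-id", volume])
    return cmds


if __name__ == "__main__":
    sites = [("fs1.example.com", "/vicepa"), ("fs2.example.com", "/vicepa")]
    for argv in replicate_readonly_cmds("archive", sites):
        print(" ".join(argv))
```

Note that the read-only copies change only when the volume is released, 
which is why this helps with software trees and the like but not with live 
mail stores.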

> 2. If I store files in /afs/My.Identifier/archive and need to increase
> space is it as simple as introducing a new server.

Again, AFS stores volumes (which contain files). You need to think of 
these problems in terms of manipulating volumes. Each volume has a quota 
independent of other volumes. So you could simply bump up the quota for 
the volume containing /afs/My.Identifier/archive (or in AFS speak, 
"/afs/@cell/archive").

Typically, AFS servers store volumes on one or more dedicated 
partitions. Obviously, the server partition in question must have the 
space available to hold whatever you try to actually put in all the 
volumes on that partition. You _don't_ have to have as much space as the 
sum of all the volumes' quotas -- you can "overbook" quota. If your 
partition is too small, you can set up another partition on the same or 
another server and move volumes to the new partition(s) as you see fit. 
Users typically need not be bothered by such behind-the-scenes 
manipulations.
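
The overbooking arithmetic can be sketched like this. It is an illustration 
only: the partition size, volume names, and numbers are invented, and the 
function is a hypothetical helper, not part of any AFS tool.

```python
# Sketch only: compares committed quota with real usage on one partition.
# All sizes are in kilobytes; every figure below is made up.

def partition_report(partition_kb, volumes):
    """Summarize one server partition.

    `volumes` maps a volume name to a (quota_kb, used_kb) pair.
    """
    committed = sum(quota for quota, _used in volumes.values())
    used = sum(used for _quota, used in volumes.values())
    return {
        "committed_kb": committed,
        "used_kb": used,
        # Overbooked means the quotas promise more than the disk holds.
        "overbooked": committed > partition_kb,
        # What matters day to day is actual usage against actual space.
        "free_kb": partition_kb - used,
    }


if __name__ == "__main__":
    report = partition_report(
        100_000,
        {"archive": (80_000, 30_000), "mail": (60_000, 25_000)},
    )
    print(report)
```

Here the quotas add up to more than the partition holds, yet there is still 
free space because actual usage is lower. When the free space runs short, 
that is the cue to move a volume elsewhere (with "vos move") rather than to 
shrink anyone's quota.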

> 3. I know there are issues with any DFS and simultaneous access to files
> (e.g. when using shared IMAP folders based on mailbox) but by using
> Maildir have I reduced this issue?

No.

> 4. I am intending to have a number of IMAP/POP3 servers which would be
> round robin load balanced with about an hour TTL would this also help
> with the issue in 3?

No.  Typically, cooperating processes on one client can use advisory 
locks and do whatever they want with expected results (they all share 
the same cache manager, and thus the same view of the file system 
state), but processes hitting the same read-write files and directories 
from different clients generally need to be written to be aware of the 
effects they can have on each other or you will lose joy.
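
For the single-client case, a minimal sketch of cooperating processes using 
an advisory lock might look like the following. It assumes a Unix client and 
Python's fcntl module; the helper name and file path are hypothetical. Treat 
it as coordinating only the processes on one machine, and do not rely on it 
to protect writers on different clients.

```python
# Sketch only: advisory locking among cooperating processes on ONE client.
import fcntl
import os


def locked_append(path, line):
    """Append one line to `path` while holding an exclusive advisory lock."""
    with open(path, "a") as f:
        # Blocks until any other cooperating process releases its lock.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.write(line + "\n")
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


if __name__ == "__main__":
    locked_append("/tmp/demo.log", "hello from pid %d" % os.getpid())
```

The lock is advisory: it only works if every process touching the file 
takes it too.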

> Any other suggestions would be welcomed!
> 
> Finally
> 
> We have our servers on rented dedicated servers which we do not have
> physical or console access to.  They are all based on CentOS and do not
> have partitions available - could someone point me in the direction on
> how to move these servers to be AFS clients only using SSH access?

Installing (and even building if necessary) the OpenAFS client is pretty 
straightforward. You will have to update it (or recompile) after each 
kernel upgrade, which is only mildly painful, but it also keeps you 
fresh on the technique. :^/  There's not much more to setting up an AFS 
_client_ than that.

Your bigger problem is going to be setting up and learning to manage 
OpenAFS _servers_ -- the boxen that store the volumes, the vldb (volume 
location database), the pts (protection server), Kerberos -- and 
figuring out how to think about your storage issues in OpenAFS terms. 
AFS is different enough (it doesn't behave like, say, ext3) that getting 
a feel for what it can do for you vs. what it doesn't affect vs. where 
you're trying to fight against it for your particular needs is going to 
take some time and testing on your part...

...and of course, being willing to ask what may feel like stupid 
questions in public occasionally.

> Thanks for any help,
> 
> regards
> 
> Matt

Cheers,
-- 
    +-----------------------------------------------------------------+
   /   Todd_Lewis@unc.edu  919-445-9302  http://www.unc.edu/~utoddl  /
  / A Freudian slip is when you say one thing but mean your mother. /
+-----------------------------------------------------------------+