[OpenAFS] Failover

Troy Benjegerdes hozer@hozed.org
Sat, 31 Dec 2005 20:02:04 -0600

On Sat, Dec 31, 2005 at 08:03:40PM -0500, Jeffrey Hutzelman wrote:
> On Saturday, December 31, 2005 12:36:40 AM -0600 Troy Benjegerdes 
> <hozer@hozed.org> wrote:
> >The advantage of AFS over a single system is you can have as many
> >incoming MTA machines, and imap servers as you want.
> Yes, you can.  But as the volume gets large, especially for any given 
> mailbox, the performance goes to hell.  The problem is that whenever you 
> file a message into a mailbox, you change the directory containing the 
> mailbox.  That means that if any other AFS client is also accessing that 
> directory, it has a callback that has to be broken (while YOU wait), and 
> then it has to fetch the entire directory again in order to be able to do 
> the next file lookup.

Sure, a bunch of clients talking to the same directory has scalability
problems, but if I've got a mailbox that is huge enough to have these
problems, it's not something I'm going to be able to effectively read
anyway. Heck, my IMAP client (backed by AFS) only checks mail every 5
minutes anyway.

I suppose this could be a problem with a shared mailbox with hundreds of
deliveries per second, but there's no human that could keep up with that
rate anyway. Anything over 1 delivery per second, and the human factors
are the bottleneck, not the system scalability.
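
To put rough numbers on the callback argument, here is a
back-of-the-envelope sketch in Python. The model and every name in it
(callback_load, watchers, poll_interval_sec, dir_bytes) are assumptions
for illustration, not measurements of a real AFS cell. It assumes a
client's callback on the directory can only be broken once per time the
client re-reads the directory, so the break rate per client is capped
both by the delivery rate and by how often that client looks.

```python
def callback_load(watchers, deliveries_per_sec, poll_interval_sec, dir_bytes):
    """Estimate (callback breaks/sec, directory bytes refetched/sec).

    Simplified, assumed model: each of `watchers` clients re-reads the
    mailbox directory every `poll_interval_sec` seconds, which re-establishes
    its callback. The next delivery breaks that callback, and the client
    must fetch the whole directory (`dir_bytes`) before its next lookup.
    """
    if poll_interval_sec <= 0:
        raise ValueError("poll_interval_sec must be positive")
    # A callback can be broken at most once per re-read, so the per-client
    # rate is the smaller of the delivery rate and the polling rate.
    per_watcher = min(deliveries_per_sec, 1.0 / poll_interval_sec)
    breaks_per_sec = watchers * per_watcher
    refetch_bytes_per_sec = breaks_per_sec * dir_bytes
    return breaks_per_sec, refetch_bytes_per_sec


if __name__ == "__main__":
    # One IMAP client polling every 5 minutes, 1 delivery per second.
    print(callback_load(1, 1.0, 300, 1_000_000))
    # Ten busy delivery hosts touching the directory about once a second.
    print(callback_load(10, 100.0, 1.0, 500_000))
```

Under these assumptions the single slow-polling reader costs almost
nothing, while a pool of machines that all touch the same directory
continuously is where the refetch traffic piles up.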

> Once upon a time, more or less all of Carnegie Mellon's messaging needs 
> (mail, netnews, bboards) were handled by the Andrew Messaging System, a 
> distributed system based on AFS.  AMS was an integrated part of the Andrew 
> project, and unlike any mail system in wide use today, was designed from 
> the ground up to take advantage of a distributed computing environment and 
> particularly a distributed filesystem.  Most major components of the system 
> stored data in and communicated via the filesystem.  Incoming MX's, 
> outgoing mail gateways, delivery, bboard filing, etc. could all run on 
> multiple machines, and it was possible to add or remove machines in any of 
> those pools at will.
> Several years ago, Carnegie Mellon abandoned that system, choosing instead 
> to expend huge amounts of developer time on developing, maintaining, and 
> supporting an enterprise-grade distributed IMAP server package.  The Cyrus 
> IMAP system has consumed more than an entire full-time employee for many 
> years now, and there is no sign that will change anytime soon.
> One significant factor in the decision to go down that path was the fact 
> that AMS had serious scalability problems, largely because of the issue I 
> described above.  You could add more mail delivery systems, but that meant 
> more callback breaks and more fetches of large directories from the 
> fileserver.  Sure, it was necessary to develop software because there was 
> no off-the-shelf solution with the required robustness and stability.  And 
> participation in standards efforts (and implementation of those standards) 
> was needed in order to ensure it would at least be possible to use 
> off-the-shelf _clients_.  But without the serious performance problems AMS 
> was having, there would have been no need to consider changes to messaging 
> infrastructure at all.

I suspect that this decision may have had more to do with the fact that
there were several freely available and widely distributed IMAP clients
than with problems with a distributed filesystem. When that decision was
made, was AFS still a closed-source single-vendor solution?

In reality, I also don't think AFS really became robust enough to support a
use-case like this until it had been open-sourced for a few years, and
people tried doing all kinds of crazy stuff like this and fixing bugs.

> I very much recommend against trying to store mail in AFS.  There is no 
> gain to be had in reliability, scalability, or performance, and there are 
> any number of potential problems.  If what you're trying to accomplish is 
> to get those features in a distributed mail server system, I suggest 
> looking at http://asg.web.cmu.edu/cyrus/

I've looked at Cyrus, used it in the past, and moved away from it.  It's
great if you're an enterprise, but I really like having my mail in my
filesystem, and being able to use either a standard IMAP client,
webmail, or filesystem tools like grep, and the mutt email client. Cyrus
also almost requires a dedicated admin. With AFS as the backend, I have one
backup system to maintain, instead of worrying how to back up Cyrus as
well, and then learning how to use whatever Cyrus has for migrating users
from one piece of hardware to another.
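
As an illustration of the "filesystem tools" point: when mail is stored
as plain files, searching it needs nothing beyond ordinary file access.
A minimal sketch in Python, assuming a one-message-per-file layout under
some mail directory; the function name grep_mail and the example path
are hypothetical.

```python
import re
from pathlib import Path


def grep_mail(root, pattern):
    """Return sorted paths of regular files under `root` matching `pattern`."""
    # MULTILINE makes ^ and $ match at each line, as grep users would expect.
    regex = re.compile(pattern, re.MULTILINE)
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Mail can contain arbitrary bytes; replace undecodable ones
        # instead of failing on them.
        text = path.read_text(encoding="utf-8", errors="replace")
        if regex.search(text):
            hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical location of a mail directory in the current user's home.
    for hit in grep_mail(Path.home() / "Maildir", r"^Subject: .*AFS"):
        print(hit)
```

This is no replacement for grep itself; it only shows that mail kept in
the filesystem stays reachable by any tool that can read files.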