[OpenAFS] Failover

Jeffrey Hutzelman jhutz@cmu.edu
Sat, 31 Dec 2005 20:03:40 -0500


On Saturday, December 31, 2005 12:36:40 AM -0600 Troy Benjegerdes 
<hozer@hozed.org> wrote:

> The advantage of AFS over a single system is you can have as many
> incoming MTA machines, and imap servers as you want.

Yes, you can.  But as the volume gets large, especially for any given 
mailbox, the performance goes to hell.  The problem is that whenever you 
file a message into a mailbox, you change the directory containing the 
mailbox.  That means that if any other AFS client is also accessing that 
directory, it has a callback that has to be broken (while YOU wait), and 
then it has to fetch the entire directory again in order to be able to do 
the next file lookup.
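To make that cost concrete, here is a toy model. It is a sketch under stated assumptions, not OpenAFS code and not the real callback protocol. One delivery agent files messages into a single mailbox directory, and some number of other AFS clients (say, IMAP servers) each look something up in that directory after every delivery. The function name `simulate`, the per-entry size `entry_bytes`, and the "look up after every delivery" pattern are all hypothetical. The model counts the callback breaks the delivering client has to wait for, and the bytes of directory data that readers re-fetch because each break forces a fetch of the whole directory.

```python
def simulate(num_readers: int, num_messages: int, entry_bytes: int = 64) -> tuple[int, int]:
    """Return (callback_breaks, bytes_refetched) for a toy mailbox directory.

    Hypothetical model, not OpenAFS code: one delivery agent files
    num_messages messages into one directory; num_readers other clients
    each do a lookup in that directory after every delivery.  Every
    directory entry is assumed to cost entry_bytes to fetch.
    """
    entries = 0
    # Readers start out with a cached copy and a valid callback promise.
    reader_has_callback = [True] * num_readers
    breaks = 0
    refetched = 0
    for _ in range(num_messages):
        # Filing a message changes the directory, so the fileserver breaks
        # every reader's callback before the delivery completes.
        for r in range(num_readers):
            if reader_has_callback[r]:
                reader_has_callback[r] = False
                breaks += 1
        entries += 1
        # Each reader's next lookup misses its cache and re-fetches the
        # entire directory (not just the new entry), regaining a callback.
        for r in range(num_readers):
            if not reader_has_callback[r]:
                refetched += entries * entry_bytes
                reader_has_callback[r] = True
    return breaks, refetched


print(simulate(3, 1000))  # (3000, 96096000)
```

Under these assumptions, callback breaks grow linearly with the number of other clients, while re-fetched bytes grow with the number of clients times the square of the mailbox size. That is why a busy mailbox shared by several clients degrades so quickly.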


Once upon a time, more or less all of Carnegie Mellon's messaging needs 
(mail, netnews, bboards) were handled by the Andrew Messaging System, a 
distributed system based on AFS.  AMS was an integrated part of the Andrew 
project, and unlike any mail system in wide use today, was designed from 
the ground up to take advantage of a distributed computing environment and 
particularly a distributed filesystem.  Most major components of the system 
stored data in and communicated via the filesystem.  Incoming MX hosts, 
outgoing mail gateways, delivery, bboard filing, etc. could all run on 
multiple machines, and it was possible to add or remove machines in any of 
those pools at will.

Several years ago, Carnegie Mellon abandoned that system, choosing instead 
to spend huge amounts of developer time building, maintaining, and 
supporting an enterprise-grade distributed IMAP server package.  The Cyrus 
IMAP system has consumed more than one full-time employee's worth of effort 
for many years now, and there is no sign that will change anytime soon.

One significant factor in the decision to go down that path was the fact 
that AMS had serious scalability problems, largely because of the issue I 
described above.  You could add more mail delivery systems, but that meant 
more callback breaks and more fetches of large directories from the 
fileserver.  Sure, it was necessary to develop software because there was 
no off-the-shelf solution with the required robustness and stability.  And 
participation in standards efforts (and implementation of those standards) 
was needed in order to ensure it would at least be possible to use 
off-the-shelf _clients_.  But without the serious performance problems AMS 
was having, there would have been no need to consider changes to messaging 
infrastructure at all.



I very much recommend against trying to store mail in AFS.  There is no 
gain to be had in reliability, scalability, or performance, and there are 
any number of potential problems.  If what you're trying to accomplish is 
to get those features in a distributed mail server system, I suggest 
looking at http://asg.web.cmu.edu/cyrus/


-- Jeff