[OpenAFS] Industrial Strength AFS Usage Details

Phil.Moore@morganstanley.com Phil.Moore@morganstanley.com
Mon, 23 Apr 2001 15:03:09 -0400 (EDT)

>>>>> "Jeffrey" == Jeffrey E Sussna <jsussna@mongonet.net> writes:

Jeffrey> Here is a more detailed description of my potential AFS usage model. I 
Jeffrey> would appreciate any concrete feedback on AFS's suitability for it:

Jeffrey> Basic model: process A writes file X. Process B and C read file X. Process 
Jeffrey> C writes file Y. Processes D, E, and F read file Y. Process F writes file 
Jeffrey> Z. A->B->C->D->E->F constitutes a "job". At the end of the job X, Y, and Z 
Jeffrey> can all be deleted, and should be as each machine will otherwise pile up a 
Jeffrey> few gigs of data per day. Things get interesting if a job crashes. I'd 
Jeffrey> rather not have to grunge around hundreds of machines looking for leftover 
Jeffrey> files that are safe to delete, or maintain global state to do it for me. 
Jeffrey> The entire system runs inside a private network. Many jobs run in parallel, 
Jeffrey> but there is no interaction between jobs. I am also investigating 
Jeffrey> publish-subscribe middleware such as Tibco and Talarian.

I'll give you my take on this...

First of all, let me arrogantly claim to have built one of (if not
the) worlds most mission critical industrial strength AFS
infrastructures in existence.  Transarc certainly used to moan about
the fact that we have pushed AFS into unseen territory here.

Morgan Stanley's entire distributed systems infrastructure is based on
AFS.  Our UNIX desktops and servers are almost all dataless AFS
clients, with both the operating systems as well as production
applications being run from binaries, scripts and config files all
being loaded from RO AFS volumes.

For RO data, you simply can't beat AFS for scalability and
reliability.  We can debate that if you like...

However, when it comes to RW data, this is where the product is
somewhat weak.  As Russ Allbery pointed out, the AFS client is MP-safe
(meaning it works), but is far from MP-fast.  It is essentially single
threaded (we can pick nits here, but note the word "essentially").  On
some platforms (most notably IRIX, in my experience), performance
isn't just flat as you increase the number of processors, but it
actually degrades severaly due to lock contention.

If each individual client in your architecture will not be performing
lots of parallel reads/writes of large volumes of data into AFS, such
that the load on a single clients' AFS cache manager is not too high,
you'll be OK.  

If you have 1000's of clients, however, the fileserver itself deals
with such volumes of RW data fairly well, since the fileserver is
multi-threaded (as of AFS 3.5 I think).

AFS use of RW data must be designed cautiously.  

Having said all of that, I think that the filesystem should be used
for files.  Distributed filesystems are not very good IPC mechanisms,
and you should probably step back and ask yourself if this is right
architecture to begin with.  Sure, you can probably make it work, but
my intuition (and 7+ years of experience with AFS, and 15+ years of
experience with *ultra* large scale systems) says otherwise.

All of the above statements are obviously somewhat general, and we are
really speaking rather qualitatively.  To make quantitative arguments
requires a much closer look at the architecture you're trying to