[OpenAFS] sanity check please.

Mike Polek mike@pictage.com
Thu, 08 Sep 2005 17:01:14 -0700

> From: Pucky Loucks <ploucks@h2st.com>
> Date: Sat, 3 Sep 2005 16:50:33 -0700
> Subject: [OpenAFS] sanity check please.
> Hi everyone, I've been playing with openafs for a couple of months  
> now and I'm just wanting some sanity checks of my idea of an AFS
> deployment.
> I'm wanting to build a scalable system that will have millions of  
> images stored on a file system.  (this is where AFS comes in)  It  
> looks to me like AFS is able to deal with scaling the partitions and  
> volumes i.e. total storage.  In the end I could have terabytes of  
> data. Still my belief is that AFS can handle it.  My concern is that  
> I want to replicate the data so that I have some redundancy, AFS can  
> handle this too.

My company does something along these lines. We process thousands
of images each day and store them in AFS. We have nearly
100TB of storage at the moment.

> 1) Is this going to become a huge management issue?

Over time, we've written a number of custom scripts to
handle the volume management. As with anything else,
if you find yourself doing a management task
repetitively, write a script for it. The key is to
define your workflow and develop some software to
assist you in moving things around. Be sure to define
when things expire if you want to
keep your storage finite.
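
To make "a definition for when things expire" concrete, here is a
minimal Python sketch. It is an illustration only: the one-year
retention period, the volume names, and the idea of recording a
"closed on" date per volume are all assumptions, not a description of
any real setup.

```python
from datetime import date, timedelta
from typing import Dict, List

# Hypothetical policy: a volume expires one year after its event closed.
RETENTION = timedelta(days=365)


def expired_volumes(closed_on: Dict[str, date],
                    today: date,
                    retention: timedelta = RETENTION) -> List[str]:
    """Return the names of volumes whose retention period has passed.

    closed_on maps a volume name to the date its event stopped
    receiving images (hypothetical bookkeeping kept outside AFS).
    """
    return sorted(name for name, closed in closed_on.items()
                  if today - closed > retention)
```

The function only reports candidates. What happens to an expired
volume (archive, delete, move to cheaper storage) is a separate
decision that belongs in your workflow definition.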

> 2) If I end up getting 5 thousand images a day would I want it to be  
> in its own volume so I could replicate each "day"?

We found that the key is to set things up so that volumes are finite
and eventually stabilize. We organize our images according
to the event they came from. Each event produces a finite
number of images, so eventually the volume stops growing.
It's much easier to back up/move/replicate volumes that
are stable. We have an index volume with a finite-depth
tree structure in which we mount all the volumes with
images. It's also important to decide up front how you
will name your volumes. Name them for the logical grouping
of the images they contain (whether it's the event, the
source, the date range of the received date of the images,
or whatever). That way if anything happens to the mount points,
you can reconstruct your index from the volume names in a pinch.
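
As a rough sketch of rebuilding an index from volume names, the
Python below maps each name to a path in a fixed two-level tree and
prints the matching `fs mkmount` command. The `img.<event id>` naming
scheme, the index root path, and the two-character bucketing are
assumptions made up for this example. The list of volume names would
have to come from somewhere such as the output of `vos listvldb`.

```python
from typing import Iterable, List

# Hypothetical naming scheme: img.<event id>, e.g. img.100234
PREFIX = "img."
# Hypothetical location of the index volume
INDEX_ROOT = "/afs/example.com/index"


def mount_path(volume: str, root: str = INDEX_ROOT) -> str:
    """Map a volume name to its place in a fixed two-level tree.

    The first two characters of the event id pick the bucket
    directory, so the tree depth never changes.
    """
    if not volume.startswith(PREFIX) or len(volume) == len(PREFIX):
        raise ValueError(f"not an image volume: {volume!r}")
    event = volume[len(PREFIX):]
    return f"{root}/{event[:2]}/{event}"


def rebuild_commands(volumes: Iterable[str]) -> List[str]:
    """Build one 'fs mkmount' command per read/write image volume.

    Read-only and backup clones are skipped, as are volumes that do
    not follow the naming scheme. Bucket directories must already
    exist before the commands are run.
    """
    commands = []
    for name in sorted(volumes):
        if not name.startswith(PREFIX):
            continue
        if name.endswith((".readonly", ".backup")):
            continue
        commands.append(f"fs mkmount -dir {mount_path(name)} -vol {name}")
    return commands
```

The commands are returned as strings rather than executed, so they
can be reviewed before anything touches the index volume.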

> 3) what's the recommended max size for a volume?

Depends on what you want to do with it. The bigger the volume,
the more time it takes to back up/move/replicate. Do some tests
with your systems, find out how long it takes to move/replicate
volumes of different sizes, and decide how long you're willing
to let the operation take. Bound your volumes based on the time
those operations would take, and the probability that the
operation will be interrupted and have to be repeated.
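
One way to turn that into a number is sketched below in Python. It
rests on two simplifying assumptions that are not measurements from
any real system: interruptions arrive at random at a constant average
rate, and an interrupted operation starts over from the beginning.
Under those assumptions, an operation that needs T uninterrupted hours
takes (exp(rate * T) - 1) / rate hours on average. The throughput and
interruption rate are values you would measure in your own tests.

```python
import math


def expected_hours(size_gb: float, gb_per_hour: float,
                   interruptions_per_hour: float) -> float:
    """Average time to finish, counting restarts after interruptions."""
    clean_hours = size_gb / gb_per_hour
    rate = interruptions_per_hour
    if rate == 0:
        return clean_hours
    # expm1(x) computes exp(x) - 1 accurately for small x
    return math.expm1(rate * clean_hours) / rate


def max_volume_gb(budget_hours: float, gb_per_hour: float,
                  interruptions_per_hour: float) -> float:
    """Largest volume whose expected operation time fits the budget.

    This inverts expected_hours: T = ln(1 + rate * budget) / rate.
    """
    rate = interruptions_per_hour
    if rate == 0:
        return budget_hours * gb_per_hour
    clean_hours = math.log1p(rate * budget_hours) / rate
    return clean_hours * gb_per_hour
```

With no interruptions the bound is simply budget times throughput.
Any nonzero interruption rate lowers it, and the penalty grows quickly
as volumes get larger, which is the argument for keeping them bounded.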

> 4) These files will be served via apache is that an issue? (my
> understanding is it's not)

Shouldn't be.

Good luck!

Mike Polek,
Pictage, Inc.