[OpenAFS] OpenAFS with RAID

Paul Robins paul@paulrobins.alldaypa.com
Wed, 28 Dec 2005 15:21:31 +0000


Jeffrey,
   I appreciate your lengthy reply; you've confirmed many of the things 
I was wondering about. The big issue with the servers is that a dying 
disk will in fact kill the entire server, as these are low-budget 
whiteboxes with basic SATA controllers, nothing particularly impressive.
   From John Hascall's post I am extremely interested in using DRBD to 
effectively distribute any filesystem updates. This seems a more 
appropriate solution for my needs because unfortunately I don't have 
access to 'proper' servers, and the Linux support for the SATA 
controller on these motherboards (yes, I know, embedded controllers are 
satan) is extremely poor.

Many thanks for taking the time to help me. I believe I may even attempt 
to combine DRBD with AFS because we will shortly be opening a second 
staffed site, meaning I will require some form of 'Global Filesystem', if 
you will (no implication of GFS).

Thanks again,
Paul


Jeffrey Altman wrote:
> Paul Robins wrote:
> 
> 
>>Well, that's what I was originally wondering: can AFS provide the ability
>>to replicate the contents of one fileserver to others which can be used
>>redundantly? It appears not at all; I'd still like to use AFS but I do
>>think I'm going to have to go NFS and then some sort of faux RAID 1 for
>>redundancy.
> 
> 
> Paul:
> 
> The real question you have to answer is what risks are you concerned
> about?   What is the likelihood that you are going to lose an entire
> server without warning in such a manner that it makes a difference to
> the clients that would be communicating with it?
> 
> The reason I specify "without warning" is that AFS far surpasses the
> capabilities of other file systems in the area of volume management.
> You said earlier in the thread that your biggest fear was losing a
> disk.   So we can make that your warning sign.  For each file server
> you deploy, use mirrored disks (RAID-1) with each disk on its
> own interface card.   Then deploy your file servers and leave enough
> empty space on each of the servers such that if necessary you can
> move all of the volumes on any one server to any of the other servers.
> 
> Now if a disk ever fails, the operation of the file server will be
> uninterrupted.   You can then initiate volume moves of the
> non-replicated read-write volumes to other servers.  These moves can
> be performed while the clients are actively using them.  The clients
> will continue using the source server until the move is almost complete,
> there will be a brief busy state where the client waits, and then a
> moved notification which the client responds to by looking up the new
> location and continuing where it left off on the new server.
> 
> Once all of the volumes have been moved off the server, you can take
> the server down and replace the disk or perform whatever form of
> maintenance that is required.
> 
> In the recent past I have seen more outages caused for end users by
> a need to reconfigure non-Andrew file systems, either for volume
> redistribution or physical maintenance, than I have for physical failures
> in AFS deployments.   AFS volume management allows you to perform more
> frequent maintenance of the hardware and the OS without impacting
> end users than other models.
> 
> While a network-based RAID-5 is a fine idea, the performance is really
> going to be quite poor from the perspective of end users even when the
> machines are physically quite close.   Network RAIDs have the potential
> to provide redundancy when whole portions of the network infrastructure
> are lost.  However, they do so at a significant cost in performance.
> 
> Jeffrey Altman
>
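
The evacuation procedure described in the quoted reply (keep enough free
space that any one server's volumes fit on the others, then move them off
when a disk fails) could be sketched roughly as below. This is only an
illustration, not something from the thread: the server names, partition
names, volume names and sizes are all hypothetical, the placement is a
simple greedy heuristic, and the script only builds `vos move` command
lines rather than running them. In practice the sizes would come from
something like `vos listvol` and `vos partinfo`.

```python
from typing import Dict, List, Optional, Tuple


def plan_evacuation(
    volumes_kb: Dict[str, int], free_kb: Dict[str, int]
) -> Optional[List[Tuple[str, str]]]:
    """Assign each volume on the failing server to one of the other servers.

    volumes_kb: volume name -> size in KB on the server being emptied.
    free_kb:    target server -> free space in KB.
    Returns a list of (volume, target) pairs, or None if this greedy
    heuristic cannot place every volume (i.e. not enough spare space).
    """
    remaining = dict(free_kb)
    plan: List[Tuple[str, str]] = []
    # Place the largest volumes first, each on the server with most free space.
    for name, size in sorted(volumes_kb.items(), key=lambda kv: kv[1], reverse=True):
        if not remaining:
            return None
        target = max(remaining, key=remaining.get)
        if remaining[target] < size:
            return None
        remaining[target] -= size
        plan.append((name, target))
    return plan


def vos_move_command(
    volume: str, src: str, src_part: str, dst: str, dst_part: str
) -> List[str]:
    """Build (but do not run) a `vos move` command line for one volume."""
    return [
        "vos", "move",
        "-id", volume,
        "-fromserver", src, "-frompartition", src_part,
        "-toserver", dst, "-topartition", dst_part,
    ]


if __name__ == "__main__":
    # Hypothetical numbers: three volumes on fs1, two servers with spare room.
    volumes = {"user.paul": 400, "proj.web": 300, "user.bob": 100}
    free = {"fs2": 500, "fs3": 450}
    plan = plan_evacuation(volumes, free)
    if plan is None:
        print("Not enough spare space to empty fs1")
    else:
        for volume, target in plan:
            print(" ".join(vos_move_command(volume, "fs1", "a", target, "a")))
```

If `plan_evacuation` returns None, the "leave enough empty space" rule is
not being met for that server, which is worth finding out before a disk
actually fails.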