[OpenAFS] OpenAFS with RAID

Derek Atkins warlord@MIT.EDU
Wed, 28 Dec 2005 10:28:44 -0500


Even a low-budget whitebox with s/w RAID will survive a disk outage.
As a test I was running a system with RAID-1 and pulled the power out
of one of the disks on the running system.  The system didn't crash.
It didn't even slow down.  It DID report the error, but it kept running
happy as a clam.
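
To illustrate the "it DID report the error" part, here is a minimal sketch of
how one might spot a degraded mirror from a script. It assumes Linux md
software RAID, which exposes array state in /proc/mdstat (the test above does
not say which implementation was used). The function name and the SAMPLE text
are hypothetical, made up for illustration only.

```python
import re

# Matches the status md prints for mirrored/parity arrays, e.g.
# "1048512 blocks [2/1] [U_]": 2 members expected, 1 currently active.
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
# Matches an array header line such as "md0 : active raid1 ...".
ARRAY = re.compile(r"^(md\d+)\s*:")


def degraded_arrays(mdstat_text):
    """Return names of md arrays with fewer active members than expected."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                degraded.append(current)
            current = None
    return degraded


# Hypothetical sample, shaped like /proc/mdstat after one mirror half fails.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      1048512 blocks [2/1] [U_]

md1 : active raid1 sdb2[1] sda2[0]
      8385856 blocks [2/2] [UU]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a real machine the text would come from reading /proc/mdstat instead of
SAMPLE; any non-empty result means a mirror is running on fewer disks than it
should.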

-derek

Quoting Paul Robins <paul@paulrobins.alldaypa.com>:

> Jeffrey,
>   I appreciate your lengthy reply; you've confirmed many of the
> things I was wondering about. The big issue when it comes to the
> server situation is that a disk dying will in fact kill the entire
> server, as these are low-budget whiteboxes with basic SATA
> controllers, nothing particularly impressive.
>   From John Hascall's post I am extremely interested in using DRBD to
> effectively distribute any filesystem updates. This seems a more
> appropriate solution for my needs, because unfortunately I don't have
> access to 'proper' servers, and the Linux support for the SATA
> controller on these motherboards (yes, I know, embedded controllers
> are satan) is extremely poor.
>
> Many thanks for taking the time to help me. I believe I may even 
> attempt to combine DRBD with AFS because we will shortly be opening a 
> second staffed site, meaning I will require some form of 'Global
> Filesystem' if you will (no implication of GFS).
>
> Thanks again,
> Paul
>
>
> Jeffrey Altman wrote:
>> Paul Robins wrote:
>>
>>
>>> Well, that's what I was originally wondering: can AFS provide the ability
>>> to replicate the contents of one fileserver to others which can be used
>>> redundantly? It appears not at all; I'd still like to use AFS but I do
>>> think I'm going to have to go NFS and then some sort of faux RAID 1 for
>>> redundancy.
>>
>>
>> Paul:
>>
>> The real question you have to answer is what risks are you concerned
>> about?   What is the likelihood that you are going to lose an entire
>> server without warning in such a manner that it makes a difference to
>> the clients that would be communicating with it?
>>
>> The reason I specify "without warning" is that AFS far surpasses the
>> capabilities of other file systems in the area of volume management.
>> You said earlier in the thread that your biggest fear was losing a
>> disk.   So we can make that your warning sign.  For each file server
>> you deploy, use mirrored disks (RAID-1) with each disk on its
>> own interface card.   Then deploy your file servers and leave enough
>> empty space on each of the servers such that if necessary you can
>> move all of the volumes on any one server to any of the other servers.
>>
>> Now if a disk ever fails the operation of the file server will be
>> uninterrupted.   You can then initiate volume moves of the
>> non-replicated read-write volumes to other servers.  These moves can
>> be performed while the clients are actively using them.  The clients
>> will continue using the source server until the move is almost complete,
>> there will be a brief busy state where the client waits, and then a
>> moved notification which the client responds to by looking up the new
>> location and continuing where it left off on the new server.
>>
>> Once all of the volumes have been moved off the server, you can take
>> the server down and replace the disk or perform whatever form of
>> maintenance is required.
>>
>> In the recent past I have seen more outages caused for end users by
>> a need to reconfigure non-Andrew file systems either for volume
>> redistribution or physical maintenance than I have for physical failures
>> in AFS deployments.   AFS volume management allows you to perform more
>> frequent maintenance of the hardware and the OS without impacting
>> end users than other models.
>>
>> While a network-based RAID-5 is a fine idea, the performance is really
>> going to be quite poor from the perspective of end users even when the
>> machines are physically quite close.   Network RAIDs have the potential
>> to provide redundancy when whole portions of the network infrastructure
>> are lost.  However, they do so at a significant cost in performance.
>>
>> Jeffrey Altman
>>
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
>
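
Jeffrey's advice above to leave enough empty space that all of the volumes
on any one server can be moved to the others can be sanity-checked with a
small calculation. The sketch below is only an illustration: the server
names, volume names, and sizes are invented, each server is treated as a
single pool of space, and the greedy placement (largest volume first, onto
the server with the most free space) can report failure in cases where a
cleverer packing would fit.

```python
def evacuation_plan(servers, failing):
    """Greedily place each volume from `failing` on the emptiest other server.

    servers: {name: {"capacity": int, "volumes": {volume_name: size}}}
    Returns a list of (volume, destination) pairs, or None if some volume
    does not fit on the server with the most free space.
    """
    free = {
        name: info["capacity"] - sum(info["volumes"].values())
        for name, info in servers.items()
        if name != failing
    }
    plan = []
    volumes = servers[failing]["volumes"]
    # Largest volumes first, so the big ones are placed while space is plentiful.
    for vol, size in sorted(volumes.items(), key=lambda kv: kv[1], reverse=True):
        if not free:
            return None  # no other server exists to receive the volume
        dest = max(free, key=free.get)
        if free[dest] < size:
            return None
        free[dest] -= size
        plan.append((vol, dest))
    return plan


def survives_any_single_failure(servers):
    """True if every server's volumes can be evacuated to the remaining ones."""
    return all(evacuation_plan(servers, name) is not None for name in servers)


# Hypothetical cell; sizes are in arbitrary units (say, GB).
CELL = {
    "fs1": {"capacity": 100, "volumes": {"user.alice": 30, "user.bob": 20}},
    "fs2": {"capacity": 100, "volumes": {"proj.web": 40}},
    "fs3": {"capacity": 100, "volumes": {"proj.src": 25}},
}

if __name__ == "__main__":
    print(evacuation_plan(CELL, "fs1"))
    print(survives_any_single_failure(CELL))
```

Each (volume, destination) pair in the plan would correspond to one volume
move of the kind Jeffrey describes, performed while clients keep working.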



-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord@MIT.EDU                        PGP key available