[OpenAFS] OpenAFS with RAID

Stephan Wiesand Stephan.Wiesand@desy.de
Wed, 28 Dec 2005 19:45:05 +0100 (CET)


I've been tossing around ideas like this for a while myself:

On Wed, 28 Dec 2005, Chaskiel M Grundman wrote:

> --On Wednesday, December 28, 2005 03:39:34 PM +0000 Paul Robins 
> <paul@paulrobins.alldaypa.com> wrote:
>
>> If either of you could weigh in on AFS on top of DRBD i'd appreciate it,
>> I'm not fully up on whether a second server with an identical filesystem
>> could be made to take over a crashed AFS machine.
>
> There are a couple of issues
>
> 1) would DRBD actually notice that the storage device on the primary node is 
> "hanging" and switch over to the secondary? I didn't think that most 
> heartbeat services would catch this
> 2) there would be a significant delay bringing up the secondary node as a 
> fileserver: Since the volumes were likely to all be "attached" (in use) by 
> the primary node's fileserver at the time of the failover, the DRBD partition 
> would need to be salvaged (a secondary fsck of the AFS metadata) before the 
> fileserver could be started on the secondary machine.
> 3) you would need to do some sort of IP address takeover in order for clients 
> to contact the correct machine to get at the data. (The AFS architecture 
> provides better ways to do this, but the tools for using them in a case like 
> this aren't there at the moment)

Wouldn't it be an option to not take over the IP address, but just the 
vice partition? Once failure of the peer is recognized and confirmed 
(which is a problem, I agree, but not at all AFS-specific):

   1) stonith
   2) mount the new vice partition and salvage it
      [ 2a) is there a need to restart the fileserver? ]
   3) vos syncvldb
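
The three steps above could be scripted roughly like this. Everything concrete here is an assumption for illustration, not from the thread: the partition name, the DRBD device, the peer's hostname, and the stonith plugin would all depend on the actual setup. With DRYRUN=1 (the default below) the script only prints the commands it would run:

```shell
#!/bin/sh
# Hypothetical failover sketch. PART, PEER, and the device/plugin names
# are placeholders, not values from the original discussion.
PART=/vicepb            # vice partition being taken over from the peer
PEER=afs2.example.org   # the failed fileserver
DRYRUN=${DRYRUN:-1}     # set DRYRUN=0 to actually execute the commands

run() {
    if [ "$DRYRUN" -eq 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# 1) stonith: make certain the peer is really dead before touching its data
run stonith -t external/ipmi -T off "$PEER"

# 2) mount the replicated partition and salvage it
#    (the salvager plays the role of an fsck for the AFS metadata)
run mount /dev/drbd0 "$PART"
run bos salvage -server localhost -partition "$PART" -localauth

# 3) update the VLDB so clients learn the volumes' new location
run vos syncvldb -server localhost -partition "$PART" -localauth
```

Whether step 2 also requires restarting the fileserver (point 2a above) is exactly the open question; the sketch assumes it does not.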

That way, one could also have fileservers running on both failover nodes 
as long as both are healthy. That looks like an active-active, shared-nothing 
failover solution to me, and pretty attractive at least for "mostly read" 
data, where DRBD would not hamper performance too much.

After a few minutes (once clients find out the volume is now located on a 
different server; that time might well be sufficient for fsck and the 
salvager) life should go on. If the failure happens on a Friday evening, 
with no expert at hand before Monday morning, that's a huge win.

There must be some thinko in all this, or people would be doing it a 
lot. What am I overlooking?

Thanks,
  	Stephan

-- 

   ----------------------------------------------------
| Stephan Wiesand  |                                |
|                  |                                |
| DESY     - DV -  | phone  +49 33762 7 7370        |
| Platanenallee 6  | fax    +49 33762 7 7216        |
| 15738 Zeuthen    |                                |
| Germany          |                                |
   ----------------------------------------------------