[OpenAFS] Re: redundant home directory volumes

Patrick J. LoPresti patl@curl.com
28 Jun 2002 14:31:46 -0400


"J. Maynard Gelinas" <gelinas@lns.mit.edu> writes:

> >   I've got a question about implementing some redundancy among
> > home directories. Suppose I have a small WAN configuration with
> > file / database servers local to each switch and LAN
> > community. Even given a RAID array on each file server, systems
> > and RAID arrays do crash. So, what is the best method for creating
> > some volume redundancy across the cell for dynamic RW volumes?

Yes, RAID arrays can crash.  But there are ways to have fully
redundant services with no single point of failure:

  http://www.redhat.com/docs/manuals/advserver/RHLAS-2.1-Manual/cluster-manager/

The basic idea is to have two servers which *share* a fully redundant
disk array.  A service (like Apache or Samba) runs on just one of
them, say server A.  Server B monitors server A for "heartbeat"
failure.  When server A fails, server B:

  1) *Cuts power* to server A (to avoid race conditions)

  2) Takes over the IP address which A was using

  3) Starts the service

  4) Restores power to server A (optional)

From a client's point of view, the server has simply crashed and
rebooted really fast.
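
To make the sequence concrete, here is a minimal sketch of server B's
side of this in Python.  The fence_power_off/fence_power_on commands
and the device, address, and service names are placeholders I made up
for illustration; the real Cluster Manager is more involved, but the
shape is the same:

  import subprocess
  import time

  PEER_ADDR = "serverA.example.com"   # placeholder: server A's own address
  FLOATING_IP = "192.0.2.10/24"       # placeholder: the IP clients talk to
  IFACE = "eth0"
  HEARTBEAT_INTERVAL = 2              # seconds between heartbeat checks
  MISSED_LIMIT = 5                    # missed heartbeats before failing over

  def heartbeat_ok():
      """True if server A answers a heartbeat (here, a single ping)."""
      return subprocess.call(["ping", "-c", "1", "-W", "1", PEER_ADDR],
                             stdout=subprocess.DEVNULL) == 0

  def fail_over():
      # 1) Fence server A first so it can't keep writing to the shared array.
      #    fence_power_off is a stand-in for whatever power switch you have.
      subprocess.check_call(["fence_power_off", PEER_ADDR])

      # 2) Take over the IP address that A was using.
      subprocess.check_call(["ip", "addr", "add", FLOATING_IP, "dev", IFACE])

      # 3) Mount the shared storage and start the service locally.
      subprocess.check_call(["mount", "/dev/shared/vol", "/srv/data"])
      subprocess.check_call(["/etc/init.d/httpd", "start"])

      # 4) Optionally restore power to A so it can come back as the standby.
      subprocess.call(["fence_power_on", PEER_ADDR])

  def main():
      missed = 0
      while True:
          missed = 0 if heartbeat_ok() else missed + 1
          if missed >= MISSED_LIMIT:
              fail_over()
              break
          time.sleep(HEARTBEAT_INTERVAL)

  if __name__ == "__main__":
      main()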

I see no reason, in principle, why this scheme could not be adapted
for AFS, especially since the AFS file server is a user-space process.
Yes, clients would have to wait for a salvage to finish when server B
took over, and the AFS server code might need some tweaking to handle
the IP address trick, but overall I think it would be pretty simple.
I could be missing something, of course :-).
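
In the AFS case, step 3 above would roughly become "salvage, then
start the fileserver".  Something like the sketch below, where the
partition, device, and server names are made-up placeholders and the
bos invocations are written from memory, so check the flags against
your cell before trusting them:

  import subprocess

  SHARED_PARTITION = "/vicepa"          # placeholder: the shared vice partition
  THIS_SERVER = "serverB.example.com"   # placeholder: the machine taking over

  def start_afs_fileserver():
      # Mount the shared array on the vice partition that A was exporting.
      subprocess.check_call(["mount", "/dev/shared/vol", SHARED_PARTITION])

      # Salvage before serving; this is the delay clients would notice.
      subprocess.check_call(["bos", "salvage", "-server", THIS_SERVER,
                             "-partition", SHARED_PARTITION, "-localauth"])

      # Bring up the fs instance under the local bosserver.
      subprocess.check_call(["bos", "start", "-server", THIS_SERVER,
                             "-instance", "fs", "-localauth"])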

I wonder whether anybody has tried something like this with AFS.  It
strikes me as a fairly neat idea.

 - Pat