[OpenAFS] Heartbeat + DRBD + OpenAFS, any suggestions?

Jeffrey Hutzelman jhutz@cmu.edu
Mon, 21 Mar 2005 17:36:24 -0500


On Saturday, March 19, 2005 11:22:44 PM -0500 Kyle Moffett 
<kmoffett@tjhsst.edu> wrote:

> We're currently deploying a highly-available pair of servers for users in
> one of our labs.  The servers use Heartbeat to automatically fail
> services over from one server to the other when one dies.  All of the
> Kerberos and OpenLDAP services are properly configured and working to
> automatically promote one server from a read-only slave to a read-write
> master if the other goes down.  We also have a working tested system to
> hard-reboot the other box when it crashes or goes down improperly, so it
> does not make changes while "down".  We have two DRBD volumes (RAID 1
> between 2 computers) configured between the two computers.  By default
> one volume is mounted on the first server "king" on /vicepa, and volume
> 2 is mounted on "emperor" on /vicepb.  If either server goes down, the
> volume will be automatically mounted on the other server.  I believe
> OpenAFS can handle adding and removing the volumes from each server
> dynamically like that, even in the event of a server crash, but I am
> unsure if I need to prod the voldb to get it to acknowledge the movement
> from one server to the other.  The volumes will _never_ be mounted on
> both servers at once, drbd and heartbeat make sure of that.
>
> So, what should I have heartbeat run when it remounts a volume from one
> server to the other?  Also, I can tell when a server goes down hard or
> softly. Should I mark the volumes dirty somehow, or does OpenAFS do that
> for me?

There are a number of potential problems here...

- You will need to restart the fileserver in order to get it to notice
  a vice partition that was not there when it started.

- You will need to resync the VLDB against the "new" partition in order
  to get it to notice that the volumes have moved.  For example, after
  /vicepa has failed over from king to emperor, use a command like
     # vos syncvldb -server emperor -partition /vicepa

- The bosserver can tell when its own fileserver has not shut down
  cleanly, and will force a full salvage of all partitions in that case.
  However, it cannot tell when a particular partition was last used on a
  different fileserver that was shut down uncleanly.  So, if you want a
  partition taken over from a crashed server to be salvaged, you will
  need to trigger the salvage manually.  Note that a full-partition
  salvage will normally require shutting down the fileserver.
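
As a rough sketch only, the three steps above could be tied together in a
hook that Heartbeat runs after it has mounted the partition on the surviving
server.  Everything named here is an assumption rather than something from
your setup: the function names run and takeover, the DRY_RUN switch, the
use of -localauth (which needs root on a server machine with the cell's
KeyFile), and "fs" as the bosserver instance name.  It also relies on
bos salvage stopping the fs instance before a whole-partition salvage and
restarting it afterwards, so no separate restart is issued in that branch.

```shell
#!/bin/sh
# Hypothetical Heartbeat takeover hook (a sketch, not a tested resource script).

# Print each command; execute it only when DRY_RUN=0 (default is dry run).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# takeover SERVER PARTITION UNCLEAN
#   SERVER    - fileserver that now holds the partition (e.g. emperor)
#   PARTITION - vice partition that was just mounted (e.g. /vicepa)
#   UNCLEAN   - "yes" if the previous owner went down hard
takeover() {
    server=$1
    partition=$2
    unclean=$3

    if [ "$unclean" = "yes" ]; then
        # Whole-partition salvage; bos takes the fs instance down and
        # brings it back up, which also lets it see the new partition.
        run bos salvage -server "$server" -partition "$partition" -localauth || return 1
    else
        # Clean failover: a restart is enough for the fileserver to
        # notice the newly mounted vice partition.
        run bos restart -server "$server" -instance fs -localauth || return 1
    fi

    # Point the VLDB entries at the server that now holds the volumes.
    run vos syncvldb -server "$server" -partition "$partition" -localauth
}
```

With DRY_RUN left unset, "takeover emperor /vicepa yes" only prints the
bos salvage and vos syncvldb commands it would run; set DRY_RUN=0 once the
printed commands look right for your cell.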


-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
   Sr. Research Systems Programmer
   School of Computer Science - Research Computing Facility
   Carnegie Mellon University - Pittsburgh, PA