[OpenAFS] Heartbeat + DRBD + OpenAFS, any suggestions?
Jeffrey Hutzelman
jhutz@cmu.edu
Mon, 21 Mar 2005 17:36:24 -0500
On Saturday, March 19, 2005 11:22:44 PM -0500 Kyle Moffett
<kmoffett@tjhsst.edu> wrote:
> We're currently deploying a highly-available pair of servers for users in
> one of our labs. The servers use Heartbeat to automatically fail
> services over from one server to the other when one dies. All of the
> Kerberos and OpenLDAP services are properly configured and working to
> automatically promote one server from a read-only slave to a read-write
> master if the other goes down. We also have a working tested system to
> hard-reboot the other box when it crashes or goes down improperly, so it
> does not make changes while "down". We have two DRBD volumes (RAID 1
> between 2 computers) configured between the two computers. By default
> one volume is mounted on the first server "king" on /vicepa, and volume
> 2 is mounted on "emperor" on /vicepb. If either server goes down, the
> volume will be automatically mounted on the other server. I believe
> OpenAFS can handle adding and removing the volumes from each server
> dynamically like that, even in the event of a server crash, but I am
> unsure if I need to prod the voldb to get it to acknowledge the movement
> from one server to the other. The volumes will _never_ be mounted on
> both servers at once, drbd and heartbeat make sure of that.
>
> So, what should I have heartbeat run when it remounts a volume from one
> server to the other? Also, I can tell when a server goes down hard or
> softly. Should I mark the volumes dirty somehow, or does OpenAFS do that
> for me?

There are a number of potential problems here...

- You will need to restart the fileserver in order to get it to notice
  a vice partition that was not there when it started.

- You will need to resync the VLDB against the "new" partition in order
  to get it to notice that the volumes have moved. Use a command like

      # vos syncvldb -server emperor -partition /vicepa

- The bosserver can tell when the fileserver has not shut down cleanly,
  and will force a full salvage of all partitions in that case. However,
  it cannot tell when a particular partition was last used on a fileserver
  that was shut down uncleanly. So, if you want the salvage to happen,
  you will need to trigger it manually. Note that a full-partition
  salvage will normally require shutting down the fileserver.
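
The three points above could be combined into a takeover hook along the following lines. This is only a rough sketch, not something tested on a Heartbeat/DRBD pair. The function name afs_takeover, the RUN dry-run variable, and the "yes"/"no" unclean flag are made up for illustration. It also assumes that the partition is already mounted, that the script runs as root on the surviving server so -localauth works, and that the fileserver's bos instance is named "fs". It relies on bos salvage, when given a whole partition, stopping and restarting the fileserver itself, so no separate restart is issued in that branch.

```shell
#!/bin/sh
# Hypothetical takeover hook (names are illustrative, not from the thread).
# RUN defaults to "echo", so by default the script only prints the commands
# it would run. Set RUN="" to actually execute them.
RUN="${RUN-echo}"

afs_takeover() {
    server="$1"      # surviving server, e.g. emperor
    partition="$2"   # vice partition that was just mounted, e.g. /vicepa
    unclean="$3"     # "yes" if the peer went down hard, anything else if not

    if [ "$unclean" = "yes" ]; then
        # Salvage the whole partition. bos shuts the fileserver down for
        # this and restarts it afterwards, which also makes it pick up
        # the newly mounted partition.
        $RUN bos salvage -server "$server" -partition "$partition" -localauth
    else
        # Clean handover: a plain restart is enough for the fileserver
        # to notice the partition.
        $RUN bos restart -server "$server" -instance fs -localauth
    fi

    # Point the VLDB entries at the server that now holds the volumes.
    $RUN vos syncvldb -server "$server" -partition "$partition" -localauth
}
```

Heartbeat would then call something like "afs_takeover emperor /vicepa yes" after king has died and /vicepa is mounted on emperor. A real hook would probably also need to wait until the fileserver has finished attaching volumes before running vos syncvldb, and to check the exit status of each command.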
-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
Sr. Research Systems Programmer
School of Computer Science - Research Computing Facility
Carnegie Mellon University - Pittsburgh, PA