[OpenAFS] DB servers separate from fileservers

Esther Filderman mizmoose@gmail.com
Tue, 8 Aug 2006 10:54:09 -0400

On 8/7/06, Christopher D. Clausen <cclausen@acm.org> wrote:
> Umm, am I missing something?  One of the major reasons I use AFS is the
> "vos move" command.  And it was my understanding that AFS can handle
> server outages without breaking.  Do you all have different experiences?
> If AFS can't handle a server outage (especially a planned one) there is
> no point in using it.

Don't be silly.  No system can handle all outages "without breaking."
RO replication is great, but it doesn't help users, whose volumes are
read-write.

I have three machines, A, B, and C, with users distributed among them.
Machine B coughs up a lung due to hardware failure.  Suddenly 1/3 of
my users can't get to their accounts.

"vos move" isn't going to help volumes that aren't there.

I mean, there were [and are] things we do to try to limit the downtime
-- hot spare hardware, RAID-5 disks, and an improved ability to plug a
RAID set into an existing server and get it going asap [we named the
AFS partitions on each machine differently so there won't be conflicts
with, for example, two partitions called /vicepa].

But in the end, hardware failure is hardware failure and there's
nothing you can do to stop it.

> I patch and reboot all of our AFS servers about once a month to ensure
> that they have the latest operating system patches.  I usually also
> upgrade to the latest 1.4.x release (just installed 1.4.2b3 on a system
> today.)

> I also run with fast-restart.  Have not had any reported problems with
> volumes crapping out.  And I generally vos move everything off of a
> fileserver before planned restarts, so there is nothing there for the
> salvager to keep offline.
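
The pre-restart routine quoted above (vos move everything off a
fileserver) could be sketched in Python roughly like this.  It is only
an illustration, not anyone's actual script: the host names, the volume
names, and the plan_vos_moves helper are made up, and it just builds
the command lines instead of running them.

```python
from typing import Iterable, List, Tuple


def plan_vos_moves(
    volumes: Iterable[Tuple[str, str]],
    from_server: str,
    to_server: str,
    to_partition: str,
) -> List[List[str]]:
    """Build one `vos move` argument list per (volume, partition) pair.

    `volumes` lists what currently lives on `from_server`; in practice
    that list would come from something like `vos listvol`.
    """
    return [
        [
            "vos", "move",
            "-id", volume,
            "-fromserver", from_server,
            "-frompartition", partition,
            "-toserver", to_server,
            "-topartition", to_partition,
        ]
        for volume, partition in volumes
    ]


if __name__ == "__main__":
    # Hypothetical servers and volumes, for illustration only.
    moves = plan_vos_moves(
        [("user.alice", "/vicepa"), ("user.bob", "/vicepb")],
        from_server="fs-b.example.org",
        to_server="fs-c.example.org",
        to_partition="/vicepa",
    )
    for argv in moves:
        print(" ".join(argv))
```

Printing the plan first makes it easy to eyeball before handing each
argument list to something like subprocess.run.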

Eventually volumes will kick offline if the fileserver detects they're
damaged and in need of a salvage.  Worse, sometimes the fileserver
hasn't yet figured it out, and the users get freaked out because files
seem to be "missing".

Salvages are *important* to the integrity of AFS volumes, just like
fsck is important to (non-journaled) disks.

> > We're starting a routine of monthly salvages for each server to try to
> > combat this.
> Do salvages touch the volumes themselves, or is it just a partition level
> thing?  I.e. if I vos move volumes off of the partitions and mkfs them
> monthly, do I still need to worry about salvaging periodically?

YES!  The salvager is talking to the volumes themselves, checking
actual structure.  It tries to put things back together when it can.
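
To make the volume-versus-partition distinction concrete, here is a
rough Python sketch that builds "bos salvage" command lines at each
scope.  The salvage_command helper and the host and volume names are
hypothetical.  As far as I know, salvaging a whole partition or server
takes the fileserver process down for the duration, while salvaging a
single volume only takes that volume offline.

```python
from typing import List, Optional


def salvage_command(
    server: str,
    partition: Optional[str] = None,
    volume: Optional[str] = None,
) -> List[str]:
    """Build a `bos salvage` argument list for the requested scope.

    No partition     -> salvage every partition on the server (-all).
    Partition only   -> salvage every volume on that partition.
    Partition+volume -> salvage just that one volume.
    """
    if volume is not None and partition is None:
        raise ValueError("a volume salvage also needs its partition")

    argv = ["bos", "salvage", "-server", server]
    if partition is None:
        argv.append("-all")
    else:
        argv += ["-partition", partition]
        if volume is not None:
            argv += ["-volume", volume]
    return argv


if __name__ == "__main__":
    # Hypothetical server and volume names, for illustration only.
    print(" ".join(salvage_command("fs-b.example.org")))
    print(" ".join(salvage_command("fs-b.example.org", "/vicepa")))
    print(" ".join(salvage_command("fs-b.example.org", "/vicepa", "user.alice")))
```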

> Oh yes.  I don't run anything else on my AFS servers or KDCs.  I'd hate
> to see a flaw in openafs compromise a KDC and thus I keep them separate.
> Although our (currently non-existent) DR plans might have a KDC and AFS
> server on the same machine, possibly in a Solaris zone.

I am far less worried about OpenAFS compromising my servers than all
the other cruft out there.