[OpenAFS] DB servers separate from fileservers
Esther Filderman
mizmoose@gmail.com
Tue, 8 Aug 2006 10:54:09 -0400
On 8/7/06, Christopher D. Clausen <cclausen@acm.org> wrote:
> Umm, am I missing something? One of the major reasons I use AFS is the
> "vos move" command. And it was my understanding that AFS can handle
> server outages without breaking. Do you all have different experiences?
> If AFS can't handle a server outage (especially a planned one) there is
> no point in using it.
Don't be silly. No system can handle all outages "without breaking."
RO replication is great, but it doesn't help users, whose volumes are
read-write.
I have three machines, A, B, C. Users distributed among them.
Machine B coughs up a lung due to hardware failure. Suddenly 1/3 of
my users don't have accounts.
"vos move" isn't going to help volumes that aren't there.
I mean, there were [and are] things we do to try to limit the downtime
-- hot spare hardware, RAID-5 disks, and we improved our ability to
plug a RAID set into an existing server and get it going ASAP [we
named the AFS partitions on each machine differently so there won't be
conflicts with, for example, two partitions called /vicepa].
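
As a rough illustration of that naming scheme, here is a minimal Python sketch that hands each server its own slice of partition names, so a RAID set moved from one machine to another never collides with a partition already mounted there. The server labels A, B, C come from the example above; the function names, the four-partitions-per-server figure, and the restriction to single-letter names (/vicepa through /vicepz) are assumptions made for the sketch, not a description of the actual site layout.

```python
import string


def vicep_names():
    """Return single-letter AFS partition names: /vicepa .. /vicepz.

    Simplification for this sketch: two-letter names are ignored.
    """
    return ["/vicep" + letter for letter in string.ascii_lowercase]


def assign_partitions(servers, per_server):
    """Give every server a distinct, non-overlapping slice of names."""
    names = vicep_names()
    if len(servers) * per_server > len(names):
        raise ValueError("not enough distinct partition names for this layout")
    return {
        server: names[i * per_server:(i + 1) * per_server]
        for i, server in enumerate(servers)
    }


# Hypothetical layout: three servers, four partitions each.
layout = assign_partitions(["A", "B", "C"], per_server=4)
for server, partitions in layout.items():
    print(server, " ".join(partitions))
```

With names allocated this way, the disks from a dead machine B can be attached to A or C and mounted under their original names.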
But in the end, hardware failure is hardware failure and there's
nothing you can do to stop it.
> I patch and reboot all of our AFS servers about once a month to ensure
> that they have the latest operating system patches. I usually also
> upgrade to the latest 1.4.x release (just installed 1.4.2b3 on a system
> today.)
>
> I also run with fast-restart. Have not had any reported problems with
> volumes crapping out. And I generally vos move everything off of a
> fileserver before planned restarts, so there is nothing there for the
> salvager to keep offline.
Eventually volumes will kick offline if the fileserver detects they're
damaged and in need of a salvage. Worse, sometimes the fileserver
hasn't yet figured it out and the users get freaked out because files
seem to be "missing".
Salvages are *important* to the integrity of AFS volumes, just like
fsck is important to (non-journaled) disks.
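
For reference, the drain-before-restart routine from the quoted message could be scripted roughly as below. This is only a sketch: it builds the "vos move" command lines and does not run them, and the volume names, server names, and the helper name drain_commands are all hypothetical. In practice the volume list would come from something like "vos listvol" on the source server.

```python
def drain_commands(volumes, src_server, src_part, dst_server, dst_part):
    """Build one 'vos move' argument list per volume (nothing is executed)."""
    return [
        [
            "vos", "move",
            "-id", volume,
            "-fromserver", src_server,
            "-frompartition", src_part,
            "-toserver", dst_server,
            "-topartition", dst_part,
        ]
        for volume in volumes
    ]


# Hypothetical volumes and hosts, for illustration only.
commands = drain_commands(
    ["user.alice", "user.bob"],
    src_server="fs-b.example.org", src_part="/vicepe",
    dst_server="fs-a.example.org", dst_part="/vicepa",
)
for argv in commands:
    print(" ".join(argv))
```

This only works while the source server is still up, which is the point made earlier: it covers planned restarts, not a machine that has already died.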
> > We're starting a routine of monthly salvages for each server to try to
> > combat this.
>
> Do salvages touch the volumes themselves, or is it just a partition level
> thing? I.e. if I vos move volumes off of the partitions and mkfs them
> monthly, do I still need to worry about salvaging periodically?
YES! The salvager is talking to the volumes themselves, checking
actual structure. It tries to put things back together when it can.
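
A monthly salvage routine like the one mentioned above might be planned along these lines. This is a hedged sketch, not our actual procedure: it staggers servers across the month so they are not all salvaging at once and builds a "bos salvage" command per partition without executing anything. The helper name monthly_salvage_plan, the one-week gap, and the server and partition names are assumptions.

```python
def monthly_salvage_plan(partitions_by_server, first_day=1, gap_days=7):
    """Return (day_of_month, argv) pairs, one server per staggered day."""
    plan = []
    for i, server in enumerate(sorted(partitions_by_server)):
        day = first_day + i * gap_days
        if day > 28:
            # Stay within days that exist in every month.
            raise ValueError("too many servers for this gap; reduce gap_days")
        for partition in partitions_by_server[server]:
            argv = ["bos", "salvage", "-server", server, "-partition", partition]
            plan.append((day, argv))
    return plan


# Hypothetical servers and partitions, for illustration only.
plan = monthly_salvage_plan({
    "fs-a.example.org": ["/vicepa"],
    "fs-b.example.org": ["/vicepe", "/vicepf"],
})
for day, argv in plan:
    print(f"day {day:2d}: {' '.join(argv)}")
```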
> Oh yes. I don't run anything else on my AFS servers or KDCs. I'd hate
> to see a flaw in openafs compromise a KDC and thus I keep them separate.
> Although our (currently non-existent) DR plans might have a KDC and AFS
> server on the same machine, possibly in a Solaris zone.
I am far less worried about OpenAFS compromising my servers than all
the other cruft out there.