[OpenAFS] DB servers separate from fileservers
Christopher D. Clausen
cclausen@acm.org
Tue, 8 Aug 2006 10:39:00 -0500
Esther Filderman <mizmoose@gmail.com> wrote:
> On 8/7/06, Christopher D. Clausen <cclausen@acm.org> wrote:
>> Umm, am I missing something? One of the major reasons I use AFS is
>> the "vos move" command. And it was my understanding that AFS can
>> handle server outages without breaking. Do you all have different
>> experiences? If AFS can't handle a server outage (especially a
>> planned one) there is no point in using it.
>
> Don't be silly. No system can handle all outages "without breaking."
> RO replication is great, but it doesn't help users.
I was specifically talking about DB servers. Having one of them go
down, provided there are no volumes on that server, should not cause a
problem, right?
> But in the end, hardware failure is hardware failure and there's
> nothing you can do to stop it.
Oh, yes. But for planned upgrades and such, it should be possible to
avoid serious problems.
>> I also run with fast-restart. Have not had any reported problems
>> with volumes crapping out. And I generally vos move everything off
>> of a fileserver before planned restarts, so there is nothing there
>> for the salvager to keep offline.
>
> Eventually volumes will kick offline if the fileserver detects they're
> damaged and in need of a salvage. Worse, sometimes the fileserver
> hasn't yet figured it out and the users get freaked out because files
> seem to be "missing".
Hmm... haven't had anything disappear yet. Maybe I'm just lucky.
Is there something fancy that the salvager does that can't be done
during a volume move, or during a dump and restore? I would think that
re-writing a volume on another server could allow one of the volume
clones to be checked and/or fixed before it is brought back online.
Having servers (and more importantly volumes) down for hours while
volumes are salvaged doesn't seem ideal. Or am I dreaming?
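
For reference, the "vos move everything off before a planned restart"
routine mentioned above could be scripted roughly as follows. This is only
a sketch: the server names, partition letters, and volume names are made
up, and the helper drain_commands() is hypothetical. It only builds the
command lines and does not run them. Note that "vos move" relocates
read-write volumes; read-only replicas would have to be handled separately.

```python
import shlex


def drain_commands(volumes, from_server, from_part, to_server, to_part):
    """Return one 'vos move' command line per volume (hypothetical helper)."""
    cmds = []
    for vol in volumes:
        argv = [
            "vos", "move",
            "-id", vol,
            "-fromserver", from_server,
            "-frompartition", from_part,
            "-toserver", to_server,
            "-topartition", to_part,
        ]
        # shlex.join quotes any argument that needs it for the shell
        cmds.append(shlex.join(argv))
    return cmds


if __name__ == "__main__":
    # Assumed example values; substitute real servers and volumes.
    for cmd in drain_commands(["user.alice", "user.bob"],
                              "fs1.example.org", "a",
                              "fs2.example.org", "b"):
        print(cmd)
```

Reviewing the printed commands before running them keeps the drain
step auditable.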
<<CDC