[OpenAFS] DB servers separate from fileservers

Christopher D. Clausen cclausen@acm.org
Tue, 8 Aug 2006 10:39:00 -0500


Esther Filderman <mizmoose@gmail.com> wrote:
> On 8/7/06, Christopher D. Clausen <cclausen@acm.org> wrote:
>> Umm, am I missing something?  One of the major reasons I use AFS is
>> the "vos move" command.  And it was my understanding that AFS can
>> handle server outages without breaking.  Do you all have different
>> experiences? If AFS can't handle a server outage (especially a
>> planned one) there is no point in using it.
>
> Don't be silly.  No system can handle all outages "without breaking."
> RO replication is great, but it doesn't help users.

I was specifically talking about DB servers.  Having one of them go 
down, provided there are no volumes on that server, should not cause a 
problem, right?

> But in the end, hardware failure is hardware failure and there's
> nothing you can do to stop it.

Oh, yes.  But for planned upgrades and such, it should be possible to 
avoid serious problems.

>> I also run with fast-restart.  Have not had any reported problems
>> with volumes crapping out.  And I generally vos move everything off
>> of a fileserver before planned restarts, so there is nothing there
>> for the salvager to keep offline.
>
> Eventually volumes will kick offline if the fileserver detects they're
> damaged and in need of a salvage.  Worse, sometimes the fileserver
> hasn't yet figured it out and the users get freaked out because files
> seem to be "missing".

Hmm...  haven't had anything disappear yet.  Maybe I'm just lucky.
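
For anyone curious, the "vos move everything off before a planned restart"
routine can be sketched roughly as below.  This is only an illustration,
not a script I am claiming anyone runs: the server names, partition letters
and volume names are made up, and the helper drain_commands() is
hypothetical.  It only builds the "vos move" command lines and prints them;
nothing is executed, and you would still need admin tokens (or -localauth
on a server) to run them for real.

```python
import shlex


def drain_commands(volumes, src_server, src_part, dst_server, dst_part):
    """Build one `vos move` command line per volume (nothing is executed).

    All arguments are placeholders supplied by the caller; the volume
    list would normally come from something like `vos listvol`.
    """
    cmds = []
    for vol in volumes:
        argv = [
            "vos", "move",
            "-id", vol,
            "-fromserver", src_server,
            "-frompartition", src_part,
            "-toserver", dst_server,
            "-topartition", dst_part,
        ]
        # shlex.join quotes any argument that needs it for a POSIX shell.
        cmds.append(shlex.join(argv))
    return cmds


if __name__ == "__main__":
    # Hypothetical volumes and servers, for illustration only.
    for line in drain_commands(
        ["user.alice", "user.bob"],
        "fs1.example.org", "a",
        "fs2.example.org", "a",
    ):
        print(line)
```

Printing the commands first and reviewing them before running anything
seems safer than looping over live moves blindly, since each move can take
a while on a large volume.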

Is there something fancy that the salvager does that can't be done 
during a volume move, or during a dump and restore?  I would think that 
re-writing a volume on another server could allow one of the volume 
clones to be checked and/or fixed before it is brought back online. 
Having servers (and more importantly volumes) down for hours while 
volumes are salvaged doesn't seem ideal.  Or am I dreaming?

<<CDC