[OpenAFS] DB servers separate from fileservers

Esther Filderman mizmoose@gmail.com
Mon, 7 Aug 2006 12:46:24 -0400


On 8/7/06, John Hascall <john@iastate.edu> wrote:

>
>   1) Stability.  The uptime of our DB servers is years,
>      we can only dream of that for our fileservers.

I'm currently running a mix.  My primary KDC and "lowest IP" DB server
is a non-fileserver machine.  The other two boxes do both.
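
For the curious: the "lowest IP" bit matters because Ubik biases its
sync site election toward the DB server with the lowest IP address
listed in CellServDB.  The layout is roughly this (hostnames and
addresses made up, obviously):

    >ourcell.example.edu          #Example cell
    10.0.1.10     #db1.example.edu  (DB only, lowest IP, also the KDC)
    10.0.1.20     #afs2.example.edu (DB + fileserver)
    10.0.1.30     #afs3.example.edu (DB + fileserver)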

In addition to uptime, we also have the added stability of being able
to take down the KDC without interrupting volume access.  This is very
very nice.

>   2) Restart speed.  Waiting for a pile of disk to
>      fsck to get your DB servers up and running again
>      is suboptimal.

Again, having one machine as a DB-non-fileserver helps this greatly.

We also run with --fast-restart compiled in.  This is a
pushme-pullyou: all fast-restart does is skip the salvage pass at
startup, so the fileserver comes back up quickly, but damaged volumes
stay damaged.  Now we have volumes crapping themselves here and
there.  [Thank you, Fortran, you %*%()#.  Ahem.]

We're starting a routine of monthly salvages for each server to try to
combat this.
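
The mechanics are nothing fancy: a monthly cron job on each server
that runs bos salvage over each /vicep partition.  Something like
this sketch (Transarc-style paths, made-up partition name; adjust for
your setup):

    # root's crontab: salvage /vicepa at 3am on the 1st of the month.
    # Volumes on the partition are offline while the salvager runs,
    # so pick a quiet window.
    0 3 1 * * /usr/afs/bin/bos salvage -server localhost -partition /vicepa -localauth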

>   3) Load. A busy fileserver on the same machine as your
>      DB server can slow your whole cell.

Cannot argue with this.

>   4) Simplicity.  When something is amiss with a machine,
>      the less things a machine is doing, the less things
>      to check and the less likely it is the result of
>      some weird interaction.

This is also why I advocate turning off everything else possible on an
AFS server.  No AFS client.  Turn off everything you can.  Outside of
AFS's own ports, we have ntp and scp/ssh allowed in & out, and that's
about it.
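
If it helps anyone, the firewall side of that comes out looking
something like this (illustrative iptables rules, not a drop-in
config; 7000-7009/udp is the standard AFS server port range):

    # default-deny inbound; allow loopback and established traffic
    iptables -P INPUT DROP
    iptables -A INPUT -i lo -j ACCEPT
    iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    # AFS server ports (fileserver, ptserver, vlserver, volserver, bos, ...)
    iptables -A INPUT -p udp --dport 7000:7009 -j ACCEPT
    # ssh/scp, ntp, and kerberos (the last only on the KDC box)
    iptables -A INPUT -p tcp --dport 22 -j ACCEPT
    iptables -A INPUT -p udp --dport 123 -j ACCEPT
    iptables -A INPUT -p udp --dport 88 -j ACCEPT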

> Reasons for joining them would be (in my mind):
>
>   1) Cost.  Fewer machines == Less cost
>      (however, you can easily run the DB servers
>       low-cost, even hand-me-down boxes).

My current DB-non-fileserver box was plucked out of the garbage.  I'm serious.

I'm actually going to get a "real" machine to replace it soon, but
when I wanted to do this and kept getting spurned, I went to the
hardware guys and said, "Throw crap together."  I should take a
picture of the machine; it seriously looks like something a
14-year-old with no money put together after dumpster diving.

I'm so proud.

>   2) Space, power, cooling.  Either you have these or you don't.
>
>   3) You got a really small cell, so it doesn't matter.

Arguably I have, well, a mid-sized cell.  I'm supporting a fairly
small number of frequently active users [maybe 250 on a good day],
maybe 2000 total real users.  I don't think I've cracked 1 TB in used
space yet.  A sizeable chunk of my volumes is stuffed with research
databases and videos.

Yet I find that the more servers you have, the more stable you are.
The more machines you have, the less one machine's impact is felt.

My cell used to be three machines, all DB & fileservers together,
about 300 GB in use.  When one machine went down, 1/3 of the cell was
inaccessible.  TOTAL MESS.

Now I have 5 machines.  Not as good as I'd like, but still muuuuch more stable.

e.