[OpenAFS] robustness in face of server failures
Russ Allbery
rra@stanford.edu
Wed, 16 Nov 2005 10:31:44 -0800
Noel Yap <noel.yap@gmail.com> writes:
> I'm investigating whether or not OpenAFS would be a good solution for
> our needs. One requirement is that the chances of catastrophic
> failure (e.g., the network goes down) ought to be minimal (~once every
> few years or less). What have people's experiences with this been? I
> know 1.4 hasn't been out that long, but has anyone noticed any good or
> bad things about it?
I'd say that there are two potentially worrisome aspects to OpenAFS from a
hard uptime requirement perspective:
* You want to be sure to be running the latest version, particularly on
Windows clients. Older releases of the Windows client had various bugs
that could cause them to really hammer a file server.
* For the most part, AFS fails independently, so that if a particular
file server goes down, everything else on other file servers is still
accessible. However, if the AFS file server gets into a state where
it thinks it's still up but it can't answer client requests, clients
that try to access replicated volumes from that file server will hang
practically forever waiting for it rather than rolling over to another
replica site. It would be very nice to have a fix for this. In the
meantime, you really want your file servers to refuse UDP packets when
they're sick, which is something that you can rig up with some
monitoring and a local firewall.
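As a rough sketch of that monitoring-plus-firewall rig: the AFS file server listens on UDP port 7000, and rxdebug (the standard OpenAFS debugging tool) exits nonzero when the server's Rx stack stops answering. The helper names below (check_fileserver, block_udp, run, the DRYRUN knob) are all illustrative, not part of OpenAFS, and a real deployment would want retries and alerting around this.

```shell
#!/bin/sh
# Sketch: firewall off a sick AFS file server so clients fail over to
# a replica instead of hanging. Hypothetical helper names throughout.

FS_PORT=7000            # AFS fileserver port (UDP)
DRYRUN=${DRYRUN:-yes}   # set DRYRUN=no to actually invoke iptables

# Print the command instead of running it unless DRYRUN=no.
run() {
    if [ "$DRYRUN" = yes ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# rxdebug exits nonzero if the fileserver's Rx stack is unresponsive.
check_fileserver() {
    rxdebug localhost "$FS_PORT" -version >/dev/null 2>&1
}

# Drop inbound UDP to the fileserver port.
block_udp() {
    run iptables -I INPUT -p udp --dport "$FS_PORT" -j DROP
}

# Remove the drop rule once the server is healthy again.
unblock_udp() {
    run iptables -D INPUT -p udp --dport "$FS_PORT" -j DROP
}

# A monitor loop might look like:
#   while sleep 60; do
#       check_fileserver || block_udp
#   done
```

The point of dropping the packets outright (rather than sending ICMP port-unreachable) is that clients treat the server as down and move on to another replica site.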
AFS is, in general, extremely stable apart from those two issues.
--
Russ Allbery (rra@stanford.edu) <http://www.eyrie.org/~eagle/>