[OpenAFS] robustness in face of server failures
Noel Yap
noel.yap@gmail.com
Wed, 16 Nov 2005 10:39:43 -0800
On 11/16/05, Russ Allbery <rra@stanford.edu> wrote:
> Noel Yap <noel.yap@gmail.com> writes:
> I'd say that there are two potentially worrisome aspects to OpenAFS from a
> hard uptime requirement perspective:
>
> * You want to be sure to be running the latest version, particularly on
> Windows clients. Older releases of the Windows client had various bug=
s
> that could cause them to really hammer a file server.
I'm planning to use 1.4.
> * For the most part, AFS fails independently, so that if a particular
> file server goes down, everything else on other file servers is still
> accessible. However, if the AFS file server gets into a state where
> it thinks it's still up but it can't answer client requests, clients
> that try to access replicated volumes from that file server will hang
> practically forever waiting for it rather than rolling over to another
> replica site. It would be very nice to have a fix for this. In the
> meantime, you really want your file servers to refuse UDP packets when
> they're sick, which is something that you can rig up with some
> monitoring and a local firewall.
What have been the typical causes of the server reaching this state?
Would you say that some of these have been addressed in 1.4?
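
To check that I understand the monitoring-plus-firewall workaround, here is a rough sketch of how I imagine it could be wired up. It is only an illustration under my own assumptions: the Guard class, the probe callable, and the three-failure threshold are names and values I made up, and the health probe itself is left as a placeholder since I don't know what check reliably detects the sick state. The rule targets UDP port 7000, the usual fileserver port, and the code only builds the iptables command rather than running it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

AFS_FILESERVER_PORT = 7000  # usual AFS fileserver UDP port


def firewall_rule(action: str, port: int = AFS_FILESERVER_PORT) -> List[str]:
    """Build an iptables command that inserts ("-I") or deletes ("-D")
    a rule rejecting inbound UDP traffic to the fileserver port."""
    if action not in ("-I", "-D"):
        raise ValueError("action must be '-I' or '-D'")
    return ["iptables", action, "INPUT", "-p", "udp",
            "--dport", str(port), "-j", "REJECT"]


@dataclass
class Guard:
    """Hypothetical watchdog: block the port after repeated failed probes,
    unblock it again once a probe succeeds."""
    probe: Callable[[], bool]   # placeholder health check, True means healthy
    max_failures: int = 3       # assumed threshold, tune to taste
    failures: int = 0
    blocked: bool = False

    def step(self) -> Optional[List[str]]:
        """Run one probe; return the firewall command to apply, if any."""
        if self.probe():
            self.failures = 0
            if self.blocked:
                self.blocked = False
                return firewall_rule("-D")
            return None
        self.failures += 1
        if not self.blocked and self.failures >= self.max_failures:
            self.blocked = True
            return firewall_rule("-I")
        return None
```

A cron job or small daemon on the file server would call step() at intervals and pass any returned command to subprocess.run. The probe would have to check the server in a way that the reject rule itself does not interfere with, otherwise the port would never be unblocked.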
Thanks,
Noel