[OpenAFS] robustness in face of server failures

Noel Yap noel.yap@gmail.com
Wed, 16 Nov 2005 10:39:43 -0800


On 11/16/05, Russ Allbery <rra@stanford.edu> wrote:
> Noel Yap <noel.yap@gmail.com> writes:
> I'd say that there are two potentially worrisome aspects to OpenAFS from a
> hard uptime requirement perspective:
>
>  * You want to be sure to be running the latest version, particularly on
>    Windows clients.  Older releases of the Windows client had various bugs
>    that could cause them to really hammer a file server.

I'm planning to use 1.4.

>  * For the most part, AFS fails independently, so that if a particular
>    file server goes down, everything else on other file servers is still
>    accessible.  However, if the AFS file server gets into a state where
>    it thinks it's still up but it can't answer client requests, clients
>    that try to access replicated volumes from that file server will hang
>    practically forever waiting for it rather than rolling over to another
>    replica site.  It would be very nice to have a fix for this.  In the
>    meantime, you really want your file servers to refuse UDP packets when
>    they're sick, which is something that you can rig up with some
>    monitoring and a local firewall.
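
To make sure I understand the monitoring-plus-firewall suggestion, here is
a rough sketch of how I imagine it, not anything taken from OpenAFS itself.
Assumptions on my part: the file server listens on UDP port 7000, the host
uses iptables, and probe_cmd is a hypothetical health-check command that
exits 0 when the server answers. The probe would have to bypass the REJECT
rule itself, otherwise the rule never gets lifted automatically.

```python
import subprocess
import time

AFS_FILESERVER_PORT = 7000  # assumed: the usual AFS fileserver UDP port


def firewall_rule(action, port=AFS_FILESERVER_PORT):
    """Build an iptables command; action is "-I" (insert) or "-D" (delete)."""
    return ["iptables", action, "INPUT", "-p", "udp",
            "--dport", str(port), "-j", "REJECT"]


def probe_ok(probe_cmd, timeout=10, run=subprocess.run):
    """Return True if the (hypothetical) probe exits 0 within the timeout."""
    try:
        return run(probe_cmd, timeout=timeout).returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        # A hung or missing probe counts as "sick".
        return False


def next_action(healthy, blocked):
    """Return the firewall command needed for this state, or None."""
    if not healthy and not blocked:
        return firewall_rule("-I")   # server sick: start refusing UDP
    if healthy and blocked:
        return firewall_rule("-D")   # server recovered: lift the rule
    return None


def watch(probe_cmd, interval=30):
    """Poll forever, toggling the REJECT rule as health changes (needs root)."""
    blocked = False
    while True:
        cmd = next_action(probe_ok(probe_cmd), blocked)
        if cmd:
            subprocess.run(cmd, check=True)
            blocked = not blocked
        time.sleep(interval)
```

The idea is that a rejected packet gives clients an immediate error instead
of silence, so they should move on to another replica site rather than wait.
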

What have been the typical causes of the server reaching this state?
Would you say that some of these have been addressed in 1.4?

Thanks,
Noel