[OpenAFS] robustness in face of server failures
Noel Yap
noel.yap@gmail.com
Wed, 16 Nov 2005 16:21:00 -0800
One last question (for now): Have problems been seen in read-only
volumes (if I'm using the terminology correctly)?
Thanks,
Noel
On 11/16/05, Russ Allbery <rra@stanford.edu> wrote:
> Noel Yap <noel.yap@gmail.com> writes:
> > On 11/16/05, Russ Allbery <rra@stanford.edu> wrote:
>
> >> * For the most part, AFS fails independently, so that if a particular
> >> file server goes down, everything else on other file servers is still
> >> accessible. However, if the AFS file server gets into a state where
> >> it thinks it's still up but it can't answer client requests, clients
> >> that try to access replicated volumes from that file server will hang
> >> practically forever waiting for it rather than rolling over to another
> >> replica site. It would be very nice to have a fix for this. In the
> >> meantime, you really want your file servers to refuse UDP packets when
> >> they're sick, which is something that you can rig up with some
> >> monitoring and a local firewall.
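
The "monitoring and a local firewall" idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not something taken from the thread: the health probe is a hypothetical callable you would supply yourself (for example, a wrapper around whatever check your site trusts), the firewall is assumed to be iptables on Linux, and the file server is assumed to listen on UDP port 7000. The function names are made up for this example.

```python
from typing import Callable, List, Optional

AFS_FILESERVER_PORT = 7000  # assumed file server Rx port (UDP)

# Assumed iptables rule: actively reject client UDP traffic to the file server
# so that clients see a refusal instead of waiting on a server that is stuck.
RULE = ["INPUT", "-p", "udp", "--dport", str(AFS_FILESERVER_PORT), "-j", "REJECT"]


def plan_firewall_change(healthy: bool, blocked: bool) -> Optional[List[str]]:
    """Return the iptables argv that brings the firewall in line with the
    server's health, or None if nothing needs to change."""
    if not healthy and not blocked:
        return ["iptables", "-I"] + RULE  # sick: start refusing client packets
    if healthy and blocked:
        return ["iptables", "-D"] + RULE  # recovered: remove the reject rule
    return None


def watchdog_step(probe: Callable[[], bool], blocked: bool,
                  run: Callable[[List[str]], None]) -> bool:
    """Perform one monitoring pass and return the new 'blocked' state.

    probe: hypothetical health check, returns True when the server answers.
    run:   executes a command, e.g. lambda argv: subprocess.run(argv, check=True).
    """
    cmd = plan_firewall_change(probe(), blocked)
    if cmd is None:
        return blocked
    run(cmd)
    return not blocked
```

In practice this would be called in a loop from a small daemon or cron job running as root. Note that a probe sent through the same firewall chain would itself be rejected once the rule is in place, so the rule or the probe would likely need an exemption for the monitoring source.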
>
> > What have been the typical causes of the server reaching this state?
> > Would you say that some of these have been addressed in 1.4?
>
> Yes. Most of the causes have been fixed via other means (such as clients
> with asymmetric firewalls, older Windows clients, etc.). Usually this is
> caused by an extreme burst of activity that overloads the server. It's
> very difficult to do this with just normal traffic; it usually takes some
> sort of bug on top of that to overwhelm the server.
>
> --
> Russ Allbery (rra@stanford.edu) <http://www.eyrie.org/~eagle/>