[OpenAFS] robustness in face of server failures
Wed, 16 Nov 2005 16:21:00 -0800
One last question (for now): Have problems been seen in read-only
volumes (if I'm using the terminology correctly)?
On 11/16/05, Russ Allbery <firstname.lastname@example.org> wrote:
> Noel Yap <email@example.com> writes:
> > On 11/16/05, Russ Allbery <firstname.lastname@example.org> wrote:
> >> * For the most part, AFS fails independently, so that if a particular
> >> file server goes down, everything else on other file servers is still
> >> accessible. However, if the AFS file server gets into a state where
> >> it thinks it's still up but it can't answer client requests, clients
> >> that try to access replicated volumes from that file server will hang
> >> practically forever waiting for it rather than rolling over to another
> >> replica site. It would be very nice to have a fix for this. In the
> >> meantime, you really want your file servers to refuse UDP packets when
> >> they're sick, which is something that you can rig up with some
> >> monitoring and a local firewall.
> > What have been the typical causes of the server reaching this state?
> > Would you say that some of these have been addressed in 1.4?
> Yes. Most of the causes (such as clients with asymmetric firewalls, older
> Windows clients, etc.) have been fixed by other means. Usually this state
> is caused by an extreme burst of activity that overloads the server. It's
> very difficult to trigger it with normal traffic alone; it usually takes
> some sort of bug on top of that to overwhelm the server.
> Russ Allbery (email@example.com) <http://www.eyrie.org/~eagle/>
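The suggestion to have a sick file server refuse UDP packets, rigged up with "some monitoring and a local firewall", might look roughly like the sketch below. It rests on assumptions the thread does not state: the host firewall is Linux iptables, the file server listens on the standard OpenAFS fileserver port (UDP 7000), and "sick" is decided by a caller-supplied probe. The names `plan_transition`, `monitor_once` and `probe` are hypothetical and for illustration only. By default it only plans the iptables command (dry run) rather than executing it.

```python
"""Hypothetical sketch: drop inbound fileserver UDP while the server is sick.

Assumptions (not from the original thread): Linux iptables is the local
firewall, the fileserver listens on UDP 7000, and health is judged by a
caller-supplied probe function.
"""
import subprocess
from typing import Callable, List

FILESERVER_UDP_PORT = 7000  # standard OpenAFS fileserver port (assumed setup)


def firewall_rule(action: str) -> List[str]:
    """Build (but do not run) an iptables command that inserts ('-I') or
    deletes ('-D') a rule dropping inbound UDP to the fileserver port."""
    if action not in ("-I", "-D"):
        raise ValueError("action must be '-I' or '-D'")
    return ["iptables", action, "INPUT", "-p", "udp",
            "--dport", str(FILESERVER_UDP_PORT), "-j", "DROP"]


def plan_transition(healthy: bool, blocked: bool) -> List[str]:
    """Return the command that moves the firewall to the desired state,
    or an empty list if it is already there.

    A sick server that is not yet blocked gets a DROP rule, so clients see
    it as down and can move on to another replica; a recovered server has
    the rule removed again."""
    if not healthy and not blocked:
        return firewall_rule("-I")
    if healthy and blocked:
        return firewall_rule("-D")
    return []


def monitor_once(probe: Callable[[], bool], blocked: bool,
                 dry_run: bool = True) -> bool:
    """Run one monitoring pass and return the new 'blocked' state.

    'probe' is a hypothetical health check, e.g. one that times a test
    request against this file server and returns False on a timeout."""
    healthy = probe()
    cmd = plan_transition(healthy, blocked)
    if cmd and not dry_run:
        subprocess.run(cmd, check=True)  # needs root to change firewall rules
    # A rule was inserted iff the server is unhealthy, removed iff healthy.
    return (not healthy) if cmd else blocked
```

In practice `monitor_once` would be called periodically (e.g. from cron or a loop with a sleep), carrying the returned `blocked` state between passes. Keeping the decision logic in `plan_transition` separate from the side effect makes it easy to check before letting it touch a production firewall.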