[OpenAFS] robustness in face of server failures

Noel Yap noel.yap@gmail.com
Wed, 16 Nov 2005 16:21:00 -0800


One last question (for now):  Have problems been seen in read-only
volumes (if I'm using the terminology correctly)?

Thanks,
Noel

On 11/16/05, Russ Allbery <rra@stanford.edu> wrote:
> Noel Yap <noel.yap@gmail.com> writes:
> > On 11/16/05, Russ Allbery <rra@stanford.edu> wrote:
>
> >>  * For the most part, AFS fails independently, so that if a particular
> >>    file server goes down, everything else on other file servers is still
> >>    accessible.  However, if the AFS file server gets into a state where
> >>    it thinks it's still up but it can't answer client requests, clients
> >>    that try to access replicated volumes from that file server will hang
> >>    practically forever waiting for it rather than rolling over to another
> >>    replica site.  It would be very nice to have a fix for this.  In the
> >>    meantime, you really want your file servers to refuse UDP packets when
> >>    they're sick, which is something that you can rig up with some
> >>    monitoring and a local firewall.
>
> > What have been the typical causes of the server reaching this state?
> > Would you say that some of these have been addressed in 1.4?
>
> Yes.  Most of the causes (such as clients with asymmetric firewalls, older
> Windows clients, etc.) have been fixed via other means.  Usually this is
> caused by an extreme burst of activity that overloads the server.  It's
> very difficult to do this with just normal traffic; it usually takes some
> sort of bug on top of that to overwhelm the server.
>
> --
> Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>
>
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
>