[OpenAFS] sudden crash

Neulinger, Nathan nneul@umr.edu
Thu, 1 Aug 2002 08:24:30 -0500


That's an interesting point... I don't currently have that on our NT
stations, only on Unix, and that's about 350 machines, but I'll give
some thought to whether we want to keep it.

If we could figure out why the client doesn't always notice the failure,
that would eliminate my reason for lowering these intervals.

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       EMail:  nneul@umr.edu
University of Missouri - Rolla         Phone: (573) 341-4841
Computing Services                       Fax: (573) 341-4216


> -----Original Message-----
> From: Derek Atkins [mailto:warlord@MIT.EDU]
> Sent: Thursday, August 01, 2002 8:06 AM
> To: Neulinger, Nathan
> Cc: Turbo Fredriksson; openafs-info@openafs.org
> Subject: Re: [OpenAFS] sudden crash
>
>
> Note that with a probe interval of 30 seconds and 10,000 clients you
> now have an average probe rate of approximately 333 packets per
> second.  The fileserver needs to respond to this load before you even
> consider your actual file service load for data packets.
>=20
> Are you sure this is what you want?  At the 10 minute interval you
> get a more decent 16 probes/second.
>
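Derek's figures come from a single division: average probe load is roughly the number of clients divided by the probe interval. A minimal Python sketch, assuming each client sends exactly one probe packet to the fileserver per interval and that probes are spread evenly over time (the function name `probe_rate` is hypothetical, not part of OpenAFS):

```python
def probe_rate(clients: int, interval_s: float) -> float:
    """Average probe packets per second arriving at one fileserver.

    Assumes one probe per client per interval, evenly spread in time;
    this ignores bursts and all regular file-service traffic.
    """
    if interval_s <= 0:
        raise ValueError("probe interval must be positive")
    return clients / interval_s


# Compare the 30-second interval from the build flags with a 10-minute one.
for interval in (30, 600):
    rate = probe_rate(10_000, interval)
    print(f"{interval:>4} s interval: {rate:.1f} probes/s")
```

With 10,000 clients this gives about 333.3 probes/s at a 30 s interval and about 16.7 probes/s at 600 s, in line with the figures quoted above.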
> Think HARD before you really change these timeouts.  Note that users
> can always type "fs checks -all" to force the probes.
>
> -derek
>
> Nathan Neulinger <nneul@umr.edu> writes:
>
> > I build with:
> >
> > CFLAGS="${CFLAGS} -DAFS_RXDEADTIME=10 -DDEFAULT_PROBE_INTERVAL=30"
> > MT_CFLAGS="${MT_CFLAGS} -DAFS_RXDEADTIME=10 -DDEFAULT_PROBE_INTERVAL=30"
> > XCFLAGS="${XCFLAGS} -DAFS_RXDEADTIME=10 -DDEFAULT_PROBE_INTERVAL=30"
> >
> > If the client detected the server failure, then it wouldn't come apart.
> > The problem is that in some types of server failures, the client doesn't
> > see it, and never switches to alternate sources.
> >
> > -- Nathan
> >=20
> > On Thu, 2002-08-01 at 03:37, Turbo Fredriksson wrote:
> > > >>>>> "Neulinger" == Neulinger, Nathan <nneul@umr.edu> writes:
> > >
> > >     Neulinger> Nope. And in fact, recovery when servers come back is
> > >     Neulinger> usually instantaneous, except if the server outage
> > >     Neulinger> killed the machine (e.g. serving the web out of AFS,
> > >     Neulinger> resulting in thousands of hung requests). In those
> > >     Neulinger> cases, the client machine doesn't usually recover.
> > >
> > > Nice! What value have you put there? Would 10 seconds be too small
> > > a value? And I assume that the client wouldn't survive even if
> > > the default values were used.
> > >
> > > I'm just wondering why the default is so high - 50 seconds - and
> > > not something 'reasonable' like 10-15 sec... ?
> > > _______________________________________________
> > > OpenAFS-info mailing list
> > > OpenAFS-info@openafs.org
> > > https://lists.openafs.org/mailman/listinfo/openafs-info
>
> -- 
>        Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
>        Member, MIT Student Information Processing Board  (SIPB)
>        URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
>        warlord@MIT.EDU                        PGP key available
>