[OpenAFS-devel] volserver tuning
Neulinger, Nathan
nneul@umr.edu
Wed, 30 Oct 2002 09:46:43 -0600
But if I've got 16 volserver threads and 4-5 volume operations going on,
all potentially blocked against the fileserver, why should a "vos
partinfo" time out?
It seems like it's blocking ALL volserver activity, not just the
particular thread making an fssync request.
-- Nathan
------------------------------------------------------------
Nathan Neulinger EMail: nneul@umr.edu
University of Missouri - Rolla Phone: (573) 341-4841
Computing Services Fax: (573) 341-4216
> -----Original Message-----
> From: Todd_DeSantis@transarc.com [mailto:Todd_DeSantis@transarc.com]
> Sent: Wednesday, October 30, 2002 8:03 AM
> To: openafs-devel@openafs.org
> Subject: Re: [OpenAFS-devel] volserver tuning
>
> Hi Nathan - Hi Russ (and others):
>=20
> >> Recently I started more closely monitoring our volservers for
> >> responsiveness, especially during mass volume move and dump
> >> operations.
>
> >> I've noticed that during periods where volume actions are taking
> >> place, the volserver periodically hangs and doesn't seem to
> >> respond. Sometimes this occurs with only a few moves taking place.
>
> > Yup, we've been having the same problem; it causes a ton of problems
> > for volume releases not infrequently.
>
> I believe that both OpenAFS and IBM AFS have looked into this over the
> past few months. This bottleneck lies in the communication between
> the volserver and fileserver via the fssync calls. For certain
> calls/transactions, the volserver must contact the fileserver to
> perform some actions. The main one is:
> - have the fileserver break callbacks to clients that have been
> using this volume.
>
> We have noticed that, with the increase in PCs and laptops that
> travel between office and home, BreakCallback calls can fail or
> time out because those clients are no longer on the network.
> This has caused the link between the fileserver and volserver
> to linger, causing problems with this call and with other
> transactions on this volserver.
>
> Derrick made changes to the OpenAFS fssync code to allow the
> fileserver to return control to the volserver while the
> callbacks are being broken. I think Rainer Toebbicke of CERN
> also made some changes in this area. This allows the
> initial volserver transaction to continue.
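The general idea of returning control to the caller before the slow callback breaking has finished can be illustrated with a small deferred-work queue. This is a minimal sketch of that pattern under stated assumptions, not the OpenAFS or Transarc implementation: the class name `FssyncListener`, the `break_callbacks` callable, and the "OK" reply are hypothetical.

```python
import queue
import threading
from typing import Callable


class FssyncListener:
    """Hypothetical model of a fileserver-side fssync handler.

    handle() acknowledges a volume request at once and queues the slow
    callback breaking for a background worker thread, so the caller
    (standing in for the volserver) is not held up by unreachable clients.
    """

    def __init__(self, break_callbacks: Callable[[int], None]) -> None:
        self._break = break_callbacks
        self._work: "queue.Queue[int]" = queue.Queue()
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def handle(self, volume_id: int) -> str:
        # Defer the slow work and reply immediately.
        self._work.put(volume_id)
        return "OK"

    def wait_idle(self) -> None:
        # Block until every queued callback break has been processed.
        self._work.join()

    def _drain(self) -> None:
        while True:
            volume_id = self._work.get()
            try:
                self._break(volume_id)
            except Exception:
                # A failed break (e.g. a laptop that left the network) must
                # not kill the worker; a real server would log it here.
                pass
            finally:
                self._work.task_done()
```

The deferred work still has to be done by someone: with a single worker draining the queue, a burst of requests just moves the wait elsewhere, which is the single-thread limitation raised in the following paragraph.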
>=20
> However, I think that the fileserver only has 1 thread dedicated
> to listening to requests from the volserver, so while the
> fileserver is still handling the BreakCallbacks request, other
> requests from the volserver are blocked. It is possible
> that the CERN code addresses this and allows more fileserver
> threads to listen for volserver requests.
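Why a single fileserver listener stalls everything, no matter how many volserver threads are issuing requests, can be shown with a simple queueing model. This is a minimal sketch, not AFS code: it assumes fssync requests are served first-come, first-served by a fixed number of identical listener threads, and the function name `finish_times`, the 30-second stall, and the 0.1-second request times are hypothetical.

```python
import heapq
from typing import List, Tuple


def finish_times(requests: List[Tuple[float, float]], listeners: int) -> List[float]:
    """Return the completion time of each (arrival, service_time) request.

    Hypothetical model of the fileserver side of fssync: `listeners`
    identical threads serve requests first-come, first-served.
    `requests` must be sorted by arrival time.
    """
    if listeners < 1:
        raise ValueError("need at least one listener thread")
    # Min-heap holding the time at which each listener thread becomes idle.
    idle_at = [0.0] * listeners
    finished = []
    for arrival, service in requests:
        # Hand the request to whichever listener frees up first.
        free = heapq.heappop(idle_at)
        start = max(arrival, free)
        end = start + service
        heapq.heappush(idle_at, end)
        finished.append(end)
    return finished


# Hypothetical workload: one BreakCallbacks request that stalls for 30 s on an
# unreachable laptop, then three quick requests (0.1 s each) from other
# volserver threads arriving 1, 2 and 3 seconds later.
workload = [(0.0, 30.0), (1.0, 0.1), (2.0, 0.1), (3.0, 0.1)]

for n in (1, 4):
    waits = [end - arrival
             for (arrival, _), end in zip(workload, finish_times(workload, n))]
    print(n, "listener(s):", [round(w, 1) for w in waits])
```

With one listener this prints `[30.0, 29.1, 28.2, 27.3]`: every quick request waits out the stalled callback break. With four listeners it prints `[30.0, 0.1, 0.1, 0.1]`, which is the effect of dedicating more fileserver threads to volserver requests.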
>=20
> The Transarc AFS code has also addressed some of these areas
> of contention. We allow the fileserver to return control to the
> volserver while it is breaking callbacks, and we have also
> increased the number of threads available for volserver
> requests. We have several sites running with these
> versions now and I have not heard of any problems.
>
> As a warning, I have suggested that sites be wary of scripts
> that release a series of volumes one after the other.
> Since "vos release" no longer has to wait for the fileserver
> to break callbacks, the vos command finishes sooner, and the
> next releases could hit the bottleneck at the fssync
> interface and fail.
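One way a release script can avoid piling requests onto the fssync interface is to pace the releases instead of firing them back to back. A minimal sketch, assuming the script shells out to `vos release <volume>` and that a fixed pause gives the fileserver time to finish breaking callbacks; the function name `release_paced` and the 30-second default pause are hypothetical and would need tuning for a given cell.

```python
import subprocess
import time
from typing import Callable, Iterable, List


def vos_release(volume: str) -> None:
    # Shell out to vos; check=True raises CalledProcessError on failure.
    subprocess.run(["vos", "release", volume], check=True)


def release_paced(volumes: Iterable[str],
                  release: Callable[[str], None] = vos_release,
                  pause: float = 30.0,
                  sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Release volumes one at a time, sleeping `pause` seconds between them.

    The pause is a hypothetical knob meant to give the fileserver time to
    finish breaking callbacks before the next release reaches fssync.
    Returns the volumes in the order they were released.
    """
    done = []
    for i, volume in enumerate(volumes):
        if i > 0:
            sleep(pause)
        release(volume)
        done.append(volume)
    return done
```

Because `vos_release` uses `check=True`, a failed release stops the loop rather than sending more work to a fileserver that is already struggling. The `release` and `sleep` parameters are injectable so the pacing can be tried without a cell, e.g. `release_paced(["vol.a", "vol.b"], release=print, pause=0)`.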
>
> Likewise, having multiple "vos move" jobs running at the same
> time on this volserver/fileserver can run into this problem.
>=20
> You can check the FileLog for messages complaining about
> breaking volume callbacks to see whether this is the
> problem you are running up against.
>
> In my early days of supporting AFS, I always tried to tell customers
> to watch the number of simultaneous vos transactions that they send to
> the fileserver. These transactions are expensive at the I/O level, and
> running more of them at the same time can hurt not only volserver
> performance but also fileserver performance. Since those days, we have
> changed the way the fileserver places volumes on the vice partitions,
> so finding the next free inode to use is much faster and no longer as
> big a performance bottleneck.
>=20
> But it is still recommended that we not overload the volserver,
> even though it can run with 16 threads. From past experiences
> with customers, once 3 or 4 vos transactions were active,
> performance did start to suffer.
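The "3 or 4 active vos transactions" rule of thumb can be enforced on the admin side with a bounded thread pool. A minimal sketch, assuming each job is a callable that runs one vos operation; the names `run_capped` and `vos_move_job` and the default cap of 3 are hypothetical, while the `vos move` flags used are the standard ones.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_capped(jobs: List[Callable[[], T]], limit: int = 3) -> List[T]:
    """Run jobs with at most `limit` of them active at the same time.

    Results come back in the same order as `jobs`; result() re-raises
    any exception a job raised.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    with ThreadPoolExecutor(max_workers=limit) as pool:
        futures = [pool.submit(job) for job in jobs]
        return [f.result() for f in futures]


def vos_move_job(volume: str, from_server: str, from_part: str,
                 to_server: str, to_part: str) -> Callable[[], subprocess.CompletedProcess]:
    """Build a job that runs one `vos move` when called."""
    cmd = ["vos", "move", "-id", volume,
           "-fromserver", from_server, "-frompartition", from_part,
           "-toserver", to_server, "-topartition", to_part]
    # check=True makes a failed move raise, which run_capped re-raises.
    return lambda: subprocess.run(cmd, check=True)
```

For example, `run_capped([vos_move_job(v, "fs1", "a", "fs2", "b") for v in vols], limit=3)` would move the volumes in `vols` at most three at a time (the server and partition names are placeholders). The cap only covers this script's jobs; vos commands started by other administrators still add to the volserver's load.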
>
> I'm probably mentioning things that you are already aware of, but I
> did want to throw this out there.
>
> Thanks
>
> Todd DeSantis
> AFS Support
>
> _______________________________________________
> OpenAFS-devel mailing list
> OpenAFS-devel@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-devel