[OpenAFS-devel] volserver tuning

Neulinger, Nathan nneul@umr.edu
Wed, 30 Oct 2002 09:46:43 -0600


But if I've got 16 volserver threads, and 4-5 vol operations going on,
all potentially blocked against the file server - why should a "vos
partinfo" time out?

It seems like it's blocking ALL volserver activity, not just the
particular thread making a fssync request.

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       EMail:  nneul@umr.edu
University of Missouri - Rolla         Phone: (573) 341-4841
Computing Services                       Fax: (573) 341-4216


> -----Original Message-----
> From: Todd_DeSantis@transarc.com [mailto:Todd_DeSantis@transarc.com]
> Sent: Wednesday, October 30, 2002 8:03 AM
> To: openafs-devel@openafs.org
> Subject: Re: [OpenAFS-devel] volserver tuning
>
> Hi Nathan - Hi Russ (and others):
>
> >> Recently I started more closely monitoring our volservers for
> >> responsiveness, especially during mass volume move and dump
> >> operations.
>
> >> I've noticed that during periods where volume actions are taking
> >> place, the volserver periodically hangs and doesn't seem to
> >> respond. Sometimes this occurs with only a few moves taking place.
>
> > Yup, we've been having the same problem; it causes a ton of problems
> > for volume releases not infrequently.
>
> I believe that both OpenAFS and IBM AFS have looked into this over the
> past few months.  This bottleneck lies in the communication between
> the volserver and fileserver via the fssync calls.  For certain
> calls/transactions, the volserver must contact the fileserver to do
> some actions.  These actions are mainly
>   - have the fileserver break callbacks to clients that have been
>     using this volume.
>
>     We have noticed that with the increase in PCs and laptops that
>     travel between offices and home, the BreakCallback calls can
>     fail/timeout because those clients are no longer on the network.
>     This has caused the link between the fileserver and volserver
>     to linger and cause problems with this call and with other
>     transactions on this volserver.
>
>     Derrick made changes to the OpenAFS fssync code to allow the
>     fileserver to return control back to the volserver while the
>     callbacks are being broken.  I think Rainer Toebbicke of CERN
>     also made some changes in this area.  This can allow that
>     initial volserver transaction to continue.
>
>     However, I think that the fileserver only has 1 thread dedicated
>     to listening to requests from the volserver, so while the
>     fileserver is still handling the BreakCallbacks request, other
>     requests from the volserver are being blocked.  It is possible
>     that the CERN code addresses this and allows more fileserver
>     threads to listen for volserver requests.
>
>     The Transarc AFS code has also addressed some of these areas
>     of contention.  We allow the fileserver to return control to the
>     volserver while it is breaking callbacks and we have also
>     increased the number of threads that are available for the
>     volserver requests.  We have several sites running with these
>     versions now and I have not heard of any problems.
>
>     As a warning, I have suggested that sites be wary of scripts
>     that will try to release a series of volumes one after the other.
>     Since the "vos release" no longer has to wait for the fileserver
>     to BreakCallbacks, the vos command finishes sooner and this could
>     cause the next releases to hit the bottleneck at the fssync
>     interface and fail.
>
>     So having multiple "vos move" jobs running at the same time on
>     this volserver/fileserver can also run into this problem.
>
>     You can check the FileLog to see if you are seeing messages
>     complaining about breaking volume callbacks to see if this is
>     possibly the problem you are running up against.
>
> In my early days of supporting AFS, I always tried to tell customers
> to watch the number of simultaneous vos transactions that they send to
> the fileserver.  These transactions are expensive at the IO level, and
> running more of them at the same time can hurt not only volserver
> performance but also fileserver performance.  Since those
> days, we have changed the way the fileserver places volumes on the
> vice partitions so finding the next free inode to use is much faster
> and no longer as big a performance bottleneck.
>
> But it is still recommended that we do not try to overload the
> volserver even though it does have the ability to run with 16
> threads.  From past experiences with customers, once 3 or 4 vos
> transactions were active, performance did start to suffer.
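
One way to keep a batch script under that limit is to cap how many vos
commands it runs at once. The sketch below is only an illustration:
run_throttled, run_command and the default cap of 2 are made up for this
example and are not part of AFS. It relies on Python's ThreadPoolExecutor
to bound the number of commands in flight.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def run_command(argv: Sequence[str]) -> int:
    """Run one command and return its exit status."""
    return subprocess.run(list(argv)).returncode


def run_throttled(commands: Sequence[Sequence[str]],
                  max_parallel: int = 2,
                  runner: Callable[[Sequence[str]], int] = run_command) -> List[int]:
    """Run `commands` with at most `max_parallel` active at once.

    Results come back in the same order as `commands`.
    The default cap of 2 is an arbitrary, conservative choice.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(runner, commands))
```

For example, with a hypothetical list of volume names in `volumes`,
run_throttled([["vos", "release", v] for v in volumes]) would release them
two at a time. Note that this only bounds the vos commands themselves; the
fileserver may still be breaking callbacks after a command has returned, so
a pause between jobs may also be needed.
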
>
> I'm probably mentioning things that you are already aware of, but I
> did want to throw this out there.
>
> Thanks
>
> Todd DeSantis
> AFS Support
>
> _______________________________________________
> OpenAFS-devel mailing list
> OpenAFS-devel@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-devel