[OpenAFS] Migration and slow AFS performance

Jeffrey E Altman jaltman@auristor.com
Sat, 29 May 2021 23:16:53 -0400

From: Jeffrey E Altman <jaltman@auristor.com>
To: Daniel Mezhiborsky <daniel.mezhiborsky@cooper.edu>,
 openafs-info@openafs.org
Subject: Re: [OpenAFS] Migration and slow AFS performance

Hi Dan,

Since no one from the OpenAFS community has replied, I will chime in.

On 5/25/2021 10:21 AM, Daniel Mezhiborsky (daniel.mezhiborsky@cooper.edu) wrote:
> Hello all,
> We currently have a relatively small (~250 users, 2TB) AFS cell that I
> am planning on retiring soon.

If you are willing to explain, I'm sure the community would appreciate
hearing the reasons behind the migration away from OpenAFS and how the
replacement was selected.

> I'd like to get our data out of AFS-space, but this is proving
> problematic because of performance issues. Our setup is one AFS server
> VM with iSCSI-attached /vicepa.

It would be helpful if you could provide more details of the server environment:

1. Is the client-fileserver traffic sharing the same network adapter as
the iSCSI-attached filesystem backing the vice partition?

2. How much network bandwidth is available to the client-fileserver path?

3. How much network bandwidth is available to the fileserver-iSCSI path?

4. What is the network latency (Round Trip Time) between the client and
the fileserver?
5. What version is the OpenAFS fileserver?

6. What command line options has the fileserver been started with?

7. What AFS client version and operating system (and OS version) is the
client machine running?
8. What are the configuration options in use by the client?

9. What is the destination file system that the rsync client is writing to?

10. Is the destination filesystem also being accessed via the network?

11. What is the OpenAFS fileserver VM configuration? Number of cores,
amount of RAM, clock speed, etc.

> For single large-file transfers with encryption off, I can get around
> 30MB/s read/write, but it seems like metadata operations are very slow,
> so copying our data (mostly home directories) directly from AFS with
> cp/rsync is taking a prohibitively long time.

Which metadata operations are slow?

Are you referring to operations reading the metadata from /afs or
writing it to the destination?

> I see similar slowness with vos dump. We do take regular backups with
> vos backupsys and backup dump that take about 36 hours for full backups.
> Does anyone have any recommendations on problem areas to try and tune to
> get better performance, or advice on a better way to copy our data out of AFS?

You will need to provide more details about the environment before
specific recommendations can be provided. However, I will mention a few
things to consider about the extraction methodology.

First, each new AFS RPC behaves similarly to a new TCP connection. It
starts with a minimal congestion window size and grows the window via a
slow-start algorithm until either the maximum window of 32 packets (each
carrying up to 1,412 bytes of payload) is reached or packet loss occurs.
Combined with the RTT of the link, you can compute the bandwidth-delay
product for a single RPC.
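As a rough illustration of that ceiling (the RTT below is an assumed
value, not a measurement from your environment; substitute what you
actually see between client and fileserver):

```shell
# Back-of-the-envelope throughput ceiling for one Rx call:
# at most 32 packets * 1,412 payload bytes = 45,184 bytes in flight per RTT.
# rtt_ms is an assumed figure; plug in your measured round-trip time.
awk 'BEGIN { rtt_ms = 1.0; win = 32 * 1412; printf "%.1f MB/s\n", win / (rtt_ms / 1000) / 1e6 }'
```

At 1 ms RTT that works out to roughly 45 MB/s per call; a couple of
milliseconds of RTT is already enough to explain a ~30 MB/s large-file
ceiling like the one you observed.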

In general, RPCs issued by an AFS client to read from a fileserver are
going to be one disk cache chunk at a time; smaller if the average
file size is less than the chunk size.

If authentication is required to access the directories and files, a
separate FetchStatus RPC will be issued for most files prior to the
first FetchData RPC, at least in an OpenAFS client.
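To see why this dominates for home-directory trees full of small files,
here is a sketch with assumed numbers (file count and RTT are
illustrative, not measured from your cell):

```shell
# Assumed: 500,000 files, at least 2 serialized RPCs per file
# (FetchStatus + FetchData), 10 ms RTT. All values are illustrative.
awk 'BEGIN { files = 500000; rpcs = files * 2; rtt_ms = 10;
             printf "%.1f hours in round trips alone\n", rpcs * rtt_ms / 1000 / 3600 }'
```

With one client issuing the RPCs serially, the round trips alone consume
hours before a single byte of payload throughput is counted.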

One of the strengths of AFS is the caching client, but in this case
caching is not beneficial because the data will be read once and
discarded during this workflow. The workflow will also be painful for
an OpenAFS cache manager because of the callback tracking: if the
cache recycles before the callbacks expire, the cache manager will
cancel the outstanding callbacks.

Likewise, if the fileserver callback table fills, it will need to
expend considerable effort searching for unexpired callbacks to discard
early. Discarding callbacks requires issuing RPCs to the cache manager.
So an insufficiently large cache and callback pool can result in the
callback lifecycle dominating the workflow.
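If the callback table turns out to be the bottleneck, the OpenAFS
fileserver's -cb option sets the number of callbacks it will track. A
sketch of a startup line (the path and value are illustrative, not a
tuning recommendation for your cell):

```shell
# Illustrative fileserver invocation with an enlarged callback table.
# -L selects the "large server" defaults; -cb sets the callback count.
/usr/afs/bin/fileserver -L -cb 1500000
```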

Instead of performing one rsync at a time, you should consider
executing multiple rsyncs in parallel, potentially from multiple client
machines.
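A minimal sketch of the parallel approach, assuming hypothetical source
and destination paths (adjust the -P concurrency to what your network
and fileserver can absorb):

```shell
# SRC and DST are hypothetical paths; substitute your cell's real ones.
SRC=/afs/example.edu/home
DST=/mnt/dest/home
# One rsync per top-level home directory, four running at a time (-P4).
ls "$SRC" | xargs -P4 -I{} rsync -a "$SRC/{}/" "$DST/{}/"
```

Running some of those rsyncs from a second client machine also spreads
the per-call window limit described above across more Rx calls.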

Provide details of the environment and more specific recommendations
can be provided.

Jeffrey Altman
AuriStor, Inc.
