[OpenAFS] Migration and slow AFS performance

Jeffrey E Altman jaltman@auristor.com
Sat, 29 May 2021 23:16:53 -0400


Hi Dan,

Since no one from the OpenAFS community has replied I will chime in.

On 5/25/2021 10:21 AM, Daniel Mezhiborsky
(daniel.mezhiborsky@cooper.edu) wrote:
> Hello all,
>
> We currently have a relatively small (~250 users, 2TB) AFS cell that I
> am planning on retiring soon.

If you are willing to explain, I'm sure the community would appreciate
hearing the reasons behind the migration away from OpenAFS and how the
replacement was selected.

> I'd like to get our data out of AFS-space,
> but this is proving problematic because of performance issues. Our setup
> is one AFS server VM with iSCSI-attached /vicepa.

It would be helpful if you could provide more details of the server
configuration.

1. Is the client-fileserver traffic sharing the same network adapter as
the iSCSI attached filesystem backing the vice partition?

2. How much network bandwidth is available to the client-fileserver path?

3. How much network bandwidth is available to the fileserver-iSCSI path?

4. What is the network latency (Round Trip Time) between the client and
fileserver?

5. What version is the OpenAFS fileserver?

6. What command line options has the fileserver been started with?

7. What AFS client version and operating system (and OS version) is the
client?

8. What are the configuration options in use by the client?

9. What is the destination file system that the rsync client is writing to?

10. Is the destination filesystem also being accessed via the network?

11. What is the OpenAFS fileserver VM configuration (number of cores,
amount of RAM, clock speed, etc.)?

> For single large-file
> transfers with encryption off, I can get around 30MB/s read/write, but
> it seems like metadata operations are very slow, so copying our data
> (mostly home directories) directly from AFS with cp/rsync is taking a
> prohibitively long time.

Which metadata operations are slow?

Are you referring to operations reading the metadata from /afs or
writing it to the destination?

> I see similar slowness with vos dump. We do
> take regular backups with vos backupsys and backup dump that take about
> 36 hours for full backups.
>
> Does anyone have any recommendations on problem areas to try and tune to
> get better performance or advice on a better way copy our data out of AFS?

You will need to provide more details about the environment before
specific recommendations can be provided.  However, I will mention a few
things to consider about the extraction methodology.

First, each new AFS RPC behaves similarly to a new TCP connection.  It
starts with a minimal congestion window size and grows the window via a
slow start algorithm until either the maximum window size of 32 * 1412
packets has been reached or packet loss occurs.  Combining that window
size with the RTT for the link, you can compute the bandwidth-delay
product for a single call.
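
To make the window/RTT relationship concrete, here is a rough sketch of
the per-call throughput ceiling.  It assumes that "32 * 1412" means 32
packets of 1412 bytes each, which is my reading and not something
confirmed above, and the RTT values passed in are hypothetical.  It
ignores slow start entirely, so short calls will land well below this
bound.

```python
# Rough per-call throughput ceiling: bytes in flight divided by RTT.
# Assumption: "32 * 1412" is read as 32 packets of 1412 bytes each.

MAX_WINDOW_PACKETS = 32   # maximum window, taken from the text above
PACKET_BYTES = 1412       # assumed to be the payload bytes per packet


def max_window_bytes(packets=MAX_WINDOW_PACKETS, packet_bytes=PACKET_BYTES):
    """Bytes that can be in flight once the window is fully open."""
    return packets * packet_bytes


def throughput_ceiling(rtt_seconds):
    """Upper bound in bytes/second for a single call: window / RTT."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return max_window_bytes() / rtt_seconds


if __name__ == "__main__":
    for rtt_ms in (0.5, 1.0, 5.0, 20.0):  # hypothetical round trip times
        mb_per_s = throughput_ceiling(rtt_ms / 1000.0) / 1e6
        print(f"RTT {rtt_ms:4.1f} ms -> at most {mb_per_s:6.2f} MB/s per call")
```

Under that reading the full window is 45184 bytes, so a 1 ms RTT caps a
single call at roughly 45 MB/s and a 5 ms RTT at roughly 9 MB/s.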

In general, RPCs issued by an AFS client to read from a fileserver are
going to be one disk cache chunk size at a time, or smaller if the
average file size is less than the chunk size.

If authentication is required to access the directories and files, a
separate FetchStatus RPC will be issued for most files prior to the
first FetchData RPC, at least in an OpenAFS client.
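
As a back-of-the-envelope illustration of why many small files hurt, the
sketch below counts one FetchStatus plus one FetchData per chunk for
each file and charges one round trip per RPC.  This is a simplified
lower bound of my own, not a model of the actual cache manager: it
assumes the RPCs are issued serially, and it ignores server time, slow
start, directory lookups and callbacks.  The chunk size, RTT and file
counts are hypothetical.

```python
# Lower-bound estimate of round trip time spent reading a set of files.
# Assumptions: one FetchStatus per file, one FetchData per cache chunk,
# RPCs issued one after another, one RTT each.


def rpcs_per_file(size_bytes, chunk_bytes, needs_status=True):
    """Count the RPCs needed to read one file under the assumptions above."""
    data_rpcs = -(-size_bytes // chunk_bytes)  # ceiling division
    return data_rpcs + (1 if needs_status else 0)


def min_round_trip_seconds(file_sizes, chunk_bytes, rtt_seconds):
    """Total RTT cost in seconds if every RPC waits for the previous one."""
    total_rpcs = sum(rpcs_per_file(size, chunk_bytes) for size in file_sizes)
    return total_rpcs * rtt_seconds


if __name__ == "__main__":
    small_files = [10 * 1024] * 1_000_000  # hypothetical: 1M files of 10 KiB
    seconds = min_round_trip_seconds(small_files, chunk_bytes=1 << 20,
                                     rtt_seconds=0.001)
    print(f"at least {seconds:.0f} s spent waiting on round trips")
```

With those made-up numbers each file costs two RPCs, so a million small
files spend about 2000 seconds on round trips alone before any data
transfer time is counted.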

One of the strengths of AFS is the caching client, but in this case
caching is not beneficial because the data will be read once and
discarded during this workflow.  The workflow will also be painful for
an OpenAFS cache manager because of the callback tracking.  If the
cache recycles before the callback expires, the cache manager will
cancel the outstanding callbacks.

Likewise, if the fileserver callback table fills then it will need to
expend considerable effort searching for unexpired callbacks to discard
early.  Discarding callbacks requires issuing RPCs to the cache manager.
So an insufficiently large cache and callback pool can result in the
callback lifecycle dominating the workflow.

Instead of performing one rsync at a time, you should consider
executing multiple rsyncs in parallel, potentially from multiple client
systems.
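
One way to run several rsyncs at once from a single client is sketched
below: one rsync per top-level directory, a fixed number at a time.
This is only an illustration.  The helper names (rsync_command,
copy_in_parallel), the worker count and the example paths are all
hypothetical, and the only rsync option used is -a.

```python
# Run one rsync per source directory, a few at a time.
# All paths and the worker count are placeholders, not recommendations.
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def rsync_command(src, dest_root):
    """Build the rsync argument list for copying one directory tree."""
    # Trailing slashes make rsync copy the directory's contents into dest.
    return ["rsync", "-a", f"{src}/", f"{dest_root / src.name}/"]


def copy_in_parallel(sources, dest_root, workers=4, run=subprocess.run):
    """Run rsync for each source, `workers` at a time.

    Returns a dict mapping each source path to rsync's exit status.
    `run` can be replaced with a stub for a dry run.
    """
    def copy_one(src):
        result = run(rsync_command(src, dest_root))
        return src, result.returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(copy_one, sources))


# Example use (hypothetical cell name and destination):
#   homes = sorted(p for p in Path("/afs/example.org/home").iterdir()
#                  if p.is_dir())
#   status = copy_in_parallel(homes, Path("/mnt/newhome"), workers=8)
#   failed = [str(p) for p, rc in status.items() if rc != 0]
```

Threads are sufficient here because each worker just waits on an
external rsync process.  To spread the load over multiple client
systems, the same list of directories could be split between hosts.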

Provide details of the environment and more specific recommendations can
be provided.

Jeffrey Altman
AuriStor, Inc.

