[OpenAFS] Migration and slow AFS performance
Jeffrey E Altman
jaltman@auristor.com
Sat, 29 May 2021 23:16:53 -0400
Hi Dan,

Since no one from the OpenAFS community has replied, I will chime in.

On 5/25/2021 10:21 AM, Daniel Mezhiborsky
(daniel.mezhiborsky@cooper.edu) wrote:
> Hello all,
>
> We currently have a relatively small (~250 users, 2TB) AFS cell that I
> am planning on retiring soon.

If you are willing to explain, I'm sure the community would appreciate
hearing the reasons behind the migration away from OpenAFS and how the
replacement was selected.

> I'd like to get our data out of AFS-space,
> but this is proving problematic because of performance issues. Our setup
> is one AFS server VM with iSCSI-attached /vicepa.

It would be helpful if you could provide more details of the server
configuration.

1. Is the client-fileserver traffic sharing the same network adapter as
the iSCSI attached filesystem backing the vice partition?
2. How much network bandwidth is available to the client-fileserver path?
3. How much network bandwidth is available to the fileserver-iSCSI path?
4. What is the network latency (Round Trip Time) between the client and
fileserver?
5. What version is the OpenAFS fileserver?
6. What command line options has the fileserver been started with?
7. What AFS client version and operating system (and OS version) is the
client?
8. What are the configuration options in use by the client?
9. What is the destination file system that the rsync client is writing to?
10. Is the destination filesystem also being accessed via the network?
11. What is the OpenAFS fileserver VM configuration? number of cores,
amount of RAM, clock speed, etc.

> For single large-file
> transfers with encryption off, I can get around 30MB/s read/write, but
> it seems like metadata operations are very slow, so copying our data
> (mostly home directories) directly from AFS with cp/rsync is taking a
> prohibitively long time.

Which metadata operations are slow?
Are you referring to operations reading the metadata from /afs or
writing it to the destination?

> I see similar slowness with vos dump. We do
> take regular backups with vos backupsys and backup dump that take about
> 36 hours for full backups.
>
> Does anyone have any recommendations on problem areas to try and tune to
> get better performance or advice on a better way copy our data out of AFS?

You will need to provide more details about the environment before
specific recommendations can be provided. However, I will mention a few
things to consider about the extraction methodology.

First, each new AFS RPC behaves similarly to a new TCP connection. It
starts with a minimal congestion window size and grows the window via a
slow start algorithm until either the maximum window size of 32 packets
(32 * 1412 bytes) has been reached or packet loss occurs. Combined with
the RTT for the link, you can compute the bandwidth-delay product for a
single call.
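
As a rough illustration of that calculation, the sketch below divides the maximum window (32 packets of 1412 bytes, as stated above) by the round trip time to get a ceiling on single-call throughput. The RTT values are hypothetical placeholders, and the result is only an upper bound: slow start and packet loss will keep a real call below it.

```python
# Rough ceiling on the throughput of one call: at most one full
# window of data can be in flight per round trip.
WINDOW_PACKETS = 32    # maximum window in packets (from the text above)
PACKET_BYTES = 1412    # bytes per packet (from the text above)


def max_call_throughput(rtt_seconds: float) -> float:
    """Upper bound in bytes/second for a single call with a fully open window."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return WINDOW_PACKETS * PACKET_BYTES / rtt_seconds


if __name__ == "__main__":
    # Hypothetical RTTs in milliseconds; substitute the measured value.
    for rtt_ms in (0.5, 1.0, 5.0, 20.0):
        mb_per_s = max_call_throughput(rtt_ms / 1000.0) / 1e6
        print(f"RTT {rtt_ms:5.1f} ms: at most {mb_per_s:8.2f} MB/s per call")
```

With a 1 ms RTT the ceiling works out to about 45 MB/s per call, so a measured RTT is needed before the observed 30MB/s can be interpreted.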

In general, RPCs issued by an AFS client to read from a fileserver are
going to be one disk cache chunk size at a time, or smaller if the
average file size is less than the chunk size.

If authentication is required to access the directories and files, a
separate FetchStatus RPC will be issued for most files prior to the
first FetchData RPC, at least in an OpenAFS client.
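
To see why many small files are expensive, here is a simplified model of the RPC count. It assumes one FetchStatus per file plus one FetchData per chunk, and that a single-threaded copy pays at least one round trip per RPC. The chunk size, file sizes and RTT are hypothetical, and the function names are mine, not part of any AFS tool.

```python
def estimate_rpcs(file_sizes, chunk_bytes):
    """Return (fetch_status, fetch_data) RPC counts for reading each file once.

    Assumption: one FetchStatus per file and one FetchData per chunk.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    fetch_status = len(file_sizes)
    # Integer ceiling division: a partial chunk still costs one RPC.
    fetch_data = sum(-(-size // chunk_bytes) for size in file_sizes)
    return fetch_status, fetch_data


def serial_latency_floor(file_sizes, chunk_bytes, rtt_seconds):
    """Lower bound in seconds for a serial copy: one round trip per RPC."""
    status, data = estimate_rpcs(file_sizes, chunk_bytes)
    return (status + data) * rtt_seconds


if __name__ == "__main__":
    chunk = 1 << 20             # hypothetical 1 MiB cache chunk size
    sizes = [4096] * 100_000    # hypothetical: 100,000 files of 4 KiB each
    print(estimate_rpcs(sizes, chunk))
    print(serial_latency_floor(sizes, chunk, rtt_seconds=0.001))
```

In this hypothetical case the round trips alone add up to roughly 200 seconds before any time spent on disk I/O or slow start, and the figure grows linearly with the RTT.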

One of the strengths of AFS is the caching client, but in this case
caching is not beneficial because the data will be read once and
discarded during this workflow. The workflow will also be painful for
an OpenAFS cache manager because of the callback tracking. If the
cache recycles before the callback expires, the cache manager will
cancel the outstanding callbacks.

Likewise, if the fileserver callback table fills, then it will need to
expend considerable effort searching for unexpired callbacks to discard
early. Discarding callbacks requires issuing RPCs to the cache manager.
So an insufficiently large cache and callback pool can result in the
callback lifecycle dominating the workflow.

Instead of performing one rsync at a time, you should consider
executing multiple rsyncs in parallel, potentially from multiple client
systems.
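
One way to run several rsyncs at once from a single client is sketched below, assuming the data splits cleanly by top-level directory (for example, one home directory per user). The paths, user names and worker count are hypothetical, and `build_commands` and `run_parallel` are illustrative helpers rather than existing tools.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_commands(src_root, dst_root, names):
    """Build one rsync argv per top-level directory."""
    return [
        ["rsync", "-a", f"{src_root}/{name}/", f"{dst_root}/{name}/"]
        for name in names
    ]


def run_parallel(commands, workers=4, runner=subprocess.run):
    """Run commands with at most `workers` in flight; return exit codes in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [result.returncode for result in pool.map(runner, commands)]


if __name__ == "__main__":
    # Hypothetical source, destination and user names; substitute your own.
    cmds = build_commands("/afs/example.org/home", "/mnt/newhome",
                          ["alice", "bob", "carol"])
    for cmd in cmds:
        print(" ".join(cmd))
    # Uncomment to actually start the copies:
    # print(run_parallel(cmds, workers=3))
```

Threads are sufficient here because each worker only waits on an external rsync process. The same list of commands could be divided among several client machines.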

Provide details of the environment and more specific recommendations can
be provided.

Jeffrey Altman
AuriStor, Inc.