[OpenAFS] Fair bandwidth distribution, performance of OpenAFS on win32

Jeffrey Altman jaltman@secure-endpoints.com
Thu, 18 Nov 2010 12:21:36 -0500



On 11/12/2010 11:02 AM, Matthias Gerstner wrote:
> Hello!
>
>> For starters:
>
> Sorry for missing that part. So let's fill out the starter questions.
>
>> * what version are the servers?
>
> I'm citing from my other reply:
>
> * The server is running Gentoo Linux with kernel 2.6.35 at the time
>   being. OpenAFS server is at version 1.4.12.1. Most of the Linux client
>   machines run on the same Linux and OpenAFS version.
>
>> * what configuration parameters are the servers running with?
>
> Default ones as it turned out.

Fix the server configuration to use the maximum number of threads,
allocate additional callbacks, and so on:

/usr/afs/bin/fileserver -L -udpsize 131071 -sendsize 131071 -rxpck 700
-p 128 -b 600 -nojumbo -cb 1500000
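
As a reading aid, the sketch below rebuilds that command line from an annotated option table. The flag descriptions in the comments paraphrase the fileserver man page; the helper name build_fileserver_command and the table layout are illustrative assumptions, not part of OpenAFS.

```python
import shlex

# Values are taken from the command in the message above; the comments
# paraphrase the fileserver man page. The table itself is illustrative.
FILESERVER_OPTIONS = [
    ("-L", None),          # preset for a large file server
    ("-udpsize", 131071),  # UDP socket buffer size, in bytes
    ("-sendsize", 131071), # send buffer size, in bytes
    ("-rxpck", 700),       # number of Rx packets
    ("-p", 128),           # number of server threads
    ("-b", 600),           # number of directory buffers
    ("-nojumbo", None),    # do not send or accept jumbograms
    ("-cb", 1500000),      # number of callbacks the server can track
]


def build_fileserver_command(binary="/usr/afs/bin/fileserver",
                             options=FILESERVER_OPTIONS):
    """Return the argument vector for the fileserver invocation."""
    argv = [binary]
    for flag, value in options:
        argv.append(flag)
        if value is not None:
            argv.append(str(value))
    return argv


if __name__ == "__main__":
    print(shlex.join(build_fileserver_command()))
```

Keeping the options in one table makes it easier to review which value was changed when tuning, for example raising -cb as more clients are added.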

>> * are the users experiencing the delays on the same machine or
>> another?
>
> The delays come up on any client machines.

Delays that are platform independent are most likely due to file server
configuration.

>> * what versions are the clients involved?
>
> * Some of the Linux clients also run on custom installations using
>   Ubuntu, Debian or SuSE Linux at versions I don't really know at the
>   moment.
> * The Windows clients are Windows7 64-bit installations running in my
>   case OpenAFS 64-bit version 1.5.74
>
>> * how many afs file system objects are active in a ten minute window
>> when the slow down occurs?
>
> I have no idea. How can I determine that number? Via one of the
> monitoring tools?

xstat_fs_test <server> 3 -onceonly

can be used to determine how many callbacks are issued and how many
times GetSomeSpace (GSS) had to be called to try to free up callbacks in
order to satisfy new requests.

Clients behind NATs or firewall devices that have very short UDP port
timeouts can also hamper performance severely.


>> Are you using the install out of the box?  96MB cache.  What work load
>> are you giving it?
>
> I've tweaked the cache parameters a bit. 512 MB of cache. 256 kb
> chunksize. 5.000 status entries.

You are permitting each client to request 5000 callbacks from the file
servers.  The file servers should be configured for a value that is
large enough to support those potential requests.
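
The sizing argument can be checked with simple arithmetic. The sketch below is a worst-case estimate only: it assumes every client fills all of its status entries from a single file server, and the function names are illustrative, not part of OpenAFS. The figures 5000 (client status entries) and 1500000 (the -cb value) come from this thread.

```python
def callbacks_needed(num_clients, stat_entries_per_client):
    """Worst-case callbacks a file server must track for these clients."""
    return num_clients * stat_entries_per_client


def clients_supported(server_callbacks, stat_entries_per_client):
    """How many such clients a given -cb value covers in the worst case."""
    return server_callbacks // stat_entries_per_client


if __name__ == "__main__":
    # -cb 1500000 on the server, 5000 status entries per client
    print(clients_supported(1500000, 5000))
    # the same relation, read the other way around
    print(callbacks_needed(300, 5000))
```

Under these assumptions a -cb value of 1500000 covers 300 clients that each hold 5000 callbacks. Real demand depends on how many objects each client actually touches on that server.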
>
> I myself, for example, am reading source files from AFS, outputting
> compilation results onto local hard disk. I.e. no write access on AFS
> occurs.  The performance changes over time from bad to ugly and vice
> versa. Due to reasons I've not been able to figure out yet (as so many
> things on that OS as I might say...).

The focus should be on the client to file server communication.

>
> Anyway the performance is like I'm waiting sometimes (or rather: most of
> the time) 5 to 10 seconds for a single object file to be compiled (as
> opposed to about a second for the same thing done on a Linux client).

On Windows a file open will require a FetchStatus RPC, a LockFile RPC,
and an UnlockFile RPC depending on the open mode being used.  Further
RPCs then process the file data.  If the directory needs to be
constructed locally, the directory FetchStatus occurs, followed by
reading the entire directory contents in bulk.

The RTT to the servers and the server response times are critical.
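
To see why per-RPC latency dominates when many small files are opened, here is a rough model. It assumes the three RPCs named above are issued one after another for every open, ignores data and directory fetches, and uses made-up figures (50 files, 1 ms and 50 ms per RPC) purely for illustration; none of these numbers were measured in this thread.

```python
def open_overhead_ms(files_opened, per_rpc_ms, rpcs_per_open=3):
    """Estimate time spent on open-related RPCs, in milliseconds.

    rpcs_per_open=3 models FetchStatus + LockFile + UnlockFile; per_rpc_ms
    is the round trip time plus server response time for one RPC.
    """
    return files_opened * rpcs_per_open * per_rpc_ms


if __name__ == "__main__":
    # hypothetical compile step that opens 50 source and header files
    print(open_overhead_ms(50, 1))   # responsive server
    print(open_overhead_ms(50, 50))  # server delayed on threads or callbacks
```

With these illustrative numbers the overhead grows from 150 ms to 7500 ms, so a modest increase in server response time is enough to turn a one-second compile into a multi-second one.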

> The problem is not specific to that Windows machine. Colleagues of mine
> attempted the same on their Windows machines (same versions) and it
> turned out similarly.
>
> From everyday work we know that compiling the software completely from/to
> the local harddisk on Windows also takes up at least two times longer
> than on a Linux system. But the results when involving AFS is beyond
> usability.  Thing is that reading or writing a large chunk of data via
> AFS on the machine yields good results (I got about 30, 40 MB/s here on
> gigabit ethernet). So I guess the slow part comes in when dynamically
> accessing smaller files on AFS.

As described above, there are many RPCs that need to be processed to
ensure Windows file semantics are preserved.   If there are delays due
to callback allocation or too few threads to support the incoming
calls, this will show up in delays on the client.

You indicated that the version of OpenAFS for Windows in use is 1.5.74.
Separate from any other reason, you want to upgrade to a later version
in order to obtain compatibility with Windows Update MS10-020 which was
pushed out in April.   Applications can crash if MS10-020 is installed
and OpenAFS has not been updated.

Jeffrey Altman


