[OpenAFS-devel] how does fileserver read from disk?

Tom Keiser tkeiser@gmail.com
Wed, 14 Sep 2005 05:15:26 -0400


On 9/14/05, Roland Kuhn <rkuhn@e18.physik.tu-muenchen.de> wrote:
> Dear experts!
>
> Having just strace'd the fileserver (non-LWP, single-threaded) on
> Linux, I noticed that the data are read from disk using readv in
> packets of 1396 bytes, 16kB per syscall. In the face of chunksize=1MB
> from the client side this does not seem terribly efficient to me, but
> of course I see the benefit of reading chunks which can readily be
> transferred. If my interpretation is wrong or this is an artifact of
> not using tviced, please say so (if possible with a short reference
> to the source), otherwise it would be nice to know why the fileserver
> cannot read(fd, buf, 1048576) as that would give at least one order
> of magnitude better performance from the RAID and (journalled)
> filesystem.
>

This is an artifact of the bad decisions that were made when
implementing the rx jumbogram protocol many years ago.  Unfortunately,
jumbogram extension headers are interspersed between each data
continuation vector.  Thus, we need a separate system iovec for each
rx packet continuation buffer.  The end result is that storedata_rxstyle
and fetchdata_rxstyle end up doing two vector I/O syscalls
(recvmsg+writev or readv+sendmsg) per ~16kB of data.  The jumbogram
protocol needs to be replaced.
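To make the cost concrete, here is a minimal sketch in C of the layout
described above: one iovec per rx packet payload, because a jumbogram
extension header sits between consecutive payloads and they cannot be
treated as one contiguous buffer. The constants (PKT_PAYLOAD taken from
the 1396-byte packets in the strace, MAX_SYSCALL_BYTES from the ~16kB per
readv) and all function names are hypothetical illustrations, not the
actual OpenAFS rx code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical values: per-packet payload seen in strace, and the ~16kB
 * cap observed per readv call. Neither is taken from the OpenAFS source. */
#define PKT_PAYLOAD       1396
#define MAX_SYSCALL_BYTES 16384
#define MAX_IOV           (MAX_SYSCALL_BYTES / PKT_PAYLOAD) /* 11 full packets */

/* Fill iov[] with one entry per packet buffer (never one big buffer),
 * mimicking the constraint imposed by interspersed jumbogram headers.
 * Returns the number of iovecs used; *covered receives the byte count. */
int build_packet_iov(struct iovec *iov, char (*bufs)[PKT_PAYLOAD],
                     size_t remaining, size_t *covered)
{
    assert(iov != NULL && bufs != NULL && covered != NULL);
    int n = 0;
    size_t total = 0;
    while (n < MAX_IOV && remaining > 0) {
        size_t len = remaining < PKT_PAYLOAD ? remaining : PKT_PAYLOAD;
        iov[n].iov_base = bufs[n];
        iov[n].iov_len = len;
        total += len;
        remaining -= len;
        n++;
    }
    *covered = total;
    return n;
}

/* Number of readv calls (each paired with a sendmsg on the network side)
 * needed to move `length` bytes when each call covers at most MAX_IOV
 * packet payloads. */
size_t syscall_pairs(size_t length)
{
    size_t per_call = (size_t)MAX_IOV * PKT_PAYLOAD;
    return (length + per_call - 1) / per_call;
}

/* Read up to MAX_IOV packet payloads from fd in a single readv call. */
ssize_t read_packets(int fd, char (*bufs)[PKT_PAYLOAD], size_t remaining)
{
    struct iovec iov[MAX_IOV];
    size_t covered;
    int n = build_packet_iov(iov, bufs, remaining, &covered);
    return readv(fd, iov, n);
}
```

Under these assumed numbers each call covers 11 * 1396 = 15356 bytes, so a
1MB chunk (1048576 bytes) costs syscall_pairs(1048576) = 69 readv+sendmsg
pairs, compared with the single read(fd, buf, 1048576) suggested in the
question.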

--
Tom Keiser
tkeiser@gmail.com