[OpenAFS-devel] how does fileserver read from disk?

Roland Kuhn rkuhn@e18.physik.tu-muenchen.de
Wed, 14 Sep 2005 15:24:37 +0200


Hi Tom!

On 14 Sep 2005, at 11:15, Tom Keiser wrote:

> On 9/14/05, Roland Kuhn <rkuhn@e18.physik.tu-muenchen.de> wrote:
>
>> Dear experts!
>>
>> Having just strace'd the fileserver (non-LWP, single-threaded) on
>> Linux, I noticed that the data are read from disk using readv in
>> packets of 1396 bytes, 16kB per syscall. In the face of chunksize=1MB
>> from the client side this does not seem terribly efficient to me, but
>> of course I see the benefit of reading chunks which can readily be
>> transferred. If my interpretation is wrong or this is an artifact of
>> not using tviced, please say so (if possible with a short reference
>> to the source), otherwise it would be nice to know why the fileserver
>> cannot read(fd, buf, 1048576) as that would give at least one order
>> of magnitude better performance from the RAID and (journalled)
>> filesystem.
>>
>>
>
> This is an artifact of the bad decisions that were made when
> implementing the rx jumbogram protocol many years ago.  Unfortunately,
> jumbogram extension headers are interspersed between each data
> continuation vector.  Thus, we need a separate system iovec for each
> rx packet continuation buffer.  The end result is storedata_rxstyle
> and fetchdata_rxstyle end up doing two vector io syscalls
> (recvmsg+writev or readv+sendmsg) per ~16kb of data.  The jumbogram
> protocol needs to be replaced.

Thanks for the explanation. Wouldn't it be possible to keep the
network protocol (including the sendmsg) as it is, but still read
bigger chunks from disk? The outgoing messages are constructed using
iovecs anyway, so why not intersperse the extension headers at sendmsg time?
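
A rough sketch of the idea, in Python only for brevity (the fileserver
itself is C). It reads one large buffer and then builds the gather list
from slices of that buffer, placing a header between consecutive slices.
The names build_iov and read_chunk, and the 4-byte zero-filled header, are
placeholders assumed for illustration, not the actual rx code or wire
layout; the 1396-byte payload is simply the size seen in the strace output.

```python
import os

PAYLOAD = 1396        # per-packet data size seen in the strace output
EXT_HDR = b"\0" * 4   # placeholder extension header; size and content are assumed


def read_chunk(path, size=1 << 20):
    """Issue one large read instead of many small readv calls.

    os.read may return fewer than `size` bytes; a real caller would loop.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, size)
    finally:
        os.close(fd)


def build_iov(chunk, payload=PAYLOAD, hdr=EXT_HDR):
    """Return a gather list: payload-sized slices of `chunk`, with `hdr`
    inserted between consecutive slices (not before the first one)."""
    view = memoryview(chunk)
    iov = []
    for off in range(0, len(view), payload):
        if off:
            iov.append(hdr)
        iov.append(view[off:off + payload])   # a slice of the big buffer, no copy
    return iov
```

Each datagram's worth of the big buffer could then be handed to
socket.sendmsg(build_iov(part)), so the headers are only interleaved at
send time while the disk read stays one large request.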

Ciao,
                     Roland

--
TU Muenchen, Physik-Department E18, James-Franck-Str. 85747 Garching
Telefon 089/289-12592; Telefax 089/289-12570
--
A mouse is a device used to point at
the xterm you want to type in.
Kim Alm on a.s.r.
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GS/CS/M/MU d-(++) s:+ a-> C+++ UL++++ P-(+) L+++ E(+) W+ !N K- w--- M 
+ !V Y+
PGP++ t+(++) 5 R+ tv-- b+ DI++ e+++>++++ h---- y+++
------END GEEK CODE BLOCK------