[OpenAFS-devel] how does fileserver read from disk?

Roland Kuhn rkuhn@e18.physik.tu-muenchen.de
Sat, 17 Sep 2005 10:58:58 +0200



Hi Chas!

On 16 Sep 2005, at 14:42, chas williams - CONTRACTOR wrote:

> In message <6E0E8B0D-4DAF-445C-959C-3E9B212EF35D@e18.physik.tu-muenchen.de>,
> Roland Kuhn writes:
>
>> Why can't this be replaced by read(big segment)->buffer->sendmsg(small
>> segments)? AFAIK readv() is implemented in terms of read() in the
>> kernel for almost all filesystems, so it should really only have the
>> effect of making the disk transfer more efficient. The msg headers
>> interspersed with the data have to come from userspace in any case,
>> right?
>>
>
> no reason you couldn't do this i suppose.  you would need twice the
> number of entries in the iovec though.  you would need a special version
> of rx_AllocWritev() that only allocates packet headers and chops up a
> buffer you pass in.
>
> curious, i rewrote rx_FetchData() to read into a single buffer and then
> memcpy() into the already allocated rx packets.  this had no impact on
> performance as far as i could tell (my typical test read was a 16k read
> split across 12/13 rx packets).  the big problem with iovec is not iovec
> really but rather that you only get 1k for each rx packet you process.
> it is quite a bit of work to handle an rx packet.  (although if your
> lower level disk driver didn't support scatter/gather you might see some
> benefit from this).

I know already that 16k-reads are non-optimal ;-) What I meant was  
doing chunksize (1MB in my case) reads. But what I gather from this  
discussion is that this would really be some work as this read-ahead  
would have to be managed across several rx jumbograms, wouldn't it?
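
To make the read(big segment)->buffer->sendmsg(small segments) idea concrete,
here is a minimal sketch in plain POSIX C. It is not rx code: struct pkt_hdr,
read_segment() and chop_into_iov() are hypothetical names made up for
illustration, and the header layout is an assumption. It only shows the
"twice the number of entries in the iovec" point, with one header entry and
one payload entry per packet, the payload entries pointing into the big
buffer without copying.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical stand-in for a packet header; NOT the real rx header. */
struct pkt_hdr {
    unsigned int seq;   /* packet index within this segment */
    unsigned int len;   /* payload bytes that follow */
};

/* Read up to `want` bytes into buf, looping over short reads.
 * Returns the number of bytes read (may be < want at EOF), or -1 on error. */
static ssize_t read_segment(int fd, char *buf, size_t want)
{
    size_t got = 0;
    while (got < want) {
        ssize_t r = read(fd, buf + got, want - got);
        if (r < 0)
            return -1;
        if (r == 0)
            break;              /* EOF */
        got += (size_t)r;
    }
    return (ssize_t)got;
}

/* Describe buf[0..len) as header/payload pairs of iovec entries.
 * hdrs must hold at least max_iov / 2 elements.
 * Returns the number of iovec entries filled, or -1 if max_iov is too small. */
static int chop_into_iov(char *buf, size_t len, size_t payload,
                         struct pkt_hdr *hdrs, struct iovec *iov, int max_iov)
{
    int n = 0;
    unsigned int seq = 0;
    size_t off = 0;

    assert(payload > 0);
    while (off < len) {
        size_t chunk = (len - off < payload) ? len - off : payload;
        if (n + 2 > max_iov)
            return -1;
        hdrs[seq].seq = seq;
        hdrs[seq].len = (unsigned int)chunk;
        iov[n].iov_base = &hdrs[seq];        /* header from userspace */
        iov[n].iov_len = sizeof hdrs[seq];
        iov[n + 1].iov_base = buf + off;     /* slice of the big buffer */
        iov[n + 1].iov_len = chunk;
        n += 2;
        off += chunk;
        seq++;
    }
    return n;
}
```

The resulting array could then be handed to writev() or put into the msg_iov
field of a struct msghdr for sendmsg(). Whether this saves anything over the
memcpy() variant Chas tried is a separate question; the per-packet work he
mentions is not touched by this.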

Ciao,
                     Roland

--
TU Muenchen, Physik-Department E18, James-Franck-Str. 85747 Garching
Telefon 089/289-12592; Telefax 089/289-12570
--
A mouse is a device used to point at
the xterm you want to type in.
Kim Alm on a.s.r.
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GS/CS/M/MU d-(++) s:+ a-> C+++ UL++++ P-(+) L+++ E(+) W+ !N K- w--- M 
+ !V Y+
PGP++ t+(++) 5 R+ tv-- b+ DI++ e+++>++++ h---- y+++
------END GEEK CODE BLOCK------


