[OpenAFS-devel] how does fileserver read from disk?

Roland Kuhn rkuhn@e18.physik.tu-muenchen.de
Thu, 15 Sep 2005 14:17:23 +0200


Hi Harald!

On 15 Sep 2005, at 12:18, Harald Barth wrote:

>> There are some "workarounds" to this problem.  First, we could abandon
>> the current zero-copy semantics and just do very large reads and
>> writes to the disk, and then do memcpy's in userspace.  For fast
>> machines, this will almost certainly beat the current algorithm for
>> raw throughput.  But, it's certainly not what I'd call an elegant
>> solution.
>>
>
> Yes, the data would go diskIO->kernel->userspace->kernel->net.
> On the diskIO side it will be in big chunks. In the net side
> it will be in MTU or MTU*4 size chunks. Bad?
>
This I don't understand. Right now we have
readv(small segments)->buffer->sendmsg(small segments), where the term
'zero-copy' indicates that the buffer is somehow special. My question
is: why can't this be replaced by
read(big segment)->buffer->sendmsg(small segments)? AFAIK readv() is
implemented in terms of read() in the kernel for almost all
filesystems, so the change should really only have the effect of making
the disk transfer more efficient. The msg headers interspersed with the
data have to come from userspace in any case, right?
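
To make the question concrete, here is a minimal sketch of the read(big segment)->buffer->sendmsg(small segments) idea. It is not the fileserver's code: struct seg_hdr, send_segments() and read_big_send_small() are hypothetical names, and the 8-byte header is a stand-in for the real Rx header. Each datagram is gathered from two iovecs (header plus a slice of the big buffer), so the payload is not copied again in userspace.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical per-packet header; the real Rx header layout differs. */
struct seg_hdr {
    uint32_t seq;   /* segment index within the big read */
    uint32_t len;   /* payload bytes that follow */
};

/* Send buf[0..len) as datagrams carrying at most seg payload bytes each.
 * Every datagram is header + payload, gathered by sendmsg() from two
 * iovecs. Returns the number of datagrams sent, or -1 on error. */
int send_segments(int sock, const char *buf, size_t len, size_t seg)
{
    int sent = 0;

    if (seg == 0)
        return -1;
    for (size_t off = 0; off < len; off += seg, sent++) {
        size_t n = (len - off < seg) ? len - off : seg;
        struct seg_hdr hdr = { (uint32_t)sent, (uint32_t)n };
        struct iovec iov[2];
        struct msghdr msg;

        iov[0].iov_base = &hdr;                 /* header from userspace */
        iov[0].iov_len = sizeof hdr;
        iov[1].iov_base = (void *)(buf + off);  /* slice of the big buffer */
        iov[1].iov_len = n;

        memset(&msg, 0, sizeof msg);
        msg.msg_iov = iov;
        msg.msg_iovlen = 2;
        if (sendmsg(sock, &msg, 0) < 0)
            return -1;
    }
    return sent;
}

/* One large read() from fd, then many small sendmsg() calls on sock. */
int read_big_send_small(int fd, int sock, char *buf, size_t big, size_t seg)
{
    ssize_t got = read(fd, buf, big);

    if (got < 0)
        return -1;
    return send_segments(sock, buf, (size_t)got, seg);
}
```

With big set to, say, a few hundred kB and seg to the per-packet payload size, the disk sees one large request while the network side still gets MTU-sized pieces.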

>
>> Second, we could use iovecs for the extension headers.  Unfortunately,
>> most OS's limit us to 16 iovecs, so this would cut our max jumbogram
>> size nearly in half.
>>
>
> What impact would that have? Measurements? Speculations? If half the
> jumbogram size does not kill us, it sounds like an alternative worth
> to test.
>
Well, testing is always a good idea. The problem is that while I have
the hardware setup, I lack the knowledge of OpenAFS internals needed
to produce a patch.
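
For what it's worth, the "nearly in half" figure quoted above can be reproduced with simple arithmetic. This is only a sketch under assumptions of mine: that today a jumbogram needs one iovec for the leading header plus one per packet, and that the iovec variant would need two per packet (header and data). max_packets() is a hypothetical helper, not an OpenAFS function.

```c
#include <stddef.h>

/* Limit taken from the quoted text: 16 iovecs per sendmsg()/recvmsg(). */
enum { MAX_IOVECS = 16 };

/* Packets that fit in one jumbogram when fixed_iovecs entries are used
 * once per jumbogram and each packet costs iovecs_per_packet entries. */
size_t max_packets(size_t max_iovecs, size_t fixed_iovecs,
                   size_t iovecs_per_packet)
{
    if (iovecs_per_packet == 0 || fixed_iovecs > max_iovecs)
        return 0;
    return (max_iovecs - fixed_iovecs) / iovecs_per_packet;
}
```

Under those assumptions, max_packets(MAX_IOVECS, 1, 1) gives 15 packets and max_packets(MAX_IOVECS, 0, 2) gives 8, which is what a throughput test would have to compare.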

Ciao,
                     Roland

--
TU Muenchen, Physik-Department E18, James-Franck-Str. 85747 Garching
Telefon 089/289-12592; Telefax 089/289-12570
--
A mouse is a device used to point at
the xterm you want to type in.
Kim Alm on a.s.r.