[OpenAFS-devel] Re: Backups using commercial products

Lyle Seaman lws@spinnakernet.com
Sun, 07 Jan 2001 10:51:38 -0500


Sam Hartman wrote:

> >>>>> "Lyle" == Lyle Seaman <lws@spinnakernet.com> writes:
>
>     >> So, does RX allow a procedure to write bytes into the queue
>     >> past the point where the window is full?  I.E. does it have a
>     >> concept similar to tcp|udp sendspace?  Shouldn't that smooth
>     >> out some of the disk seek delays?
>
>     Lyle> No it doesn't have any such thing, nor an equivalent to
>     Lyle> sendfile().  Those sorts of things would help, but why
>     Lyle> reinvent them?
>
> Because fixing RX may be easier than doing something else?
> OTOH, BEEP exports approximately the right interface to replace the RX stream layer.

The debate is both "what is easier now" and "what will continue
to be easier in the future."  But then, that varies from person to
person.  It's my opinion, based on my past experience with
RX and other transport/session-layer protocols, that
writing protocols, and making them efficient, is harder than
it seems at first glance.

OTOH, since I'm just an "interested bystander" at this point,
it's probably not really any of my business.

>     Lyle> seeks is to batch up more I/O at one time.
>
> Can you give evidence to support this?  It seems that even if the
> seeks are slow, if you always have data ready to send immediately when
> the client sends you an ack, then you may use significantly more CPU
> than you need because of overhead, but you will still utilize network
> effectively.
>
> It seems that you could do a lot by using async IO and by adding such
> a facility to RX to continue to do work past the window.

Evidence?  Nah, this is orthodoxy; I've long since forgotten the actual
evidence.  (And therein lies the trap of orthodoxy, to be sure.)

Naturally, if you always have data ready to send immediately when the
window is extended then you will achieve maximal network throughput.

But you have to be doing a pretty good job of scheduling I/O in order
to pull 12 MB/s randomly off a single disk with a single thread.  It's no
trouble for a single large file, but when you have to snarf up small files,
directories, symlinks, and to top it off, do timestamp comparisons on
them and potentially skip them entirely...

In other words, if you are getting an ack every millisecond or two,
but an individual seek takes 9 milliseconds to complete, you've got
no chance of having data ready unless you're doing enough concurrent
disk I/O to get your average service time down to around a millisecond.
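To make that arithmetic concrete, here's a quick back-of-the-envelope sketch (the 9 ms seek and 1 ms ack interval are the numbers above; the seek_ms / N scaling is an idealized best case, assuming the drive can perfectly overlap and reorder N outstanding requests):

```python
import math

# Numbers from the discussion above (assumptions, not measurements).
seek_ms = 9.0          # average time for one random seek
ack_interval_ms = 1.0  # client acks arrive roughly every millisecond

def effective_latency_ms(queue_depth):
    # With queue_depth requests outstanding, the drive can reorder them,
    # so in the best case the effective per-request latency is roughly
    # the single-seek time divided by the queue depth.
    return seek_ms / queue_depth

# With one outstanding request, every item costs a full 9 ms seek --
# far too slow to feed a 1 ms ack stream.
# How many concurrent I/Os would it take to keep up with the acks?
needed = math.ceil(seek_ms / ack_interval_ms)
print(needed)  # -> 9
```

By this (admittedly crude) model you'd need on the order of nine I/Os in flight to keep the window full on every ack.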

Then imagine that you're not sending the data across a LAN, but are
just dumping the volume locally.  You can't do it at top speed by
looking ahead one file at a time; you have to be able to reorder your
workload based on "what is easiest to do right now," and only the
disks know that.  So you've got to be doing at least, oh, six I/Os at
a time, per spindle.
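A minimal sketch of what "six I/Os at a time, per spindle" might look like: keep several reads in flight so the OS and the disk are free to reorder them, and consume results in whatever order they complete.  (The queue depth of 6 is just the guess above; `dump_files` and its callers are purely illustrative, not anything in RX or the volserver.)

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

QUEUE_DEPTH = 6  # concurrent I/Os per spindle, per the estimate above

def read_file(path):
    # One blocking read; several of these run at once in the pool.
    with open(path, "rb") as f:
        return path, f.read()

def dump_files(paths):
    # Keep up to QUEUE_DEPTH reads outstanding at a time; yield each
    # (path, data) pair as its read finishes, not in submission order,
    # so the kernel/disk can service whichever request is cheapest.
    with ThreadPoolExecutor(max_workers=QUEUE_DEPTH) as pool:
        futures = [pool.submit(read_file, p) for p in paths]
        for fut in as_completed(futures):
            yield fut.result()
```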

Async I/O and a "sendfile" sort of functionality would help, no doubt.
And I don't think that "you may use more CPU because of overhead"
will be an issue, as I believe the existing CPU costs are mostly in
RX per-packet processing, just as they always have been.
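For the curious, here's roughly what the sendfile-style path looks like: the kernel copies file data straight from the page cache to the socket, skipping the user-space buffer and its per-byte CPU cost.  This is just an illustration of the system call (Linux-specific, via Python's `os.sendfile`); RX has no such hook today.

```python
import os
import socket

def send_whole_file(sock: socket.socket, path: str) -> int:
    # Stream an entire file to a connected socket without ever copying
    # the data into user space.  Returns the number of bytes sent.
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        sent = 0
        while sent < size:
            # os.sendfile(out_fd, in_fd, offset, count) may send less
            # than requested, so loop until the whole file has gone out.
            sent += os.sendfile(sock.fileno(), fd, sent, size - sent)
        return sent
    finally:
        os.close(fd)
```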

Go ahead, try it.  It will be interesting and informative, at least, and
you may well be right.

Cheers, and good luck.

PS.  I've spent more time on info-afs discussion than I really
can afford right now, so I'm going to go back to deep lurking.
I've got to get the electricity on in my kitchen before my
father-in-law gets here on Friday and sees what squalor I'm
keeping his daughter in.  Feel free to send me email directly if
you want to know *why* some stupid thing was done, but
otherwise I'm going to be lying low.