[OpenAFS-devel] how does fileserver read from disk?

Wed, 21 Sep 2005 17:36:18 -0400

On 9/20/05, Marcus Watts <mdw@umich.edu> wrote:
> The fileserver doesn't know enough to help with read-ahead.  All you

I suggest you go learn how autonomic and heuristic algorithms work
before making such strong claims.  Fundamentally, each level in the
software stack closer to the application will have more information
regarding i/o patterns than the levels below it.  And, more
importantly, each higher level is more capable of putting that
workflow information into the context of the application code.  Take a
look at what the fileserver knows that the kernel doesn't: the
contents of the RPC arguments for all pending and previous
transactions (well, within reason due to memory constraints).  There's
a tremendous wealth of data regarding currently in-flight calls, and
statistical data regarding old calls.  The fileserver is in a much
better position than the kernel to make predictions regarding future
i/o patterns.  This would be a very interesting area for future
research.

Have you ever taken a close look at FetchData_RXStyle in
afsfileprocs.c?  Did you notice how it calls readv(); rx_Writev() in a
tight loop?  Did it ever occur to you that this is horribly
inefficient?  What is the kernel disk io scheduler doing when we're
busy in a sendmsg() syscall?  If it actually bothers to do something,
it's performing some read-aheads based upon a heuristic algorithm.
But, who knows for sure?  After all, that heuristic algorithm is far
outside of our control.  That tight loop is the equivalent of building
a TCP implementation that doesn't support window sizes larger than 1
MTU!
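
To make the overlap concrete, here is a minimal sketch of a send loop that tells the kernel about the next chunk before it blocks in the send. It is not OpenAFS code: send_with_readahead_hint() is a hypothetical name, plain write() stands in for rx_Writev(), and the posix_fadvise() call is only advisory, so the kernel is free to ignore it. The offset and total length are assumed to come from the RPC arguments.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600   /* for pread() and posix_fadvise() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of OpenAFS.  Copies `total` bytes starting
 * at `offset` from in_fd to out_fd in `chunk`-sized pieces.  Before each
 * (possibly blocking) send, it hints that the following chunk will be
 * needed, so the disk read can proceed while we sit in the send path.
 * Returns the number of bytes sent, or -1 on error.  EINTR handling is
 * omitted to keep the sketch short.
 */
static long long
send_with_readahead_hint(int in_fd, int out_fd, off_t offset,
                         size_t total, size_t chunk)
{
    long long sent = 0;
    char *buf;

    assert(chunk > 0);
    buf = malloc(chunk);
    if (buf == NULL)
        return -1;

    while (total > 0) {
        size_t want = total < chunk ? total : chunk;
        ssize_t got = pread(in_fd, buf, want, offset);
        size_t done = 0;

        if (got < 0) {
            free(buf);
            return -1;
        }
        if (got == 0)
            break;                      /* hit end of file early */
        offset += got;
        total -= (size_t)got;

#ifdef POSIX_FADV_WILLNEED
        /* Advisory only: ask for the next chunk before we block below. */
        if (total > 0)
            (void)posix_fadvise(in_fd, offset,
                                (off_t)(total < chunk ? total : chunk),
                                POSIX_FADV_WILLNEED);
#endif

        /* Stand-in for rx_Writev(): push the current chunk out. */
        while (done < (size_t)got) {
            ssize_t n = write(out_fd, buf + done, (size_t)got - done);
            if (n < 0) {
                free(buf);
                return -1;
            }
            done += (size_t)n;
        }
        sent += got;
    }
    free(buf);
    return sent;
}
```

A caller would use something like send_with_readahead_hint(fd, sock, pos, len, 65536). The point is only that the fileserver already knows pos and len from the call, so it can say what comes next instead of leaving the kernel to guess.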

> know there is that random N sized chunk requests come from various

Can you actually prove that the fetchdata and storedata workloads are
purely stochastic?  I don't have a big enough data set to do any good
pattern matching right now, but I'm willing to bet there are very
clear patterns in the data.  Just off the top of my head, there should
be fairly strong correlations between the partially ordered streams of
calls coming over any particular rx conn.  And, if you really want to
get into the autonomics and data correlation spaces, I'm sure we could
find other very interesting patterns.
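
As a rough illustration of the simplest such correlation, the sketch below tracks whether successive fetches on one connection are sequential and, if so, suggests a growing prefetch size. Everything here is an assumption for illustration: struct stream_hint and stream_hint_update() are hypothetical names, not OpenAFS structures, and the doubling policy with a 16x cap is an arbitrary choice rather than anything measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection state; zero-initialize before first use. */
struct stream_hint {
    uint64_t next_offset;   /* where a sequential follow-up would start */
    unsigned run_length;    /* consecutive sequential fetches seen so far */
};

/*
 * Record one fetch (offset, length) on this connection.  Returns how many
 * bytes past the end of this request look worth prefetching: 0 until two
 * fetches in a row are sequential, then length, 2*length, ... capped at
 * 16*length.  Any non-sequential fetch resets the run.
 */
static uint64_t
stream_hint_update(struct stream_hint *h, uint64_t offset, uint64_t length)
{
    unsigned shift;

    assert(h != NULL);
    if (h->run_length > 0 && offset == h->next_offset)
        h->run_length++;
    else
        h->run_length = 1;
    h->next_offset = offset + length;

    if (h->run_length < 2)
        return 0;
    shift = h->run_length - 2;
    if (shift > 4)
        shift = 4;          /* cap the window at 16x the request size */
    return length << shift;
}
```

One such struct per rx conn (or per conn and file) would be enough to start collecting data on how often the prediction holds, which is the measurement I don't have yet.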

> cache managers, sometimes sequentially.  The place where optimal
> read-ahead knowledge lives is on the client side in the user's
> application.  Good luck getting that knowledge.
>

No.  Optimal read-ahead knowledge does not exist, courtesy of the
halting problem.  Of course, the fileserver's kernel knows less than
the fileserver, which in turn knows less than the cache manager, and
the cache manager knows less than the userspace application.  Yeah, so
the global knowledge problem for distributed systems sucks, but that's
what autonomics and heuristics are for.  What is your point?

> >
> > The operating systems I deal with on a daily basis have entire kernel
> > subsystems dedicated to aio, aio-specific system calls, and posix
> > compliance libraries wrapping the syscalls. The days of aio being a
> > joke are over (well, except for sockets...aio support for sockets is
> > still a tad rough even on the better commercial unices).
>
> Bully for you.  We've got a slightly more cost sensitive environment,
> so we're busy retiring the last of our rapidly aging aix and solaris
> machines.
>

Solaris runs on x86 and amd64.

--
Tom Keiser
tkeiser@gmail.com