[AFS3-std] rpc refresh: FetchDirectory: discussion only
Jeffrey Hutzelman
jhutz@cmu.edu
Sat, 12 Sep 2009 13:44:15 -0400
--On Saturday, September 12, 2009 02:05:36 AM -0400 Tom Keiser
<tkeiser@sinenomine.net> wrote:
> Let me tackle 4 and 6 together, as I think they're intertwined. Both
> of these points seem to implicitly assume the current fileserver
> implementation. I can easily envision a future fileserver that stores
> the directory object as, say, a B-tree. In that case, returning a
> sorted subset is rather trivial, and addressing a subset by an ordinal
> rather awkward, as you're effectively trying to linearize the
> structure by something other than its natural sort order.
Oh, the fileserver could return things in a sorted order. But it's not
guaranteed to return them in any _particular_ order. It's not necessary or
useful to do so, because the operating system APIs for reading directories
do not return them in any particular order, nor is it possible in the
general case, because the fileserver does not know the user's sorting
rules. The data structure may have a single "natural" sort order, but the
filenames do not -- that depends on a variety of factors, including the
language in which the files are named (and even that assumes everything in
one directory is named in one language).
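
To make the "no single natural order" point concrete, here is a minimal
sketch in Python. The file names and the two helper functions are made up
for illustration. Both results are "sorted", yet they disagree, and a
server cannot know which rule a given user expects.

```python
# Hypothetical directory entries, in whatever order the server stores them.
names = ["b.txt", "A.txt", "a.txt", "B.txt"]


def by_code_point(entries):
    """Sort by raw code point, so upper case precedes lower case."""
    return sorted(entries)


def case_insensitive(entries):
    """Sort ignoring case; ties keep their original relative order."""
    return sorted(entries, key=str.casefold)


print(by_code_point(names))
print(case_insensitive(names))
```

Locale-specific collation would add still more possible orderings on top
of these two.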
> Furthermore, the proposed XDR proc requests a subset by an ephemeral
> ordinal without specifying the DV which was used to come up with said
> ordinal.
Yeah, for paging to actually get you a coherent snapshot of a directory,
there needs to be agreement on what DV is involved. However, that doesn't
have to mean an IN parameter. There's no reason to assume the fileserver
will ever have multiple versions available, and the proposed RPC already
includes an output AFSFetchStatus, which includes the DV of the object. As
long as the DV doesn't change, the client can keep fetching directory
components. If it does change, there's going to be a callback and they'll
be invalidated anyway.
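
The client-side logic described above could look roughly like the
following Python sketch. Everything here is a stand-in: FakeServer, its
fetch() method, and read_directory() are hypothetical names, not part of
the proposed RPC, and a real client would normally learn of a change via
the callback rather than by comparing DVs alone.

```python
class FakeServer:
    """Hypothetical stand-in for a fileserver holding one directory."""

    def __init__(self, names):
        self.names = list(names)
        self.dv = 1  # data version, bumped on every modification

    def add(self, name):
        self.names.append(name)
        self.dv += 1

    def fetch(self, start, count):
        # Return one chunk of entries plus the current DV, mimicking the
        # DV carried in the output AFSFetchStatus.
        return self.names[start:start + count], self.dv


def read_directory(server, page_size, after_page=None):
    """Fetch every entry in chunks, starting over if the DV changes."""
    while True:
        entries, dv = [], None
        consistent = True
        while consistent:
            page, page_dv = server.fetch(len(entries), page_size)
            if dv is None:
                dv = page_dv              # DV seen on the first chunk
            if page_dv != dv:
                consistent = False        # directory changed; discard and retry
            elif not page:
                return entries, dv        # empty chunk: listing is complete
            else:
                entries.extend(page)
                if after_page:
                    after_page()          # test hook, simulates concurrent activity
```

No DV is ever sent to the server; the client just compares the DV that
comes back with each chunk against the one it saw first.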
> I think the proc needs more fields. For example, I'd like an OUT
> parameter for the server to communicate whether the data is sorted or
> unsorted
I don't see how that's useful. Knowing the data is sorted is useless
unless you know it was sorted according to the rules you care about, and
representing that is too complex. Besides, even if it gets sorted results,
what is a client going to do with them, other than enter them into some
local data structure or else return them directly via an interface that
doesn't care whether the output is sorted?
> As far as the addressing by ordinal issue, my preference would be for
> the "primary key" to be a discriminated union. For the current
> generation of file servers, an ordinal makes perfect sense. However,
> I think we should specify a second union entry which is a filename
> string. Upon receipt, file servers supporting this lookup mechanism
> would return the block of entries which follow said entry. This
> would, of course, require allocation of new error codes and
> capabilities to announce which lookup mechanism(s) the server
> supported.
I could support an interface in which the client indicates the filename of
the last entry it got from the server, rather than a numerical index.
Probably the ideal situation is for the server to simply return a cookie
which can be used in the next call to pick up at the same place. The point
here is not to allow the client random access, but to allow it to split
fetches across multiple RPCs to avoid tying up server threads during
complex processing of large directories.
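
A cookie-based interface along those lines might be sketched as follows,
in Python. This is only an illustration under assumptions: CookieServer,
fetch(), and list_all() are hypothetical names, and the cookie encoding
(the last name returned) is just one choice a server could make. The
client never looks inside the cookie, so a different server could use an
index or a B-tree position instead.

```python
import bisect


class CookieServer:
    """Hypothetical server that resumes a listing from an opaque cookie."""

    def __init__(self, names):
        # Internal order is the server's business; clients must not rely on it.
        self._names = sorted(names)

    def fetch(self, cookie, count):
        # A cookie of None means "start from the beginning".
        if cookie is None:
            start = 0
        else:
            start = bisect.bisect_right(self._names, cookie.decode("utf-8"))
        page = self._names[start:start + count]
        done = start + count >= len(self._names)
        # Here the cookie happens to encode the last name returned.
        next_cookie = None if done or not page else page[-1].encode("utf-8")
        return page, next_cookie


def list_all(server, count):
    """Client loop: pass back whatever cookie the server handed out."""
    entries, cookie = [], None
    while True:
        page, cookie = server.fetch(cookie, count)
        entries.extend(page)
        if cookie is None:
            return entries
```

Each fetch() is a short, independent call, which is the property that
matters for not tying up server threads.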
-- Jeff