[AFS3-std] rpc refresh: FetchDirectory: discussion only

Sat, 12 Sep 2009 13:44:15 -0400

--On Saturday, September 12, 2009 02:05:36 AM -0400 Tom Keiser 
<tkeiser@sinenomine.net> wrote:

> Let me tackle 4 and 6 together, as I think they're intertwined.  Both
> of these points seem to implicitly assume the current fileserver
> implementation.  I can easily envision a future fileserver that stores
> the directory object as, say, a B-tree.  In that case, returning a
> sorted subset is rather trivial, and addressing a subset by an ordinal
> rather awkward, as you're effectively trying to linearize the
> structure by something other than its natural sort order.

Oh, the fileserver could return things in a sorted order.  But it's not
guaranteed to return them in any _particular_ order.  It's not necessary or
useful to do so, because the operating system APIs for reading directories
do not return entries in any particular order.  Nor is it possible in the
general case, because the fileserver does not know the user's sorting
rules.  The data structure may have a single "natural" sort order, but the
filenames do not -- that depends on a variety of factors, including the
language in which the files are named (and that assumes everything in one
directory is named in one language).
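
To illustrate that point, here is a small Python sketch.  The names are
made up, and the two rules are only stand-ins for "what the server's data
structure does" and "what some user expects"; real collation rules are more
involved than this.

```python
# Hypothetical directory entries; not taken from any real volume.
names = ["b", "A", "a", "B"]

# Rule 1: raw code-point order, roughly what a server-side structure
# keyed on the stored bytes might consider its natural order.
by_code_point = sorted(names)

# Rule 2: case-insensitive order, which one user might expect instead.
by_casefold = sorted(names, key=str.casefold)

print(by_code_point)
print(by_casefold)
```

Both lists are sorted, but they disagree, so a bare "sorted" flag would not
tell the client which of the two it received.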

> Furthermore, the proposed XDR proc requests a subset by an ephemeral
> ordinal without specifying the DV which was used to come up with said
> ordinal.

Yeah, for paging to actually get you a coherent snapshot of a directory,
there needs to be agreement on what DV is involved.  However, that doesn't
have to mean an IN parameter.  There's no reason to assume the fileserver
will ever have multiple versions available, and the proposed RPC already
includes an output AFSFetchStatus, which includes the DV of the object.  As
long as the DV doesn't change, the client can keep fetching directory
components.  If it does change, there's going to be a callback and the
components already fetched will be invalidated anyway.
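
A rough sketch of that client-side logic, in Python.  FakeServer,
fetch_directory and read_whole_directory are hypothetical names invented
for this illustration, not the proposed RPC or any existing client code;
the only thing taken from the proposal is that each reply carries the
current DV.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FakeServer:
    """Stand-in for a fileserver holding one directory (hypothetical)."""
    entries: List[str]
    dv: int = 1

    def fetch_directory(self, offset: int, count: int) -> Tuple[List[str], int]:
        # Return one chunk plus the current DV, mimicking the OUT status.
        return self.entries[offset:offset + count], self.dv


def read_whole_directory(server: FakeServer, chunk: int = 2,
                         max_restarts: int = 5) -> List[str]:
    """Page through the directory, restarting if the DV changes."""
    for _ in range(max_restarts):
        result: List[str] = []
        snapshot_dv = None
        offset = 0
        while True:
            entries, dv = server.fetch_directory(offset, chunk)
            if snapshot_dv is None:
                snapshot_dv = dv      # DV of the first reply defines the snapshot
            elif dv != snapshot_dv:
                break                 # directory changed; discard and restart
            if not entries:
                return result         # empty chunk: end of directory
            result.extend(entries)
            offset += len(entries)
    raise RuntimeError("directory kept changing; giving up")
```

No DV is ever sent to the server here; the client just compares the DV in
each reply against the one from its first chunk.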

> I think the proc needs more fields.  For example, I'd like an OUT
> parameter for the server to communicate whether the data is sorted or
> unsorted

I don't see how that's useful.  Knowing the data is sorted is useless 
unless you know it was sorted according to the rules you care about, and 
representing that is too complex.  Besides, even if it gets sorted results,
what is a client going to do with them, other than enter them into some
local data structure or else return them directly via an interface that
doesn't care whether the output is sorted?

> As far as the addressing by ordinal issue, my preference would be for
> the "primary key" to be a discriminated union.  For the current
> generation of file servers, an ordinal makes perfect sense.  However,
> I think we should specify a second union entry which is a filename
> string.  Upon receipt, file servers supporting this lookup mechanism
> would return the block of entries which follow said entry.  This
> would, of course, require allocation of new error codes and
> capabilities to announce which lookup mechanism(s) the server
> supported.

I could support an interface in which the client indicates the filename of 
the last entry it got from the server, rather than a numerical index. 
Probably the ideal situation is for the server to simply return a cookie 
which can be used in the next call to pick up at the same place.  The point 
here is not to allow the client random access, but to allow it to split 
fetches across multiple RPCs to avoid tying up server threads during
complex processing of large directories.
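
Here is a minimal Python sketch of the cookie idea.  Everything in it is
an assumption made for illustration: CookieServer, fetch_directory and
list_directory are hypothetical names, and this particular server happens
to keep its entries sorted and to encode the last name it returned as the
cookie.  A different server could put an ordinal or anything else in
there; the client never looks inside.

```python
import bisect
from typing import List, Optional, Tuple


class CookieServer:
    """Hypothetical server that resumes a listing from an opaque cookie."""

    def __init__(self, names: List[str]) -> None:
        self._names = sorted(names)  # the server's own internal order

    def fetch_directory(self, cookie: Optional[bytes],
                        count: int) -> Tuple[List[str], Optional[bytes]]:
        # Assumes count >= 1.  A cookie of None means "start at the beginning".
        if cookie is None:
            start = 0
        else:
            # This server's cookie is the last name it returned.
            start = bisect.bisect_right(self._names, cookie.decode("utf-8"))
        chunk = self._names[start:start + count]
        done = start + count >= len(self._names)
        next_cookie = None if done else chunk[-1].encode("utf-8")
        return chunk, next_cookie


def list_directory(server: CookieServer, count: int = 2) -> List[str]:
    """Client side: pass the cookie back untouched until the server says done."""
    entries, cookie = server.fetch_directory(None, count)
    result = list(entries)
    while cookie is not None:
        entries, cookie = server.fetch_directory(cookie, count)
        result.extend(entries)
    return result
```

Each call is short and self-contained, so no server thread is held between
chunks, and the client gets sequential access without any way to ask for an
arbitrary position.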

-- Jeff