[AFS3-std] rpc refresh: FetchDirectory: discussion only

Jeffrey Hutzelman jhutz@cmu.edu
Thu, 10 Sep 2009 19:22:49 -0400


--On Thursday, September 10, 2009 07:00:34 PM -0400 "Matt W. Benjamin" 
<matt@linuxbox.com> wrote:

> 5. directory entries cannot be looked up, as server doesn't know
> applicable normalization rules

I'm not sure I'd go quite that far (and I'm pretty sure I was the one who 
made this argument).  I expect a lookup operation will _usually_ not be as 
useful as doing the lookup on the client, for this reason and also due to 
performance concerns.  But occasionally it might be, so I wouldn't completely 
rule out the option of a lookup operation.


> /* Max length of one segment of a directory listing,
>  * which may be arbitrarily large */
>
> const AFSDIRSEQMAX = 512;
> typedef AFSDirEntry AFSDirEntrySeq<AFSDIRSEQMAX>;

I don't think I'd limit the number of entries that can be returned in this 
way.  It's not necessary -- the client can/should be able to indicate how 
many entries it wants, and should be able to read the whole directory at 
once if that's what it wants.

> proc FetchDirectory(
>      IN AFSFid *DirFid,
>      afs_uint32 Offset,
>      OUT afs_uint64 NEntries,

I think you mean for this to be an IN parameter, indicating how many 
entries the client wants.  You certainly don't need an OUT length, unless 
it's the _total_ number of entries in the directory.  Vector types like 
AFSDirEntrySeq above include a count.  However...

>      AFSDirEntrySeq *Entries,

This really shouldn't be an OUT parameter as a vector.  Despite my comments 
in Jabber about clients not tying up a server thread while they think about 
how to process a directory, this should still be a split RPC.  The reason 
for this is that it allows the server to compute and write the directory 
entries directly onto the call stream, and the client to read them directly 
off of the call stream, rather than requiring the XDR layer to allocate 
large amounts of memory.  With your proposed protocol, a fileserver 
returning a single page of directory entries could potentially have to 
allocate 512KB of memory for the AFSDirEntrySeq return value.
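
To make the memory argument concrete, here is a rough sketch in C.  Everything 
in it is made up for illustration: struct dir_entry is not the real 
AFSDirEntry, and struct call_stream is only a stand-in that counts bytes, not 
the Rx API.  It compares building the whole reply in memory with writing one 
entry at a time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size directory entry; NOT the real AFSDirEntry. */
struct dir_entry {
    uint32_t vnode;
    uint32_t unique;
    char name[256];
};

/* Stand-in for a call stream: it only counts and checksums the bytes. */
struct call_stream {
    size_t bytes_written;
    uint32_t checksum;
};

static void stream_write(struct call_stream *cs, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++)
        cs->checksum = cs->checksum * 31u + p[i];
    cs->bytes_written += len;
}

/* Produce entry number i of a made-up directory. */
static void make_entry(uint32_t i, struct dir_entry *e)
{
    memset(e, 0, sizeof(*e));
    e->vnode = i + 1;
    e->unique = 1;
    snprintf(e->name, sizeof(e->name), "file%u", (unsigned)i);
}

/* Vector-style reply: build every entry in memory, then send them all.
 * Returns the number of bytes held for entries at once, or 0 on failure. */
static size_t send_as_vector(struct call_stream *cs, uint32_t n)
{
    assert(n > 0);
    struct dir_entry *all = malloc((size_t)n * sizeof(*all));
    if (all == NULL)
        return 0;
    for (uint32_t i = 0; i < n; i++)
        make_entry(i, &all[i]);
    stream_write(cs, all, (size_t)n * sizeof(*all));
    free(all);
    return (size_t)n * sizeof(*all);
}

/* Split-call style: compute one entry, write it, reuse the same storage.
 * Returns the number of bytes held for entries at once. */
static size_t send_streamed(struct call_stream *cs, uint32_t n)
{
    struct dir_entry e;
    for (uint32_t i = 0; i < n; i++) {
        make_entry(i, &e);
        stream_write(cs, &e, sizeof(e));
    }
    return sizeof(e);
}
```

Both functions put the same bytes on the stream.  The vector version holds n 
entries at once, while the streamed version holds one regardless of n.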

Also note that you _do_ want clients to be able to read the whole directory 
in a single RPC, if they can deal with the data reasonably quickly, because 
doing so allows streaming effects you don't get with a sequence of RPCs 
each returning a smaller number of entries.  There is a fairly complicated 
performance tradeoff to be made here, depending on what the client is 
trying to do, and it seems best to let the client decide how much of a 
directory it wants at once, just as it does for file data.
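
As a small illustration of the round-trip side of that tradeoff, here is a 
sketch in C of a client that picks its own per-call entry count.  The names 
and the "short reply means end of directory" convention are assumptions made 
for the example, not part of any proposal.  It only counts calls; it does not 
model latency or streaming.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical server side: number of entries returned when the client
 * asks for at most `want` entries starting at `offset`. */
static uint32_t mock_fetch(uint32_t dir_size, uint32_t offset, uint32_t want)
{
    if (offset >= dir_size)
        return 0;
    uint32_t left = dir_size - offset;
    return want < left ? want : left;
}

/* Client loop: keep asking for `per_call` entries until a short reply
 * (assumed here to signal end of directory).  Returns the call count. */
static uint32_t calls_to_read(uint32_t dir_size, uint32_t per_call)
{
    assert(per_call > 0);
    uint32_t offset = 0, calls = 0;
    for (;;) {
        uint32_t got = mock_fetch(dir_size, offset, per_call);
        calls++;
        offset += got;
        if (got < per_call)
            break;
    }
    return calls;
}
```

A client that can consume entries quickly would pass a large per_call (up to 
UINT32_MAX for "everything") and finish in one call.  A client that wants 
only the first screenful would pass a small one and stop early.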


>      AFSFetchStatus *OutStatus,
>      AFSCallBack *CallBack,
>      AFSVolSync *Sync
> ) = AFS_FETCH_DIRECTORY_ORD;


-- Jeff