[OpenAFS-devel] Re: openafs / opendfs collaboration

Tom Keiser <tkeiser@gmail.com>
Fri, 21 Jan 2005 16:56:16 -0500


Ivan,

On Fri, 21 Jan 2005 10:27:33 +0100, Ivan Popov <pin@medic.chalmers.se> wrote:
> Hi Tom!
> 
> On Tue, Jan 18, 2005 at 04:46:14PM -0500, Tom Keiser wrote:
> > Secondly, I know this is a rather drastic proposal, but is it time to
> > consider splitting the cache manager out of individual filesystem clients?
> 
> What do you call a filesystem client and a cache manager in this context?
> 

I'm (roughly) thinking of clients such as OpenAFS and OpenDFS as
being composed of several interacting components:

cache manager:
- responsible for storage of data and metadata
- responsible for cache replacement strategy
- API for interaction with implementation of VFS interface
- API for access by RPC endpoint for things like cache invalidation

credential manager:
- an example would be the Linux 2.6 keyring implementation

implementation of VFS interface:
- very os-specific stuff
- probably in-kernel unless something like LUFS takes off

RPC endpoint:
- listener for cache invalidations, etc.

RPC client library:
- client stub library

fs-specific syscall:
- PAG management, etc.

This is still an oversimplified view (where do things like fsprobes
fit?), but a rough sketch of the cache manager's API boundary follows
below.
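To make that a little more concrete, here's the kind of boundary I
have in mind for the cache manager. This is a pure strawman in C;
every name below is invented for illustration and exists in neither
the OpenAFS nor the OpenDFS tree:

#include <sys/types.h>

/* Strawman only -- all names here are hypothetical. */

struct cm_fid {                 /* fs-agnostic object identifier */
    u_int32_t cell, volume;
    u_int64_t vnode, unique;
};

/* The API the VFS glue layer calls into. */
struct cm_vfs_ops {
    int (*lookup)(const struct cm_fid *dir, const char *name,
                  struct cm_fid *out);
    int (*read)(const struct cm_fid *f, off_t off, size_t len,
                void *buf);
    int (*write)(const struct cm_fid *f, off_t off, size_t len,
                 const void *buf);
};

/* The API the RPC endpoint calls for server-initiated events. */
struct cm_rpc_ops {
    int (*invalidate)(const struct cm_fid *f);      /* callback break */
    int (*flush_volume)(u_int32_t cell, u_int32_t volume);
};

The point is that nothing above names AFS or DFS: each client supplies
its fid encoding and the two ops vectors, and the cache manager itself
stays generic.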


> I am afraid that different people (including myself) may think about
> very different things.
> 
> > If the interfaces are abstract enough, we should be able to have multiple
> > distributed fs's using the same cache manager API.
> 
> Do you mean any besides AFS and DFS?
> 

These two are the most obvious.  It's less clear whether other
filesystems would actually benefit from a cache manager complex enough
to handle AFS and DFS; it comes down to whether more lightweight
filesystems can tolerate a cache manager that trades some performance
for caching aggressiveness.  However, there's nothing to preclude a
pluggable replacement algorithm, or tunables, to set whatever tradeoff
is desired.
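For example, a pluggable replacement policy could look something like
this (again, hypothetical names, just to show the shape of the idea):

/* Hypothetical: a lightweight client registers a cheap LRU, while
 * AFS/DFS register something more aggressive. */
struct cm_cache_entry;                             /* opaque to policies */

struct cm_repl_policy {
    const char *name;                              /* "lru", "arc", ... */
    void (*touch)(struct cm_cache_entry *e);       /* called on access */
    struct cm_cache_entry *(*choose_victim)(void); /* called under pressure */
};

int cm_register_policy(const struct cm_repl_policy *pol);
int cm_set_policy(const char *name);   /* flipped at runtime via a tunable */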

> > help reduce the amount of in-kernel code for which each
> > project is responsible.  Anyone else think this is feasible?
> 
> Do you mean in-kernel cache management? Then probably no.
> Both filesystems and kernels are of great variety.
> 

This is an argument best left for another day.  Suffice it to say, I
don't think supporting M in-kernel filesystems on N OSes is a
sustainable model: it means maintaining something like M x N
kernel-specific code paths.  The less we depend on the subtle nuances
of each kernel's API, the better our chances of survival.

> If you mean a more general "cache bookkeeping library", then possibly yes,
> but still you'll get differences depending on how FSs and OSs distribute
> functionality between kernel and user space in a filesystem client.
> 

This is what I was proposing in my initial post.  Distributed
filesystems can benefit from an in-memory cache, but a larger cache
that survives reboots is often more appealing.  Unfortunately, relying
on OS-specific caching facilities just increases autoconf complexity
and produces even more ifdef soup.  Filesystems like AFS and DFS are
complex enough that we must have a common client codebase across
platforms, so a cross-platform cache library that uses something like
the osi API for interaction with the rest of the kernel sounds more
feasible.  I don't see the one-OS vision of many Linux supporters
becoming a reality for several more years, so instead I'm advocating
something that sacrifices performance for OS agnosticism (which sounds
a bit like the ARLA philosophy...).
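As a sketch of what I mean (the osi_* names below are illustrative
shims I made up for this message, not the actual OpenAFS osi
interfaces):

#include <sys/types.h>

/* Each platform implements these few shims once; the cache library is
 * written purely against them and never touches a native kernel API. */
struct osi_file;
struct osi_file *osi_file_open(const char *path);
int  osi_file_pread(struct osi_file *f, void *buf, size_t len,
                    off_t off);
int  osi_file_pwrite(struct osi_file *f, const void *buf, size_t len,
                     off_t off);
void osi_file_close(struct osi_file *f);

/* One cross-platform routine from the hypothetical cache library:
 * reads a chunk out of the persistent cache store. */
int
cache_fetch_chunk(struct osi_file *store, off_t chunk_off,
                  void *buf, size_t chunk_len)
{
    return osi_file_pread(store, buf, chunk_len, chunk_off);
}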

> If you mean the upcall interface (a common kernel module for different
> filesystems), then probably no - it reflects both the corresponding filesystem
> semantics and the corresponding kernel architecture...
> 

I agree that the upcall interface will probably never be common.  The
only way we could ever get there is through the emergence of a
high-performance, cross-platform userspace filesystem API.  Then maybe
we wouldn't feel compelled to put everything but the kitchen sink in
kernel space ;)

> Though, less demanding filesystems can be happy with "foreign" kernel
> modules - like podfuk-smb or davfs2 using the Coda module.
> 

While I was not trying to advocate a userspace implementation, I don't
think such an option should be ignored.  But then, I'm one of the last
few hold-outs who like the elegance of the microkernel architecture.
Crossing the kernelspace/userspace boundary can be optimized: if you
want speed and parallelism, it could be crossed with something like
asynchronous message queues, so that many operations stay in flight at
once.  Granted, there's not much reason for hope right now, but it
sure would make everyone's lives easier if a good userspace filesystem
driver API existed on multiple platforms.  Yes, it will always be
slower than running in-kernel, but the reduction in maintenance effort
to keep up with rapidly changing kernel APIs should free up more
people's time to work on a better cache manager.  Not to mention,
debugging and profiling userspace code is so much easier.
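To make "asynchronous message queues" slightly less hand-wavy, here is
the sort of thing I have in mind; purely hypothetical, nothing like it
exists today:

#include <sys/types.h>

/* Hypothetical upcall record: the kernel half enqueues these, and a
 * userspace daemon drains many per wakeup, replying out of order by
 * matching on the cookie, so one slow RPC never stalls the queue. */
struct fsmsg {
    u_int64_t cookie;       /* pairs a reply with its request         */
    u_int32_t opcode;       /* e.g. FSMSG_LOOKUP, FSMSG_READ, ...     */
    u_int32_t len;          /* bytes of op-specific payload to follow */
};

/*
 * daemon loop (sketch):
 *     while (read(queue_fd, &msg, sizeof msg) == sizeof msg)
 *         hand msg off to a worker; the worker writes its reply back
 *         tagged with msg.cookie whenever the RPC completes.
 */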

Regards,

-- 
Tom