[OpenAFS] pre-fetch cache

Horst Birthelmer horst@riback.net
Tue, 19 Jul 2005 15:54:55 +0200

On Jul 19, 2005, at 2:58 PM, Brian May wrote:

>>>>>> "Jeffrey" == Jeffrey Altman <jaltman@columbia.edu> writes:
>>> What would be nice is a special volume type that is a
>>> combination of the cache and replicated read-only volumes.  If
>>> the master copy of any file in this volume changes, then all
>>> replicated copies of this file are deleted, and the next
>>> request is forwarded to the master server. Write requests are
>>> forwarded to the master server. That way multiple clients at
>>> one site can "share" the one cache copy.
>     Jeffrey> I don't understand how this would help share a cache.
>     Jeffrey> The cache data would still have to be read over the
>     Jeffrey> network.
> The replicated copy could be installed on a fast LAN when the master
> server is on a slower WAN.
> This would help in the situation where a number of clients at one site
> need to access data from a remote site.

If the cell is designed that way, yes, but don't blame that on AFS.
By design, AFS does its caching on the client side. That's
about it ...
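For comparison, the volume type proposed in the quoted text (a site copy shared by all clients on a LAN, dropped when the master copy changes, with writes forwarded to the master) can be sketched as follows. This is a minimal illustration of the proposal, not of anything AFS implements; `MasterServer`, `SiteReplica` and `wan_reads` are hypothetical names.

```python
class MasterServer:
    """Hypothetical master (read/write) server holding the authoritative files."""

    def __init__(self, files=None):
        self.files = dict(files or {})
        self.replicas = []   # site replicas to notify when a file changes
        self.wan_reads = 0   # fetches that crossed the (slow) WAN

    def read(self, path):
        self.wan_reads += 1
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data
        # The master copy changed: every site replica drops its copy.
        for replica in self.replicas:
            replica.invalidate(path)


class SiteReplica:
    """Hypothetical replica on one site's fast LAN, shared by all its clients."""

    def __init__(self, master):
        self.master = master
        self.copies = {}
        master.replicas.append(self)

    def invalidate(self, path):
        self.copies.pop(path, None)

    def read(self, path):
        # Serve the shared site copy if present; otherwise forward to the master.
        if path not in self.copies:
            self.copies[path] = self.master.read(path)
        return self.copies[path]

    def write(self, path, data):
        # Write requests are always forwarded to the master server.
        self.master.write(path, data)


if __name__ == "__main__":
    master = MasterServer({"/video/intro.mpg": b"v1"})
    site = SiteReplica(master)
    for _ in range(10):  # ten clients at the same site read the same file
        site.read("/video/intro.mpg")
    print(master.wan_reads)  # 1
```

The point of the sketch is where the state lives: the shared copy sits on a server at each site, which is exactly the part that client-side caching does not provide.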

> I need to talk to my client more about their requirements - I suspect
> in this case it would be better if the servers were distributed, not
> centralized as was requested.
> Still, I think the above would be a good feature for applications that
> require distribution of large files across a large number of clients
> across a small number of sites.
> Caching on the clients is good, but it requires repeatedly downloading
> each file on each computer at a site, which could be expensive if it
> has to be downloaded from a WAN connection.

Do I understand you correctly?
You're after something like the "every cache manager is also a server
in some way" idea?
Don't go down that road. That's not what AFS was ever designed for.
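The cost Brian describes (each file downloaded by each computer at a site) can be put as rough arithmetic. This sketch assumes every client reads every file once, caches never evict, and the master sits at one site that the other sites reach over the WAN; the function name and the sizes are hypothetical.

```python
def wan_gigabytes(file_sizes_gb, clients_per_site, remote_sites, shared_site_copy):
    """Rough WAN volume (GB) for every client reading every file once.

    Illustrative assumptions: the master server sits at one site and the
    `remote_sites` reach it over the WAN; caches are large enough that no
    file is fetched twice by the same cache.
    """
    total = sum(file_sizes_gb)
    # Per-client caching: each client fetches each file itself.
    # Shared site copy: one fetch per site, then served over the LAN.
    fetchers_per_site = 1 if shared_site_copy else clients_per_site
    return total * fetchers_per_site * remote_sites


if __name__ == "__main__":
    sizes = [2, 2]  # two hypothetical 2 GB video files
    print(wan_gigabytes(sizes, clients_per_site=20, remote_sites=3, shared_site_copy=False))  # 240
    print(wan_gigabytes(sizes, clients_per_site=20, remote_sites=3, shared_site_copy=True))   # 12
```

With a persistent client cache, as Jeffrey notes, each client pays that cost once rather than on every reboot, but it is still paid once per client.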

> Examples of large files could include video files, CAD designs, etc.
> This is simple if files are only changed at one site (use rsync + http
> mirrors for example), it becomes more complicated if files can change
> at any site.

AFS supports the first of your ideas (files changed at only one
site). The second one (files changing at any site) is not that easy
to solve. I don't know of anybody who has ever achieved a stable
solution for it. And if you think about what that would mean for
your server-to-server traffic, I don't think you want it.
You mentioned earlier that your servers are connected over a WAN,
which means you don't have much bandwidth there.
If you define each site as a client, they already can ... :-) but
I'm sure you already got that.

>     Jeffrey> Keep in mind that the Windows cache is now persistent.
>     Jeffrey> You can cache up to 1.3GB on the client.  So you only
>     Jeffrey> need to download the data once.  If you are planning on
>     Jeffrey> reading the same data repeatedly across numerous reboots
>     Jeffrey> you might not need to precache as much as you think.
> The argument was "we don't want to load the network during peak
> periods". I suspect not much thought was put into this.

What is "this" here?

> Am I correct in my understanding of AFS that any file writes require
> the entire file be copied back to the master server as soon as the
> program calls close() on that file? I think my client forgot to
> consider this.
> The client somehow wanted read/write access, guaranteed access to
> latest version, file locking when a file is opened (anywhere), and no
> files to be transferred except at night.

That sounds a lot like the "disconnected operation" idea to me,
which has been under discussion for a long time but isn't ready yet.
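On the close() question: AFS gives close-to-open consistency. A client works on its cached copy, the modified data is stored back to the file server when the file is closed, and the server then breaks the callbacks held by other clients so that their next open fetches the new version. Depending on the client implementation only the modified chunks may be sent rather than the whole file; the minimal sketch below does not model chunks, and all class and method names in it are hypothetical.

```python
class FileServer:
    """Hypothetical file server tracking which clients hold callbacks."""

    def __init__(self):
        self.files = {}
        self.callbacks = {}  # path -> set of clients holding a callback promise

    def fetch(self, client, path):
        # Hand out the data together with a callback promise.
        self.callbacks.setdefault(path, set()).add(client)
        return self.files.get(path, b"")

    def store(self, writer, path, data):
        self.files[path] = data
        # Break the callbacks of every other client caching this file.
        for client in self.callbacks.get(path, set()) - {writer}:
            client.break_callback(path)
        self.callbacks[path] = {writer}


class CacheManager:
    """Hypothetical client cache manager with close-to-open semantics."""

    def __init__(self, server):
        self.server = server
        self.cache = {}      # path -> cached bytes
        self.dirty = set()   # paths modified locally but not yet stored
        self.stores = 0      # store operations sent to the server

    def break_callback(self, path):
        self.cache.pop(path, None)

    def open(self, path):
        # Reuse the cached copy while the callback is intact; else refetch.
        if path not in self.cache:
            self.cache[path] = self.server.fetch(self, path)
        return self.cache[path]

    def write(self, path, data):
        # Writes only touch the local cache until close().
        self.cache[path] = data
        self.dirty.add(path)

    def close(self, path):
        # On close, modified data goes back to the file server right away.
        if path in self.dirty:
            self.server.store(self, path, self.cache[path])
            self.dirty.discard(path)
            self.stores += 1
```

Nothing in this model waits for a nightly window: the store happens at close(), which is why read/write access, the latest version everywhere, and transfers only at night cannot all hold at once.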

> Unfortunately, some of these conflicting requirements are simply not
> possible without adding time travel capabilities (<grin>) to AFS. I
> think time travel is beyond the budget of my client...

If you can be more specific, we can go beyond ideas and give more
than just "maybe" and "definitely not" answers.
If your customer has such a specific set of requirements for the
system, maybe they need a customized distributed file system. :-)