[OpenAFS-devel] Regarding GSoC 2010 Collaborative Caching Project
shruti jain
shruti.jain1988@gmail.com
Tue, 20 Apr 2010 23:50:56 +0530
Hi,
This looks good. I have understood what you intend to achieve in this
project. Thanks for the clarifications.
Shruti
On Sat, Apr 17, 2010 at 7:31 PM, Jeffrey Altman <
jaltman@secure-endpoints.com> wrote:
> On 4/17/2010 2:25 AM, shruti jain wrote:
> > Here is what I know about the cache manager and its file server
> > interactions.
> > The Cache Manager resides on the client side in an OpenAFS environment
> > and communicates with the AFS file server on behalf of the application
> > programs running on the client. When an AFS file is needed by any
> > application program running on a client machine, the request is sent to
> > the Cache Manager which in turn issues RPC calls to the file server
> > storing the requested file.
>
> This is true for any object (file, directory, mount point, symlink, ...)
>
> AFS supports readonly replicas. The CM is permitted to request copies
> of the data from any of the replicas although at present, the CM only
> reads from a single replica at a time.
>
> > When the Cache Manager receives the
> > requested data from the File Server, it stores it in the cache and also
> > delivers it to the application program that initially requested
> > it. To maintain cache consistency, the server issues a
> > callback along with the data. A callback is a promise by a File Server
> > to a Cache Manager to report any change in the data delivered by the
> > File Server to the Cache Manager. If any other client on the network
> > modifies the file, the File Server breaks this callback and thus
> > indicates to the Cache Manager that its locally cached copy of
> > the file is obsolete and needs to be updated. The callback mechanism
> > ensures that the Cache Manager always requests the most up-to-date
> > version of a file. In this way, the Cache Manager also takes on the
> > responsibility of maintaining the cache.
>
> You have the general idea. Let me provide a few additional details. In
> the original (and currently deployed) implementation of callbacks, a
> callback is a promise that the FS will notify the CM of a change for up
> to S seconds, where S is typically measured in minutes for read/write
> data and in hours for read-only data. The number
> of callback promises (or registrations) that a FS can maintain is
> finite. Callback registrations can therefore be canceled prematurely
> without there being a change.
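
To make the time limit and the finite table concrete, here is a minimal sketch in C. It is not OpenAFS code: the names (cb_table, cb_register, cb_is_valid), the table size, and the "evict the promise nearest to expiry" policy are all assumptions chosen for illustration. The only points taken from the description above are that a promise lapses after a fixed lifetime and that a full table can force a live promise to be cancelled early.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CB_TABLE_SIZE 4  /* deliberately tiny so exhaustion is easy to trigger */

/* Hypothetical registration record: one promise from the FS to one CM. */
struct cb_entry {
    int      in_use;
    uint32_t client;   /* which CM holds the promise */
    uint32_t fid;      /* which object it covers */
    uint64_t expires;  /* absolute time in seconds when the promise lapses */
};

struct cb_table {
    struct cb_entry slot[CB_TABLE_SIZE];
};

/* Register (or refresh) a promise lasting ttl seconds. Returns 1 and fills
 * *victim when a still-live promise had to be cancelled to make room, so
 * the caller can notify that CM; returns 0 otherwise. */
static int cb_register(struct cb_table *t, uint32_t client, uint32_t fid,
                       uint64_t now, uint64_t ttl, struct cb_entry *victim)
{
    size_t i, pick = CB_TABLE_SIZE;
    int evicted = 0;

    assert(t != NULL && victim != NULL);

    /* Refresh an existing promise for the same client and object. */
    for (i = 0; i < CB_TABLE_SIZE; i++) {
        if (t->slot[i].in_use && t->slot[i].client == client &&
            t->slot[i].fid == fid) {
            pick = i;
            break;
        }
    }
    /* Otherwise take a slot that is free or whose promise has lapsed. */
    for (i = 0; pick == CB_TABLE_SIZE && i < CB_TABLE_SIZE; i++) {
        if (!t->slot[i].in_use || t->slot[i].expires <= now)
            pick = i;
    }
    /* Table full of live promises: cancel the one nearest to expiry
     * (an assumed policy) even though nothing changed. */
    if (pick == CB_TABLE_SIZE) {
        pick = 0;
        for (i = 1; i < CB_TABLE_SIZE; i++) {
            if (t->slot[i].expires < t->slot[pick].expires)
                pick = i;
        }
        *victim = t->slot[pick];
        evicted = 1;
    }
    t->slot[pick].in_use = 1;
    t->slot[pick].client = client;
    t->slot[pick].fid = fid;
    t->slot[pick].expires = now + ttl;
    return evicted;
}

/* The promise covers the object only while it is registered and unexpired. */
static int cb_is_valid(const struct cb_table *t, uint32_t client,
                       uint32_t fid, uint64_t now)
{
    size_t i;

    for (i = 0; i < CB_TABLE_SIZE; i++) {
        if (t->slot[i].in_use && t->slot[i].client == client &&
            t->slot[i].fid == fid && t->slot[i].expires > now)
            return 1;
    }
    return 0;
}
```

With four live registrations in place, a fifth cb_register() call returns 1 and reports the evicted entry, which is the "canceled prematurely without there being a change" case.
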
>
> The callback notification (or invalidation) is delivered via an
> unauthenticated RPC channel. As a result, the notification cannot be
> trusted by the CM and must be treated as meaning "a change might have
> occurred, please verify if it matters".
>
> The existing callback notification does not provide any hint as to the
> type of change that might have occurred. Callback notifications are
> issued for many reasons including:
>
> . the data changed
> . the access control list changed
> . other metadata changed
> . the locking state changed
> . the volume in which the data is located is being replicated
> (aka released)
> . the object has been deleted
> . the FS ran out of room in the registration table
>
> Once a notification is issued, the registration is broken and the
> CM will receive no further notifications until it requests updated
> status for the object in question.
>
> The CM determines what has changed by issuing a FetchStatus RPC to
> the FS and comparing the prior and current status fields.
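
As a rough illustration of that comparison, the sketch below diffs two status snapshots and reports which class of change happened. The struct is a hypothetical, cut-down stand-in for the status record a FetchStatus reply carries, and the field and flag names are my own assumptions, not the real OpenAFS definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the per-object status fields. */
struct obj_status {
    uint64_t data_version;   /* bumped on every data change */
    uint32_t owner;
    uint32_t mode;
    uint32_t caller_access;  /* rights the ACL grants this caller */
    uint32_t lock_count;
};

/* Bit flags describing what differs between two snapshots. */
enum {
    CHG_DATA   = 1 << 0,
    CHG_ACCESS = 1 << 1,
    CHG_META   = 1 << 2,
    CHG_LOCK   = 1 << 3
};

/* Compare the status held before the callback break with the one just
 * fetched. A result of 0 means nothing visible changed, for example
 * because the FS only ran out of room in its registration table. */
static unsigned status_diff(const struct obj_status *prior,
                            const struct obj_status *cur)
{
    unsigned changes = 0;

    assert(prior != NULL && cur != NULL);

    if (prior->data_version != cur->data_version)
        changes |= CHG_DATA;
    if (prior->caller_access != cur->caller_access)
        changes |= CHG_ACCESS;
    if (prior->owner != cur->owner || prior->mode != cur->mode)
        changes |= CHG_META;
    if (prior->lock_count != cur->lock_count)
        changes |= CHG_LOCK;
    return changes;
}
```

Only a result with CHG_DATA set gives the CM a reason to distrust its cached file data; the other flags affect cached metadata or permissions.
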
>
> Matt Benjamin has developed and implemented (but it is not shipping yet)
> an extended version of callback notifications that provides the CM with
> additional details regarding the change. When combined with an
> authenticated callback channel, this becomes a very powerful combination.
>
> It is also important to discuss how the FS and CM track object data.
> Each time a change to the data (not the metadata) occurs, a data version
> (DV) number for the object is incremented. When the CM issues a
> StoreData RPC, it receives updated status info in reply. If the DV was
> incremented by one, then the CM knows that there was no race with
> another CM and all of the data in the cache for that file is still
> current. If the DV increment was greater than one, then the CM knows
> that the data it just wrote is current, but all other data is suspect.
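
The DV rule can be written down in a few lines. This is a minimal sketch under assumed names (cached_file, after_store) and an assumed fixed number of chunks per file; it only encodes the two cases described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCHUNKS 8  /* assumed number of cache chunks per file */

/* Hypothetical per-file cache state kept by the CM. */
struct cached_file {
    uint64_t dv;              /* DV the cached chunks belong to */
    int      valid[NCHUNKS];  /* 1 if the chunk may be served from cache */
};

/* Update cache state after a StoreData that wrote chunks first..last and
 * returned new_dv in its status reply. */
static void after_store(struct cached_file *f, size_t first, size_t last,
                        uint64_t new_dv)
{
    size_t i;

    assert(f != NULL && first <= last && last < NCHUNKS);

    if (new_dv != f->dv + 1) {
        /* Someone else also stored: everything we did not just write
         * ourselves is suspect. */
        for (i = 0; i < NCHUNKS; i++) {
            if (i < first || i > last)
                f->valid[i] = 0;
        }
    }
    /* What we just wrote is current in either case. */
    for (i = first; i <= last; i++)
        f->valid[i] = 1;
    f->dv = new_dv;
}
```

When the DV advances by exactly one, chunks outside the written range keep whatever validity they had, so nothing needs to be re-fetched.
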
>
> When using the Extended Callback mechanism, the FS can issue a
> notification that a StoreData occurred affecting {FileID, offset,
> length} and the current DV is N without canceling the callback
> registration. This permits the CM to maintain the cache coherency at a
> lower cost of network traffic when an object is actively being used.
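
Here is one way a CM might apply such a notification, as a sketch only. The message layout (xcb_note), the chunk size, and in particular the rule "trust the range only if the announced DV is exactly one past the cached DV, otherwise drop everything" are my assumptions by analogy with the StoreData case; the actual extended callback work may behave differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 1024u  /* assumed chunk size in bytes */
#define NCHUNKS    8      /* assumed number of chunks per file */

/* Hypothetical notification: a StoreData touched this range, DV is now dv. */
struct xcb_note {
    uint32_t fid;
    uint64_t offset;
    uint64_t length;
    uint64_t dv;
};

struct cached_file {
    uint32_t fid;
    uint64_t dv;
    int      has_callback;    /* stays set: the registration is not cancelled */
    int      valid[NCHUNKS];
};

/* Apply a notification and return how many cached chunks were invalidated. */
static size_t apply_xcb(struct cached_file *f, const struct xcb_note *n)
{
    size_t i, dropped = 0;
    uint64_t lo, hi;

    assert(f != NULL && n != NULL);

    if (n->fid != f->fid)
        return 0;

    if (n->dv == f->dv + 1) {
        /* Exactly one change since our copy: only this range is stale. */
        lo = n->offset;
        hi = n->offset + n->length;  /* exclusive end */
    } else {
        /* We missed at least one change: treat the whole file as stale. */
        lo = 0;
        hi = UINT64_MAX;
    }
    for (i = 0; i < NCHUNKS; i++) {
        uint64_t start = (uint64_t)i * CHUNK_SIZE;
        uint64_t end = start + CHUNK_SIZE;

        if (f->valid[i] && hi > lo && start < hi && end > lo) {
            f->valid[i] = 0;
            dropped++;
        }
    }
    f->dv = n->dv;
    return dropped;
}
```

A notification covering bytes 1500 to 2499 invalidates only the two chunks it overlaps, and has_callback is left untouched, which is where the saving in network traffic comes from.
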
>
> However, when a CM starts or when an object has been idle for more than
> a few minutes, there will be no callback registration. In that
> situation, a change could have occurred to the file data and the CM will
> be forced to discard all of the cached data if a change did occur.
> Unfortunately, there is no mechanism at present for the CM to ask the FS
> "I need the chunk of data represented by {FileID, offset, length} but I
> currently have data in that range with the following hash value. Could
> you confirm that my data is current or send me the correct data?"
>
> I have been considering a proposal to implement such an RPC,
> RXAFS_FetchDataWithHash(FID, offset, length, hash). With such an RPC in
> place, the CM can verify the contents of the cache and avoid large
> amounts of unnecessary traffic.
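
A sketch of what the server side of such a call could do, assuming nothing beyond the (FID, offset, length, hash) parameters mentioned above. The function name, the result codes, and the use of a plain byte buffer in place of a FID lookup are hypothetical, and FNV-1a is only a compact stand-in hash; the proposal does not say which hash would be used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a, 64-bit. Stand-in only: chosen because it fits in a few lines. */
static uint64_t range_hash(const unsigned char *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;  /* FNV offset basis */
    size_t i;

    for (i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;           /* FNV prime */
    }
    return h;
}

enum fdwh_result {
    FDWH_CURRENT,    /* client's cached range matches, no data sent */
    FDWH_DATA_SENT,  /* mismatch, correct bytes copied to out */
    FDWH_BAD_RANGE
};

/* Hypothetical server-side handler: file/file_len stand in for the object
 * named by the FID; out must have room for length bytes. */
static enum fdwh_result fetch_data_with_hash(const unsigned char *file,
                                             size_t file_len, size_t offset,
                                             size_t length,
                                             uint64_t client_hash,
                                             unsigned char *out)
{
    assert(file != NULL && out != NULL);

    if (offset > file_len || length > file_len - offset)
        return FDWH_BAD_RANGE;
    if (range_hash(file + offset, length) == client_hash)
        return FDWH_CURRENT;             /* only a short reply crosses the wire */
    memcpy(out, file + offset, length);
    return FDWH_DATA_SENT;
}
```

The CM would call range_hash() over its cached bytes, send the result, and keep its copy when the answer is FDWH_CURRENT.
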
>
> I am raising this idea here because I believe it is very applicable to
> your project. The trust model in AFS is between the CM and the FS.
> There is no trust between CMs. As a result, if a CM obtains data from
> another CM, it needs a low cost mechanism to validate it against the FS.
>
> > So in this project, we need to modify the cache manager to enable
> > interactions with other clients as well.
> > In the first part of the project, where the cache manager contacts a
> > fixed set of remote clients, it retrieves the file from any of these
> > clients whose callback for the file is not broken. An unbroken
> > callback indicates that the copy on that remote
> > client is the most recent. If no client has the most recent copy of the
> > file, we can contact the file server to retrieve the data.
>
> That is one approach but not the one I would take. If the cost of
> reading the data from a local CM is so much cheaper than reading it from
> the FS, the CM can read the data from the other CM (or at least get its
> hash) and then verify it with the file server.
>
> In most file operations, the entire file is not re-written. Just
> portions of it are, and in the case of "append only files" such as log
> files, the data never changes after it is written. Re-fetching this
> data from the FS every time the DV changes is extremely wasteful. It is
> much better to obtain it via the cheapest mechanism possible and then
> verify it via a trusted means.
>
> > In the second part of the project, we can allow discovery of peer
> > clients for collaboration. This can be done by modifying the file server
> > to keep access logs of the clients; when a client requests any data,
> > the clients recorded in the logs for that data would be returned to the
> > requesting client. To maintain cache consistency, the
> > requesting client also establishes a callback guarantee with the file
> > server so that it learns of modifications to the file irrespective of
> > where it got the file from.
>
> I would leave the FS out of the peer collaboration and instead permit
> CMs that wish to offer data to do so via Bonjour.
>
> >
> > I have looked at the files afs_callback.c, cbqueue.c, dcache.c and server.c
> > and think that these are some of the source files involved in cache manager
> > and server-cache manager interactions. Please correct me if I am wrong.
>
> As for how I would like to see this project structured: before any
> collaboration is implemented, I would like to see a generic mechanism
> added to the CM to permit use of a second level cache. Then, once that
> mechanism is in place, a plug-in to that framework can be implemented
> that supports obtaining data from the second level cache which happens
> to be peer CMs.
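
To check my understanding of that structure, here is a minimal sketch of a generic second level cache hook with a plug-in behind it. Everything in it is hypothetical: the l2_cache_ops interface, the cm_read_miss() entry point, and the demo_* functions, which are in-memory stand-ins for a peer CM and the file server. Data from the plug-in is used only after a trusted check, since there is no trust between CMs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical plug-in interface for a second level cache backend. */
struct l2_cache_ops {
    const char *name;
    /* Return 0 and fill buf if the backend can supply the range. */
    int (*fetch)(void *ctx, uint32_t fid, uint64_t offset, size_t len,
                 unsigned char *buf);
    void *ctx;  /* backend-private state */
};

/* Trusted check against the FS: nonzero if buf is current. */
typedef int (*l2_validate_fn)(uint32_t fid, uint64_t offset, size_t len,
                              const unsigned char *buf);
/* Ordinary fetch from the FS: 0 on success. */
typedef int (*fs_fetch_fn)(uint32_t fid, uint64_t offset, size_t len,
                           unsigned char *buf);

enum read_source { SRC_L2, SRC_FS, SRC_ERROR };

/* Called on a local cache miss: try the second level cache, keep its data
 * only if the FS vouches for it, otherwise fall back to the FS itself. */
static enum read_source cm_read_miss(const struct l2_cache_ops *l2,
                                     l2_validate_fn validate,
                                     fs_fetch_fn fs_fetch, uint32_t fid,
                                     uint64_t offset, size_t len,
                                     unsigned char *buf)
{
    assert(validate != NULL && fs_fetch != NULL && buf != NULL);

    if (l2 != NULL && l2->fetch != NULL &&
        l2->fetch(l2->ctx, fid, offset, len, buf) == 0 &&
        validate(fid, offset, len, buf))
        return SRC_L2;
    return fs_fetch(fid, offset, len, buf) == 0 ? SRC_FS : SRC_ERROR;
}

/* --- Demo wiring: in-memory stand-ins, not real network code --- */
static const unsigned char demo_fs_data[] = "authoritative";
#define DEMO_FS_LEN (sizeof demo_fs_data - 1)

static int demo_fs_fetch(uint32_t fid, uint64_t offset, size_t len,
                         unsigned char *buf)
{
    (void)fid;
    if (offset > DEMO_FS_LEN || len > DEMO_FS_LEN - offset)
        return -1;
    memcpy(buf, demo_fs_data + offset, len);
    return 0;
}

/* Stand-in for a hash comparison done by the FS. */
static int demo_validate(uint32_t fid, uint64_t offset, size_t len,
                         const unsigned char *buf)
{
    (void)fid;
    if (offset > DEMO_FS_LEN || len > DEMO_FS_LEN - offset)
        return 0;
    return memcmp(buf, demo_fs_data + offset, len) == 0;
}

/* A "peer" plug-in whose ctx is simply the string it has cached. */
static int demo_peer_fetch(void *ctx, uint32_t fid, uint64_t offset,
                           size_t len, unsigned char *buf)
{
    const char *peer = ctx;
    size_t plen = strlen(peer);

    (void)fid;
    if (offset > plen || len > plen - offset)
        return -1;
    memcpy(buf, peer + offset, len);
    return 0;
}
```

The CM core only ever sees l2_cache_ops, so a different peer to peer protocol would mean swapping the plug-in without touching cm_read_miss().
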
>
> The benefit of this approach is that the framework for the second level
> cache can be implemented and incorporated into a future openafs release
> without committing us to a particular implementation of the peer to peer
> protocols. Future research in peer to peer cache sharing can then take
> place at a much lower cost.
>
> Jeffrey Altman
>
>
>
>