[OpenAFS-devel] Regarding GSoC 2010 Collaborative Caching Project

shruti jain shruti.jain1988@gmail.com
Tue, 13 Apr 2010 19:42:44 +0530


Dear all,

I am a computer science student at IIIT Hyderabad, India. I am interested in
contributing to OpenAFS and have applied to GSoC 2010 with OpenAFS. I think
this would be a good starting point for me to work with the community. I
participated in GSoC 2009 with the Globus Alliance as my mentoring
organization, and I am also working on a research project at my university
to improve read access and execution performance for DFS.

I am interested in the Collaborative Caching project listed on the ideas
page. The project proposal I have submitted is as follows:

The project aims to develop a system that uses collaborative caching
techniques to improve read access in OpenAFS. It is based on two
observations.

First, in a cluster environment, a large number of clients need the same
datasets to work on, i.e. the data on which one client node needs to execute
is the same for many other nodes on the network. Currently, each client
contacts the server individually to fetch the data. This increases the load
on the server unnecessarily. If the file is very large, the problem is
magnified further.

Second, local bandwidth is mostly fast and runs into Gbps. In a cluster,
many clients share the same location and thus have fast interconnects
between them. The server might be connected through a slow network link. In
this situation, accessing data from another client would be much faster than
accessing it from the server itself.
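
To make the two observations concrete, here is a rough back-of-the-envelope
estimate in Python. All the numbers are assumptions chosen only for
illustration (a 1 GB file, a 1 Gbps local interconnect, a 100 Mbps link to
the server, 50 clients); they are not measurements from any OpenAFS
deployment.

```python
def transfer_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Ideal transfer time, ignoring latency and protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


def server_bytes_sent(size_bytes: int, clients: int, collaborative: bool) -> int:
    """Bytes the server sends when every client reads the same file once.

    In the idealised collaborative case only the first client hits the server.
    """
    return size_bytes if collaborative else size_bytes * clients


FILE_SIZE = 10**9          # 1 GB file (assumed)
LOCAL_LINK = 10**9         # 1 Gbps client-to-client link (assumed)
SERVER_LINK = 100 * 10**6  # 100 Mbps client-to-server link (assumed)
CLIENTS = 50               # number of cluster nodes (assumed)

print(transfer_seconds(FILE_SIZE, LOCAL_LINK))      # 8.0 seconds from a peer
print(transfer_seconds(FILE_SIZE, SERVER_LINK))     # 80.0 seconds from the server
print(server_bytes_sent(FILE_SIZE, CLIENTS, False)) # 50000000000 bytes
print(server_bytes_sent(FILE_SIZE, CLIENTS, True))  # 1000000000 bytes
```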

Instead of each client contacting the server individually, a collaborative
caching technique can be employed. Once a client has fetched some data from
the server, subsequent requests for that data can be forwarded to this
client. This reduces the load on the server and also improves bandwidth
usage on the server side. It also leads to faster data access if the link
between the requesting client and the server is weaker than its links to
other clients.
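
The read path I have in mind could be sketched roughly as below. This is
only a toy in-memory model in Python, not OpenAFS code: the Server and
Client classes and the peer_lookup and read methods are hypothetical names
I made up for illustration, and a real implementation would live in the
cache manager and talk over the network.

```python
from typing import Dict, List, Optional


class Server:
    """Stand-in for the fileserver; counts how many fetches it serves."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = dict(files)
        self.requests = 0

    def fetch(self, filename: str) -> bytes:
        self.requests += 1
        return self.files[filename]


class Client:
    """Stand-in for a cache manager with a fixed list of peers."""

    def __init__(self, server: Server,
                 peers: Optional[List["Client"]] = None) -> None:
        self.server = server
        self.peers = peers or []
        self.cache: Dict[str, bytes] = {}

    def peer_lookup(self, filename: str) -> Optional[bytes]:
        # Answer a peer's request from the local cache only.
        return self.cache.get(filename)

    def read(self, filename: str) -> bytes:
        if filename in self.cache:
            return self.cache[filename]
        for peer in self.peers:
            data = peer.peer_lookup(filename)
            if data is not None:
                break  # a peer already holds the data
        else:
            # No peer had it (or there are no peers): fall back to the server.
            data = self.server.fetch(filename)
        self.cache[filename] = data
        return data


server = Server({"dataset.bin": b"payload"})
first = Client(server)
second = Client(server, peers=[first])
first.read("dataset.bin")   # served by the server
second.read("dataset.bin")  # served from first's cache
print(server.requests)      # 1
```

The sketch deliberately ignores consistency and authorisation; it only
shows that the second read never reaches the server.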

Initially, we can start with a fixed list of peers at the client. The client
would collaborate only with the clients on this list. Next, we would add
functionality to discover peers. This can be done using the fileserver: the
fileserver can be modified to keep access logs of the clients, and when a
client requests some data, the clients recorded in these logs for that data
can be returned to the requesting client. Access controls are also needed
here, i.e. how a fileserver could authorise a client to fetch data from
another client.

In OpenAFS, the server sends a callback to the client if the file it is
using has been modified. We have to consider the situation where a client is
fetching data from another client and that other client receives a callback
in the midst of the transfer. To handle this, we could make the call that
the client uses to get the hash from the fileserver also establish a
callback guarantee, so that all of the clients would be notified by the
fileserver regardless of where they got their data from.
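
As a rough illustration of peer discovery and the callback idea, here is a
toy Python model. It is a sketch under my own assumptions, not the actual
fileserver interface: the FileServer class, its method names, the use of
SHA-256 as the hash, and the choice to clear the access log on every write
are all hypothetical. Access control is left out entirely.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


class FileServer:
    """Toy fileserver that logs accesses and tracks callback promises."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}
        # filename -> clients known to have fetched the current version
        self.access_log: Dict[str, List[str]] = {}
        # filename -> {client id: function to call when the file changes}
        self.callbacks: Dict[str, Dict[str, Callable[[str], None]]] = {}

    def fetch(self, filename: str, client_id: str) -> bytes:
        """Serve the data directly and remember who has it."""
        log = self.access_log.setdefault(filename, [])
        if client_id not in log:
            log.append(client_id)
        return self.files[filename]

    def get_hash_and_peers(self, filename: str, client_id: str,
                           on_change: Callable[[str], None]
                           ) -> Tuple[str, List[str]]:
        """Return the file hash and candidate peers, and register a callback.

        The callback is registered here so the client is notified of changes
        even if it then fetches the data from a peer.
        """
        digest = hashlib.sha256(self.files[filename]).hexdigest()
        peers = [c for c in self.access_log.get(filename, []) if c != client_id]
        self.callbacks.setdefault(filename, {})[client_id] = on_change
        return digest, peers

    def store(self, filename: str, data: bytes) -> None:
        """Write new contents, then notify every client holding a callback."""
        self.files[filename] = data
        self.access_log.pop(filename, None)  # cached copies are now stale
        for notify in self.callbacks.pop(filename, {}).values():
            notify(filename)
```

A client that obtained the data from a peer can compare it against the
returned hash, and if the file is modified during the transfer it still
receives the notification because the callback was registered with the
fileserver, not with the peer.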

I have received a reply from Mr. Jeffrey Altman asking me to contact the
community to refine the proposal. He suggested that it would be useful to
discuss the internal workings of the AFS cache manager and the CM-FS
interactions so that I can refine my proposal. Also, please suggest a
project that I can complete over the next few days to demonstrate my
abilities and get selected for OpenAFS.

Link to project proposal on GSoC portal:
http://socghop.appspot.com/gsoc/student_proposal/private/google/gsoc2010/shrutijain/t127083915309

Thank you.

Best Regards,

Shruti
