[OpenAFS-devel] Regarding GSoC 2010 Collaborative Caching Project

Jeffrey Altman jaltman@secure-endpoints.com
Sat, 17 Apr 2010 10:01:16 -0400



On 4/17/2010 2:25 AM, shruti jain wrote:
> Here is what I know about the cache manager and its file server
> interactions.
> The Cache Manager resides on the client side in an OpenAFS environment
> and communicates with the AFS file server on behalf of the application
> programs running on the client. When an AFS file is needed by any
> application program running on a client machine, the request is sent to
> the Cache Manager, which in turn issues RPC calls to the file server
> storing the requested file.

This is true for any object (file, directory, mount point, symlink, ...).

AFS supports read-only replicas.  The CM is permitted to request copies
of the data from any of the replicas, although at present the CM only
reads from a single replica at a time.

> When the Cache Manager receives the
> requested data from the file server, it stores it in the cache and also
> delivers it to the application program that initially requested
> the data. In order to maintain cache consistency, the server issues a
> callback along with the data. A callback is a promise by a file server
> to a Cache Manager to inform it of any change in the data delivered by
> the file server to the Cache Manager. If any other client on the network
> modifies the file, then the file server breaks this callback and thus
> indicates to the Cache Manager that its locally cached copy of
> the file is obsolete and needs to be updated. The callback mechanism
> ensures that the Cache Manager always requests the most up-to-date
> version of a file. In this way, the Cache Manager also takes on the
> responsibility of maintaining the cache.

You have the general idea.  Let me provide a few additional details.  In
the original (and currently deployed) implementation of callbacks, a
callback is a promise that the FS will notify the CM of a change for up
to S seconds, where S is typically measured in minutes for read/write
data and in hours for read-only data.  The number of callback promises
(or registrations) that a FS can maintain is finite, so registrations
can be canceled prematurely without there being a change.
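To make the finite-table point concrete, here is a minimal C sketch of a file server's registration table. It is not OpenAFS code: `struct cb_reg`, `CB_TABLE_SIZE`, `cb_register` and the `send_break` stub are hypothetical. The eviction rule (break the live registration closest to expiry) is likewise an assumption chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical, simplified FS callback table; not the OpenAFS layout. */
#define CB_TABLE_SIZE 4

struct cb_reg {
    uint32_t fid;     /* object covered by the promise (simplified) */
    uint32_t client;  /* CM holding the promise */
    time_t expires;   /* promise lapses at this time */
    int in_use;
};

static struct cb_reg cb_table[CB_TABLE_SIZE];
static int breaks_sent;  /* counts premature cancellations */

/* Stand-in for delivering a callback break RPC to r->client. */
static void send_break(const struct cb_reg *r)
{
    (void)r;
    breaks_sent++;
}

/* Record a promise valid for `duration` seconds. When every slot holds a
 * live registration, the one closest to expiry is broken early even
 * though the object it covers has not changed. */
void cb_register(uint32_t fid, uint32_t client, time_t now, time_t duration)
{
    struct cb_reg *slot = NULL;
    int table_full = 1;

    for (size_t i = 0; i < CB_TABLE_SIZE; i++) {
        struct cb_reg *r = &cb_table[i];
        if (!r->in_use || r->expires <= now) {  /* free or lapsed slot */
            slot = r;
            table_full = 0;
            break;
        }
        if (slot == NULL || r->expires < slot->expires)
            slot = r;
    }
    if (table_full)
        send_break(slot);  /* premature cancellation without a change */

    slot->fid = fid;
    slot->client = client;
    slot->expires = now + duration;
    slot->in_use = 1;
}
```

Registering a fifth object while four promises are still live forces one break, which is exactly the "canceled without there being a change" case.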

The callback notification (or invalidation) is delivered via an
unauthenticated RPC channel.  As a result, the notification cannot be
trusted by the CM and must be treated as meaning "a change might have
occurred, please verify if it matters".

The existing callback notification does not provide any hint as to the
type of change that might have occurred.  Callback notifications are
issued for many reasons including:

 . the data changed
 . the access control list changed
 . other metadata changed
 . the locking state changed
 . the volume in which the data is located is being replicated
   (aka released)
 . the object has been deleted
 . the FS ran out of room in the registration table

Once a notification is issued, the registration is broken and the
CM will receive no further notifications until it requests updated
status for the object in question.
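Seen from the CM, these rules amount to a small state machine: a promise with a lifetime, a break that is only a hint, and no further notifications until status is fetched again. A minimal C sketch follows. `struct cm_obj` and the three functions are hypothetical names, not the CM's actual ones. Treating a break as a hint is cheap to get right, because a forged break can only cost an extra FetchStatus.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-object callback state kept by the CM. */
struct cm_obj {
    bool has_callback;  /* a registration is believed to be held */
    time_t cb_expires;  /* when the FS's promise lapses */
};

/* Cached status may be used without asking the FS only while an
 * unexpired registration is held. */
bool status_usable(const struct cm_obj *o, time_t now)
{
    return o->has_callback && now < o->cb_expires;
}

/* A break arrives on an unauthenticated channel, so it is only a hint:
 * forget the registration and revalidate before the next use. */
void on_callback_break(struct cm_obj *o)
{
    o->has_callback = false;
}

/* A FetchStatus reply carries fresh status and a new registration. */
void on_fetch_status(struct cm_obj *o, time_t now, time_t duration)
{
    o->has_callback = true;
    o->cb_expires = now + duration;
}
```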

The CM determines what has changed by issuing a FetchStatus RPC to
the FS and comparing the prior and current status fields.
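A sketch of that comparison. The struct is a hypothetical subset loosely modeled on the AFSFetchStatus structure (data version, length, mode bits, owner, caller access, lock count); the names and the grouping into change classes are assumptions made for illustration. A zero result corresponds to a break that carried no visible change, such as one caused by the FS running out of room in its registration table.

```c
#include <stdint.h>

/* Hypothetical subset of object status, loosely modeled on AFSFetchStatus. */
struct obj_status {
    uint64_t data_version;
    uint64_t length;
    uint32_t mode_bits;
    uint32_t owner;
    uint32_t caller_access;  /* rights granted to the caller; may shift with the ACL */
    uint32_t lock_count;
};

enum {
    CHG_DATA   = 1u << 0,
    CHG_ACCESS = 1u << 1,
    CHG_META   = 1u << 2,
    CHG_LOCK   = 1u << 3
};

/* Compare status cached before a break with status returned by FetchStatus.
 * Returns a bitmask of what differs; 0 means nothing visible changed. */
unsigned status_diff(const struct obj_status *prev, const struct obj_status *cur)
{
    unsigned chg = 0;

    if (prev->data_version != cur->data_version)
        chg |= CHG_DATA;
    if (prev->caller_access != cur->caller_access)
        chg |= CHG_ACCESS;
    if (prev->length != cur->length || prev->mode_bits != cur->mode_bits ||
        prev->owner != cur->owner)
        chg |= CHG_META;
    if (prev->lock_count != cur->lock_count)
        chg |= CHG_LOCK;
    return chg;
}
```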

Matt Benjamin has developed and implemented (though it is not shipping
yet) an extended version of callback notifications that provides the CM
with additional details regarding the change.  When combined with an
authenticated callback channel, this becomes a very powerful mechanism.

It is also important to discuss how the FS and CM track object data.
Each time a change to the data (not the metadata) occurs, a data version
(DV) number for the object is incremented.  When the CM issues a
StoreData RPC, the FS returns updated status information.  If the DV was
incremented by exactly one, then the CM knows that there was no race
with another CM and all of the data in the cache for that file is still
current.  If the DV increment was greater than one, then the CM knows
that the data it just wrote is current, but all other data is suspect.
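The StoreData rule fits in a small decision function. This is a sketch, not the CM's actual logic: `after_store` and its enum are hypothetical. The third outcome (the DV did not advance at all) is a defensive case that the rule above does not discuss.

```c
#include <stdint.h>

enum store_outcome {
    STORE_ALL_CURRENT,  /* DV advanced by exactly one: no race */
    STORE_ONLY_OURS,    /* DV advanced by more: only our write is known good */
    STORE_UNEXPECTED    /* DV did not advance: treat everything as suspect */
};

/* Hypothetical: classify a StoreData reply given the DV the CM held
 * before the store and the DV returned in the updated status. */
enum store_outcome after_store(uint64_t dv_before, uint64_t dv_after)
{
    if (dv_after == dv_before + 1)
        return STORE_ALL_CURRENT;
    if (dv_after > dv_before + 1)
        return STORE_ONLY_OURS;
    return STORE_UNEXPECTED;
}
```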

When using the Extended Callback mechanism, the FS can issue a
notification that a StoreData occurred affecting {FileID, offset,
length} and that the current DV is N, without canceling the callback
registration.  This permits the CM to maintain cache coherency with
less network traffic while an object is actively being used.
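A sketch of how a CM might apply such a notification to per-chunk cache entries. The Extended Callback message format is not described here, so `struct xcb_store`, `struct chunk`, `CHUNK_SIZE` and `apply_xcb` are all invented for illustration. The sketch also assumes fixed-size chunks and advances an untouched chunk to DV N only if it was current at N-1. Otherwise an intermediate change was missed, and the chunk is dropped.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 65536u  /* illustrative fixed chunk size */

/* Hypothetical cached chunk descriptor. */
struct chunk {
    uint64_t offset;  /* start of the chunk within the file */
    uint64_t dv;      /* data version the cached bytes correspond to */
    int valid;
};

/* Hypothetical notification: a StoreData touched [offset, offset+length)
 * and the file is now at data version new_dv. */
struct xcb_store {
    uint64_t offset, length, new_dv;
};

/* Apply one notification; returns the number of chunks invalidated. */
size_t apply_xcb(struct chunk *chunks, size_t n, const struct xcb_store *ev)
{
    size_t dropped = 0;
    uint64_t end = ev->offset + ev->length;

    for (size_t i = 0; i < n; i++) {
        struct chunk *c = &chunks[i];
        if (!c->valid)
            continue;
        int overlaps = c->offset < end && ev->offset < c->offset + CHUNK_SIZE;
        if (overlaps || c->dv + 1 != ev->new_dv) {
            /* Bytes in this chunk changed, or an earlier change was missed. */
            c->valid = 0;
            dropped++;
        } else {
            c->dv = ev->new_dv;  /* untouched by this store: still current */
        }
    }
    return dropped;
}
```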

However, when a CM starts or when an object has been idle for more than
a few minutes, there will be no callback registration.  In that
situation, a change could have occurred to the file data and the CM will
be forced to discard all of the cached data if a change did occur.
Unfortunately, there is no mechanism at present for the CM to ask the FS
"I need the chunk of data represented by {FileID, offset, length} but I
currently have data in that range with the following hash value.  Could
you confirm that my data is current or send me the correct data?"

I have been considering a proposal to implement such an RPC,
RXAFS_FetchDataWithHash(FID, offset, length, hash).  With such an RPC in
place, the CM can verify the contents of the cache and avoid large
amounts of unnecessary traffic.
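RXAFS_FetchDataWithHash is only a proposal, so what follows is a hypothetical sketch of the server-side decision it implies, not a specification. FNV-1a stands in for the hash only to keep the example self-contained. It is not collision resistant, and because the point is to let a CM validate data it cannot otherwise trust, a real design would need a cryptographic hash. Clamping out-of-range requests to the end of the file is also an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit FNV-1a: illustrative stand-in only, NOT collision resistant. */
uint64_t fnv1a64(const unsigned char *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;  /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;           /* FNV prime */
    }
    return h;
}

enum fdwh_reply { FDWH_CURRENT, FDWH_DATA_SENT };

/* Hypothetical FS-side handling of FetchDataWithHash(FID, offset, length,
 * hash). `file`/`file_len` stand in for the object's current contents.
 * On a match, only a confirmation is returned. Otherwise the current bytes
 * are copied to `out` (which must hold `length` bytes). */
enum fdwh_reply fetch_data_with_hash(const unsigned char *file, size_t file_len,
                                     size_t offset, size_t length,
                                     uint64_t client_hash,
                                     unsigned char *out, size_t *sent)
{
    if (offset > file_len)
        offset = file_len;
    if (length > file_len - offset)
        length = file_len - offset;  /* clamp to end of file */

    if (fnv1a64(file + offset, length) == client_hash) {
        *sent = 0;                   /* client's copy is current */
        return FDWH_CURRENT;
    }
    memcpy(out, file + offset, length);
    *sent = length;
    return FDWH_DATA_SENT;
}
```

When the hashes match, only the small reply crosses the network. This is where the savings come from when the cache is mostly correct.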

I am raising this idea here because I believe it is very applicable to
your project.  The trust model in AFS is between the CM and the FS.
There is no trust between CMs.  As a result, if a CM obtains data from
another CM, it needs a low cost mechanism to validate it against the FS.

> So in this project, we need to modify the cache manager to enable
> interactions with other clients as well.
> In the first part of the project, where the cache manager contacts a
> fixed set of remote clients, it retrieves the file from any of these
> clients if their callback on the file is not broken. Since the callback
> is not broken, this indicates that the file present on the remote
> client is the most recent. In case no client has the most recent copy
> of the file, we can contact the file server to retrieve the data.

That is one approach but not the one I would take.  If reading the data
from a local CM is so much cheaper than reading it from the FS, the CM
can read the data from the other CM (or at least get its hash) and then
verify it with the file server.

In most file operations, the entire file is not re-written; only
portions of it are.  In the case of "append only files" such as log
files, the data never changes after it is written.  Re-fetching this
data from the FS every time the DV changes is extremely wasteful.  It is
much better to obtain it by the cheapest mechanism possible and then
verify it via a trusted means.
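A rough back-of-the-envelope comparison for an append-only log shows the scale of the waste. Every number here is an assumption for illustration, not a measurement: a fixed chunk size, a flat per-chunk verification overhead (`VERIFY_BYTES`, hypothetical), and, for simplicity, re-fetching the whole partially filled last chunk.

```c
#include <stdint.h>

#define VERIFY_BYTES 64u  /* assumed wire overhead of one hash check */

/* Policy 1: any DV change discards the cache, so the whole file is fetched. */
uint64_t cost_refetch_all(uint64_t new_len)
{
    return new_len;
}

/* Policy 2: an append leaves every full old chunk unchanged, but the CM
 * cannot assume that, so it confirms each one by hash. It then fetches
 * from the start of the (possibly partial) last old chunk onward. */
uint64_t cost_verify(uint64_t old_len, uint64_t new_len, uint64_t chunk)
{
    uint64_t full_chunks = old_len / chunk;    /* chunks confirmed by hash */
    uint64_t first_fetch = full_chunks * chunk; /* first byte to fetch */
    return full_chunks * VERIFY_BYTES + (new_len - first_fetch);
}
```

Take a 1 MiB log (1048576 bytes) that grows by 1000 bytes, with 64 KiB chunks. `cost_refetch_all(1049576)` is 1049576 bytes. `cost_verify(1048576, 1049576, 65536)` is 16 * 64 + 1000 = 2024 bytes.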

> In the second part of the project, we can allow discovery of peer
> clients for collaboration. This can be done by modifying the file server
> to keep access logs of the clients; if a client requests any data,
> then the corresponding clients in the logs would be returned to the
> requesting client. In order to maintain cache consistency, the
> requesting client also establishes a callback guarantee from the file
> server so that it learns of modifications to the file irrespective of
> where it got the file from.

I would leave the FS out of the peer collaboration and instead permit
CMs that wish to offer data to do so via Bonjour.

>
> I have seen the files afs_callback.c, cbqueue.c, dcache.c and server.c
> and think that these are some of the programs used in cache manager and
> server-cache manager interactions. Please correct me if I am wrong.

As for how I would like to see this project structured: before any
collaboration is implemented, I would like to see a generic mechanism
added to the CM to permit use of a second level cache.  Then, once that
mechanism is in place, a plug-in to that framework can be implemented
that supports obtaining data from a second level cache which happens to
consist of peer CMs.

The benefit of this approach is that the framework for the second level
cache can be implemented and incorporated into a future openafs release
without committing us to a particular implementation of the peer to peer
protocols.  Future research in peer to peer cache sharing can then take
place at a much lower cost.
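One possible shape for such a framework, sketched in C, is a table of backend operations that the CM consults before going to the FS. Every name here (`struct l2_cache_ops`, `l2_register`, `l2_lookup`, `struct afs_fid_sketch`) is hypothetical, since nothing like this exists in OpenAFS today. A peer-CM backend would be just one implementation. Data returned by any backend still has to be validated against the FS, because the FS is the only trusted party.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical file identifier: volume, vnode, uniquifier. */
struct afs_fid_sketch {
    uint32_t volume, vnode, unique;
};

/* Hypothetical second level cache backend. */
struct l2_cache_ops {
    const char *name;
    /* Copy up to len bytes at offset into buf; return bytes supplied,
     * 0 on a miss. Returned data is untrusted until checked with the FS. */
    size_t (*lookup)(const struct afs_fid_sketch *fid, uint64_t offset,
                     size_t len, unsigned char *buf);
    /* Optionally keep a copy of data the CM has already validated. */
    void (*store)(const struct afs_fid_sketch *fid, uint64_t offset,
                  const unsigned char *buf, size_t len);
};

#define L2_MAX_BACKENDS 4

static const struct l2_cache_ops *l2_backends[L2_MAX_BACKENDS];
static size_t l2_nbackends;

/* Register a backend; returns 0 on success, -1 if the table is full. */
int l2_register(const struct l2_cache_ops *ops)
{
    if (l2_nbackends == L2_MAX_BACKENDS)
        return -1;
    l2_backends[l2_nbackends++] = ops;
    return 0;
}

/* Consult backends in registration order; the first hit wins. A return
 * of 0 means every backend missed and the CM should fetch from the FS. */
size_t l2_lookup(const struct afs_fid_sketch *fid, uint64_t offset,
                 size_t len, unsigned char *buf, const char **source)
{
    for (size_t i = 0; i < l2_nbackends; i++) {
        const struct l2_cache_ops *ops = l2_backends[i];
        size_t got = ops->lookup ? ops->lookup(fid, offset, len, buf) : 0;
        if (got > 0) {
            if (source != NULL)
                *source = ops->name;
            return got;
        }
    }
    return 0;
}
```

Keeping the interface this narrow is what would let the framework ship without committing to any particular peer-to-peer protocol.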

Jeffrey Altman



