[OpenAFS-devel] Re : Help needed in developing disconnected operation for OpenAFS on windows

Jeffrey Altman jaltman@secure-endpoints.com
Tue, 04 Mar 2008 07:59:31 -0500

Simon Wilkinson wrote:
> I'm in the process of updating the Michigan disconnected operation code 
> for the Unix tree, so here are my thoughts on what I'm doing there. Bear 
> in mind that none of this has been accepted into the tree yet! Sorry for 
> polluting the Windows list with Unix comments.
> 
> (I've set followups to openafs-devel)
> 
> Jeffrey Altman wrote:
> 
>> Disconnected operations should not be a global setting.  That is
>> acceptable for a research project that demonstrates the capability but
>> it is not acceptable for real world environments in which some servers
>> or cells may not be accessible while others remain accessible.
> 
> I guess this depends on what you're trying to achieve through providing 
> disconnected operation, and the quality of the user experience you can 
> provide when performing re-integration. Looking at other disconnected 
> systems, one of the usability challenges of Coda is that clients can go 
> disconnected without the user's knowledge, and so the user can end up 
> having to resolve integration conflicts which aren't of their making, 
> and which they were completely unaware of. This tends to score badly for 
> usability, as it violates some of the user's fundamental assumptions. 
> Providing a system which requires an explicit 'go disconnected' step has 
> the advantage that the user is aware both of when they disconnected from 
> the network, and when they reconnected. This allows them to rationalise 
> any conflict resolution steps that they have to perform.
> 
> That's not to say that 'opportunistic' disconnection (as I'm christening 
> the solution you outline - where the cache manager continues to serve 
> files for which it had a valid callback when the file server 
> disappeared, without any user interaction) doesn't have real uses - I 
> just think that the usability challenges are far higher.

But you are forgetting the "must intentionally configure what you want
to be able to use offline" step.  What the Windows cache manager does 
today is optimistic disconnection.  If the data was actively in use when 
connectivity to the server was lost, a request is not failed 
immediately if it can be served by the cache manager without the help of 
the file server.  However, the moment an operation that does 
require the file server takes place (close file, ACL check, ...), the 
error is returned immediately.
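The optimistic behavior described above can be sketched as a small decision function. This is a minimal Python model, not the Windows cache manager's code: the class names, the operation strings and the `SERVER_REQUIRED_OPS` set are hypothetical, and "held a valid callback when the server vanished" stands in for "actively in use".

```python
from dataclasses import dataclass, field


class ServerUnreachable(Exception):
    """The operation needs a file server that cannot currently be reached."""


# Hypothetical operation names. The message cites closing a file and ACL
# checks as examples of operations that always need the file server.
SERVER_REQUIRED_OPS = {"close", "acl_check"}


@dataclass
class CacheEntry:
    data: bytes
    had_callback: bool  # a valid callback was held when the server vanished


@dataclass
class OptimisticCacheManager:
    server_up: bool = True
    cache: dict = field(default_factory=dict)  # path -> CacheEntry

    def handle(self, op: str, path: str) -> bytes:
        if self.server_up:
            return self.call_server(op, path)
        if op in SERVER_REQUIRED_OPS:
            # Cannot be completed locally: fail at once rather than guess.
            raise ServerUnreachable(f"{op} on {path} needs the file server")
        entry = self.cache.get(path)
        if op == "read" and entry is not None and entry.had_callback:
            # Data was in active use when the server disappeared: keep serving it.
            return entry.data
        raise ServerUnreachable(f"{path} cannot be served from the cache")

    def call_server(self, op: str, path: str) -> bytes:
        raise NotImplementedError("stand-in for a real file server RPC")
```

A read of a cached DLL keeps working while disconnected, but closing that same file raises `ServerUnreachable` immediately.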

Usable offline systems are those in which the user specifies up front 
what portions of the network file space are required for offline use and 
what the policies for those objects are.  There is a huge difference in 
the offline behavior for read-only objects vs those for read-write.

For data which is pre-configured for offline use there is no harm in 
switching to disconnected operations.  It's a lot better than having the 
application crash because one of its DLLs can no longer be read, or 
having its data lost because its file handle is no longer valid.

> 
>> (1) how do you ensure that you have all of the data for all of the files
>> and directories that the user wishes to access in the cache?   AFS
>> caches arbitrary blocks, not whole files or directories.
> 
> I'll add to this:
> 
> 1a) How do you ensure that the data you have in the cache is 
> sufficiently recent to be of use to the client
> 
> The naive mechanism, as implemented by the Michigan code, just serves 
> whatever happens to be in the cache back to the user. The problem is 
> that, depending on the size of your cache against your normal working 
> set, it's possible that you might get files that are months out of 
> date. The normal AFS way of resolving this is to hold callbacks for 
> these files - you could extend this to disconnected operation by adding 
> a 'pinning' functionality, where a user indicates to the cache manager 
> that they want a particular file to be available offline, and the cache 
> manager should ensure that it's always up to date on the client. However, 
> if you attempt to hold callbacks for every file in a user's offline set, 
> then you're likely to cause severe callback storms with the fileserver 
> (multiple clients hold more than the fileserver's maximum number of 
> callbacks - fileserver starts breaking older callbacks, clients see 
> callback breaks and attempt to update pinned files, fileserver creates 
> new callbacks for these, and round and round we go)
> 
> The question of how we ensure acceptable recency, without making 
> fileserver changes, is a tricky one.
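Question (1) comes down to the fact that the cache holds fixed-size chunks, so a file is only fully usable offline if every chunk covering its length is present. A minimal check, assuming a single fixed chunk size (the 64 KiB value in the usage line is illustrative only) and a hypothetical set of cached chunk indices:

```python
from __future__ import annotations


def missing_chunks(file_length: int, chunk_size: int, cached: set[int]) -> list[int]:
    """Indices of the chunks covering bytes [0, file_length) that are not cached."""
    n_chunks = -(-file_length // chunk_size)  # ceiling division; 0 for an empty file
    return [i for i in range(n_chunks) if i not in cached]


def usable_offline(file_length: int, chunk_size: int, cached: set[int]) -> bool:
    # A file can only be served in full while disconnected if no chunk is missing.
    return not missing_chunks(file_length, chunk_size, cached)
```

For a 150,000-byte file, `missing_chunks(150_000, 65_536, {0, 2})` returns `[1]`, so that file could not be served in full while disconnected.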

This is less of an issue if you are using an offline model that is 
stored outside of AFS and uses redirection.   It is really important to 
understand the usage model for which offline access is required.

There are just a handful of use cases that seem to be of critical 
importance:

(1) Distribution of read-only data.  Applications and documents.  These 
objects do not change often and only need to be synchronized on a 
periodic basis.  What is important is that a consistent set of the data 
be available when required.

(2) User Profiles.  Read-write.  The synchronization rule used by 
Windows is "last writer wins," so the rule for a profile is "if a 
conflict is detected, use the local copy."  User profiles are 
synchronized only at login and logout.  Intermediate changes do not 
matter, or at least Windows doesn't check.

(3) Home directories and Shared Project directories.  Read-write.  These 
are frequently used files which change often.  Periodic checks must be 
made to ensure that local copies are up to date.  File sets must be 
maintained consistently.  Policy will determine how often the 
synchronization checks will occur when a file set is not in use.  As 
soon as any object in a file set is touched, the entire file set must be 
synchronized and kept current until the file set is no longer in use 
again.  Write-backs to the file server can result in collisions. 
Policies can be assigned to file sets: "Server copy wins," "Local copy 
wins," "Prompt user," etc.
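The per-file-set policies in (2) and (3) amount to a small decision table applied at synchronization time. A sketch under stated assumptions: each file records the server data version it was last synchronized against, `Policy` and `plan_sync` are hypothetical names, and "prompt user" is modeled as a callback.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class Policy(Enum):
    SERVER_WINS = "server copy wins"
    LOCAL_WINS = "local copy wins"
    PROMPT = "prompt user"


@dataclass
class FileState:
    base_version: int       # server data version at the last synchronization
    server_version: int     # server data version now
    locally_modified: bool  # changed offline since the last synchronization


@dataclass
class FileSet:
    name: str
    policy: Policy
    files: dict[str, FileState] = field(default_factory=dict)


def plan_sync(fs: FileSet, ask: Optional[Callable[[str], str]] = None) -> dict[str, str]:
    """Map each path to 'upload', 'download' or 'unchanged'."""
    plan = {}
    for path, st in fs.files.items():
        server_changed = st.server_version != st.base_version
        if st.locally_modified and server_changed:
            # Collision: both copies changed; the file set's policy decides.
            if fs.policy is Policy.SERVER_WINS:
                plan[path] = "download"
            elif fs.policy is Policy.LOCAL_WINS:
                plan[path] = "upload"
            elif ask is None:
                raise ValueError(f"{fs.name}: policy is PROMPT but no prompt was given")
            else:
                plan[path] = ask(path)
        elif st.locally_modified:
            plan[path] = "upload"
        elif server_changed:
            plan[path] = "download"
        else:
            plan[path] = "unchanged"
    return plan
```

A user profile would use `Policy.LOCAL_WINS`, so a profile file changed on both sides is planned as `"upload"`, which matches "if a conflict is detected, use the local copy."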

File server callback storms will not take place if the synchronization 
logic is not primarily dependent on the existing callback mechanism.
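A toy model shows why the pinning loop described earlier never settles once the pinned sets outgrow the fileserver's callback table. Assumptions: the server breaks its oldest callback when full, every broken pin is re-fetched on the next pass, and `simulate_pinning` is a hypothetical name; real fileserver eviction is more involved.

```python
from collections import OrderedDict


def simulate_pinning(capacity: int, pins: list, rounds: int) -> list:
    """Return the number of callback breaks in each pass over the pinned files."""
    callbacks = OrderedDict()  # pin -> True, oldest callback first
    history = []
    for _ in range(rounds):
        breaks = 0
        for pin in pins:
            if pin in callbacks:
                continue  # client still holds a valid callback; nothing to do
            if len(callbacks) >= capacity:
                callbacks.popitem(last=False)  # server breaks its oldest callback
                breaks += 1
            callbacks[pin] = True  # client refetches the file and gets a new callback
        history.append(breaks)
    return history
```

With a capacity of 3 and four `(client, file)` pins, three passes give `[1, 4, 4]`: after the first pass, every pin is broken on every pass. With a capacity of 4 the result is `[0, 0, 0]`. Synchronizing on a timer without holding callbacks avoids the loop entirely.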

> 
>> (2) how do you synchronize read and write locks when the file server is
>> not accessible?
> 
> It's relatively easy to maintain a list of the locks granted by the 
> cache manager whilst in disconnected mode, and you can ensure that the 
> locking protects processes running on the same machine from each other. 
> The issue is what you do when reconnecting. The cache manager plays the 
> list of locally granted locks to the fileserver, and all is well if it 
> grants them. But what happens if the fileserver refuses a lock? You 
> can't recall locks which have already been issued, so you can have a 
> situation where there's a process happily writing to a file, under what 
> it believes is a write lock, whilst it actually has no lock at all on 
> the server. As I see it, there are three options: 1) Ignore the problem; 
> 2) Fail reads and writes to that file descriptor as soon as the lock 
> fails; 3) 'Defer' reintegration of that file until it is closed, and 
> deal with the problem then.
> 
> This is a much bigger issue on Windows than Unix, though.

There are two components here.  The lock and the data version.  If a 
file is currently open and the application is accessing the file in 
disconnected mode, then continue to treat that file set as disconnected 
until all the file handles are closed.  Then perform whatever 
synchronization policy applies for the file set.
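Keeping a file set disconnected until its last handle is closed can be sketched as a per-file-set handle count. The `sync` callback stands in for whatever synchronization policy applies; the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DisconnectedFileSet:
    name: str
    sync: Callable[[str], None]  # hypothetical hook that runs the file set's policy
    open_handles: int = 0
    disconnected: bool = True
    server_reachable: bool = False

    def on_open(self) -> None:
        self.open_handles += 1

    def on_close(self) -> None:
        self.open_handles -= 1
        self._maybe_reconnect()

    def on_server_back(self) -> None:
        self.server_reachable = True
        self._maybe_reconnect()

    def _maybe_reconnect(self) -> None:
        # Stay disconnected while any handle is open, even if the server is back,
        # so the locks and data versions the application relies on stay consistent.
        if self.disconnected and self.server_reachable and self.open_handles == 0:
            self.sync(self.name)
            self.disconnected = False
```

If the server returns while a document is still open, nothing happens until that handle is closed; only then is the file set synchronized and reconnected.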

I should point out that this is becoming a bigger issue for UNIX as 
applications come to expect CIFS semantics.  Notice the behavior of 
Open Office, for example.

> 
>> (3) how do you interact with the end user to notify them of collisions
>> and what do you do when there are collisions?
> 
> I'm currently implementing a collision resolution policy of "last closer 
> wins". Whilst this does have the potential to cause significant data 
> loss, it has the big advantage over more complex resolution policies 
> that it's explainable to, and understandable by, the user. At the moment 
> collisions get logged in the system log. It would be possible to take 
> advantage of some of the new desktop technologies appearing for Unix to 
> get those messages closer to the user (although, on multi-user machines, 
> desktop-based notifications break down).
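One reading of "last closer wins", assuming the disconnected client compares its close time with the server copy's modification time and logs every collision (Simon's code may simply let the replayed store overwrite the server copy; the names here are hypothetical):

```python
import logging
from dataclasses import dataclass
from typing import Callable

log = logging.getLogger("reintegration")  # hypothetical logger name


@dataclass
class LocalChange:
    path: str
    base_version: int  # server data version the offline copy started from
    closed_at: float   # when the file was closed on the disconnected client
    data: bytes


def last_closer_wins(change: LocalChange, server_version: int,
                     server_mtime: float,
                     store: Callable[[str, bytes], None]) -> str:
    """Return 'clean', 'local_won' or 'server_won'."""
    if server_version == change.base_version:
        # The server copy did not change while disconnected: no collision.
        store(change.path, change.data)
        return "clean"
    if change.closed_at >= server_mtime:
        log.warning("collision on %s: local copy closed last, replacing server copy",
                    change.path)
        store(change.path, change.data)
        return "local_won"
    log.warning("collision on %s: server copy closed last, local changes dropped",
                change.path)
    return "server_won"
```

The data loss Simon mentions is the `"server_won"` branch: the offline edits are discarded and only a log line records it.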

I do not believe this model is deployable in the Windows world.

>> (5) how do you address access control issues for files that are offline?
> 
> The Michigan code simply disables access control when a machine goes 
> offline. With the Unix model, this is more acceptable - machines only go 
> offline with an explicit command, which can only be issued by the super 
> user. The super user has access to the cache contents, anyway. However, 
> this doesn't help with people who have implemented access controls to 
> protect themselves from silly mistakes.
> 
> I've got a provisional implementation of 'local' tokens which can be 
> used to convey CPS information from the userland to the cache manager, 
> but won't be usable in a connected environment. My eventual plan is that 
> it's possible to 'stash' access data for a particular userid to a file, 
> from where it can be reloaded while the cache manager is offline. 
> However, as soon as you start using these you run into ...

By pulling offline operations out of the cache manager and implementing 
them with a redirector model, I believe that all of these issues can be 
avoided.  Synchronization requires AFS credentials.  Offline access 
requires local machine credentials and is enforced by the local file 
system based upon the user rights granted at the time the 
synchronization was configured.
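In the redirector model, offline access checks reduce to comparing a request against the rights recorded when the file set was configured. A minimal sketch: the AFS rights letters `rlidwka` are real, but the `OfflineAcl` class and the idea of storing one rights string per local user are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# AFS ACL rights: read, lookup, insert, delete, write, lock, administer.
AFS_RIGHTS = set("rlidwka")


@dataclass
class OfflineAcl:
    # Hypothetical snapshot taken when the file set was configured for
    # offline use: local user name -> AFS rights held at that time.
    granted: dict[str, set[str]] = field(default_factory=dict)

    def record(self, local_user: str, afs_rights: str) -> None:
        # Keep only letters that are real AFS rights.
        self.granted[local_user] = set(afs_rights) & AFS_RIGHTS

    def allows(self, local_user: str, needed: str) -> bool:
        # Offline, the request is checked only against the recorded snapshot.
        return set(needed) <= self.granted.get(local_user, set())
```

A user recorded with `"rl"` can read offline but not write, and a local user with no recorded rights gets nothing, without any AFS token being involved.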

>> (6) how do you ensure that the files are synchronized back to file server
>> with the same user credentials that were intended to be used when the
>> files were modified?
> 
> This is tricky. I don't (yet) have a good answer to this one. At the 
> moment, all replays have to come from a single identity (and their token 
> had better be valid when reintegration starts).

Yep.
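The current restriction Simon describes, that a replay runs under one identity whose token is valid at the start, can be expressed as a pre-flight check over the replay log. `LoggedOp`, `Token` and `check_replay` are hypothetical names; a token expiring partway through the replay is not handled here.

```python
from __future__ import annotations

import time
from dataclasses import dataclass


@dataclass
class LoggedOp:
    op: str    # e.g. "store", "create", "remove"
    path: str
    user: str  # identity that performed the operation while disconnected


@dataclass
class Token:
    user: str
    expires_at: float  # seconds since the epoch


def check_replay(log: list[LoggedOp], token: Token, now: float | None = None) -> list[str]:
    """Return the problems blocking a single-identity replay (empty means OK)."""
    now = time.time() if now is None else now
    problems = []
    if token.expires_at <= now:
        problems.append(f"token for {token.user} has expired")
    for entry in log:
        if entry.user != token.user:
            # This change would be written back under the wrong credentials.
            problems.append(f"{entry.op} {entry.path} was made by {entry.user}, not {token.user}")
    return problems
```

Any logged operation made by a different user shows up as a problem, which is exactly the gap that question (6) points at.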

