[AFS3-std] AFS3 support for sparse files (SEEK_HOLE&co)?

Jeffrey Altman jaltman@your-file-system.com
Thu, 04 Sep 2014 08:07:13 -0400


This is a cryptographically signed message in MIME format.

--------------ms050508090200020105070401
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 9/4/2014 4:33 AM, Lionel Cons wrote:
> On 4 September 2014 03:37, Jeffrey Altman <jaltman@your-file-system.com=
> wrote:
>> On 9/3/2014 1:29 PM, Lionel Cons wrote:
>>> Will AFS3 include support for sparse files, e.g. files which have one=

>>> or multiple holes (see POSIX lseek() documentation about SEEK_HOLE an=
d
>>> SEEK_DATA) where no data reside?
>>>
>>> CERN has a lot of projects which rely on sparse files and it would be=

>>> a mandatory feature for a migration to AFS3.
>>>
>>> Lionel
>>
>> Hi Lionel,
>>
>> You are writing to the AFS3 Standardization list which is the
>> appropriate location for discussing AFS3 RPC protocol extensions. In
>> case you aren't aware, in the AFS model file pointer operations are
>> local to the client and are satisfied by a cache manager that obtains
>> chunks of files at a time from the file servers.   Cache managers do n=
ot
>> in general cache whole files.  Instead the cache managers fetch only
>> those chunks of the file that are requested by the application.  While=

>> the files themselves might not be sparse the caching of file data by t=
he
>> clients will be if that is the access pattern issued by the applicatio=
n.
>>
>> In the existing AFS3 protocol the FetchData and StoreData operations d=
o
>> not support compression and do not support any notion of sparseness (a=

>> range of zero bytes.)  Nor do the file servers recognize that a chunk
>> being written is entirely zero bytes and in turn store the file as a
>> sparse file in its backend.  This is one of the reasons that using ZFS=

>> with disk compression as the backing store can be a big win.
>=20
> There is one very large, and very bad misconception:
> Holes in a sparse file return '\0' (zero) bytes on read(), but it does
> not mean that all data which contain '\0' bytes are holes. This
> misconception can lead to serious data loss situations because
> applications need to be able to differ between holes in a file -
> representing 'no data here' and large ranges containing '\0' bytes -
> which represent valid data containing '\0' bytes.

If you are building your application around such properties you need
to be extremely careful to understand the sparse file implementation
of the file system that is storing the data.   File systems differ
widely in their implementations.  While I have never seen a file system
convert a stream of zero bytes into a hole, doing so is permitted.  What
is more common is that due to resource allocation strategies holes are
only permitted at certain alignments and and non-hole series of zero
bytes can be introduced silently.

>> In summary, at the present time there is no protocol support for a cac=
he
>> manager to communicate the Fetching or Storing of a sparse range nor i=
s
>> there any support for a cache manager to query a file server to reques=
t
>> allocated ranges.  As a result is not currently possible for an AFS
>> cache manager to implement VFS layer support for lseek(2) SEEK_HOLE an=
d
>> SEEK_DATA when it is present.  (Nor is there the ability to implement
>> Microsoft Windows' FSCTL_SET_SPARSE, FSCTL_SET_ZERO_DATA, and
>> FSCTL_QUERY_ALLOCATED_RANGES sparse file operations.)
>>
>> From an AFS client implementation perspective, for example OpenAFS on
>> Linux (3.1 or later) or Solaris it would be possible for the cache
>> manager to support SEEK_DATA and SEEK_HOLE as recommended by the Linux=

>> man page.  SEEK_HOLE always returns the offset of the end of file and
>> SEEK_DATA always return the offset specified in the lseek().  Discussi=
on
>> of adding that level of support should take place on the
>> openafs-devel@openafs.org mailing list.

The implementation I described above is compliant with the
specification.  Yet it will not meet your requirements.

>>
>> What are the requirements of the applications in question?
>=20
> 1. Databases

For databases a bigger problem is the lack of byte range locking in the
AFS3 protocol.

> 2. Simulation software which, basically described, creates one large
> sparse file (typically something like 2PB or larger) and then calls
> mmap() on it. Filesystem will allocate space on an as needed basis.
> Holes in these files represent areas of 'no data' or 'no data
> calculated here yet' while real data - even huge ranges of '\0' bytes
> - represent calculated areas.

This would be a poor choice for AFS at the moment and most other network
file systems.

Jeffrey Altman




--------------ms050508090200020105070401
Content-Type: application/pkcs7-signature; name="smime.p7s"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="smime.p7s"
Content-Description: S/MIME Cryptographic Signature

MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIINITCC
BkIwggUqoAMCAQICEDirAC//rpa3Vv85Wvtd5xswDQYJKoZIhvcNAQEFBQAwgcoxCzAJBgNV
BAYTAlVTMRcwFQYDVQQKEw5WZXJpU2lnbiwgSW5jLjEfMB0GA1UECxMWVmVyaVNpZ24gVHJ1
c3QgTmV0d29yazE6MDgGA1UECxMxKGMpIDE5OTkgVmVyaVNpZ24sIEluYy4gLSBGb3IgYXV0
aG9yaXplZCB1c2Ugb25seTFFMEMGA1UEAxM8VmVyaVNpZ24gQ2xhc3MgMSBQdWJsaWMgUHJp
bWFyeSBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0eSAtIEczMB4XDTExMDkwMTAwMDAwMFoXDTIx
MDgzMTIzNTk1OVowgaYxCzAJBgNVBAYTAlVTMR0wGwYDVQQKExRTeW1hbnRlYyBDb3Jwb3Jh
dGlvbjEfMB0GA1UECxMWU3ltYW50ZWMgVHJ1c3QgTmV0d29yazEeMBwGA1UECxMVUGVyc29u
YSBOb3QgVmFsaWRhdGVkMTcwNQYDVQQDEy5TeW1hbnRlYyBDbGFzcyAxIEluZGl2aWR1YWwg
U3Vic2NyaWJlciBDQSAtIEc0MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAxuwn
/R1j9DsdisHTHMjIgoa2uEqGkqqBXHLKMA0vnkEiVzAhJZCao/SsKsaIF4ZhchN2LuwDyyeb
jyCAN+DkitpVplAP/LlcI2mJQqG6H6/vDvmkyQrx+DeyxtmSSq5937hEH5u6P4wG/tgjT0hR
I2pghKjuJy9g35byGiqMPI8AzE/L+iCOvDX24fCatgXz/B0/xhR7DtryBeTTgwKmxWlwtKnk
VunbHVz0pjbia7UeKi3cvrvuOgSwMAitX2hsxr0GloiE5+apZC28ODC7iCbDZ2ZmtLR3+cCh
xw5y72bi5bnK4POFdzWY3tQcsP5mceI4y258T0BV65fZqBge7QIDAQABo4ICRDCCAkAwOAYI
KwYBBQUHAQEELDAqMCgGCCsGAQUFBzABhhxodHRwOi8vcGtpLW9jc3AudmVyaXNpZ24uY29t
MBIGA1UdEwEB/wQIMAYBAf8CAQAwbAYDVR0gBGUwYzBhBgtghkgBhvhFAQcXATBSMCYGCCsG
AQUFBwIBFhpodHRwOi8vd3d3LnN5bWF1dGguY29tL2NwczAoBggrBgEFBQcCAjAcGhpodHRw
Oi8vd3d3LnN5bWF1dGguY29tL3JwYTA0BgNVHR8ELTArMCmgJ6AlhiNodHRwOi8vY3JsLnZl
cmlzaWduLmNvbS9wY2ExLWczLmNybDAOBgNVHQ8BAf8EBAMCAQYwKQYDVR0RBCIwIKQeMBwx
GjAYBgNVBAMTEVZlcmlTaWduTVBLSS0yLTk3MB0GA1UdDgQWBBSt+cOTci21uShh5KTXYNXE
Cl4aATCB8QYDVR0jBIHpMIHmoYHQpIHNMIHKMQswCQYDVQQGEwJVUzEXMBUGA1UEChMOVmVy
aVNpZ24sIEluYy4xHzAdBgNVBAsTFlZlcmlTaWduIFRydXN0IE5ldHdvcmsxOjA4BgNVBAsT
MShjKSAxOTk5IFZlcmlTaWduLCBJbmMuIC0gRm9yIGF1dGhvcml6ZWQgdXNlIG9ubHkxRTBD
BgNVBAMTPFZlcmlTaWduIENsYXNzIDEgUHVibGljIFByaW1hcnkgQ2VydGlmaWNhdGlvbiBB
dXRob3JpdHkgLSBHM4IRAItbdVaEVIULAM+vOEjOsaQwDQYJKoZIhvcNAQEFBQADggEBANaP
wdqbiPKzbE0fWC+6AVFddMFG6MO4e5/WQPHv/zK6iWvADjRDn6SZ5qTwXUgzYoWFYf4jiCKM
YJsrnGVJlMSiOCRIpVylUEto6WIip5PomSJuPVu7EEIOH0x1RzRWCY/4vYw881y70pZwVHBi
Te/REL6dSCxe7IZrB4LwPeElJygs4BZ2HrP95WKW0oo9Xyuu+1zCE7dlY8s0dkOf1oeZq26t
lcEAP0Yngf813iMOQ9wUXzL5yinvwlIw9ZnduYH4OiUgjYJo8rkhhXRmBOGGORYy8i3WKqjJ
3tkAAk/jGCDFpYFWtpXe04Kt+HslvmR8LqC6cCz4+XXidE0HbYQwggbXMIIFv6ADAgECAhB4
sMGg25SPPErGZAEUkQeTMA0GCSqGSIb3DQEBBQUAMIGmMQswCQYDVQQGEwJVUzEdMBsGA1UE
ChMUU3ltYW50ZWMgQ29ycG9yYXRpb24xHzAdBgNVBAsTFlN5bWFudGVjIFRydXN0IE5ldHdv
cmsxHjAcBgNVBAsTFVBlcnNvbmEgTm90IFZhbGlkYXRlZDE3MDUGA1UEAxMuU3ltYW50ZWMg
Q2xhc3MgMSBJbmRpdmlkdWFsIFN1YnNjcmliZXIgQ0EgLSBHNDAeFw0xMzEyMjMwMDAwMDBa
Fw0xNTAxMTYyMzU5NTlaMIHOMS4wLAYDVQQDDCVQZXJzb25hIE5vdCBWYWxpZGF0ZWQgLSAx
MzU4Mjc2MTA4NjMxMSswKQYJKoZIhvcNAQkBFhxqYWx0bWFuQHlvdXItZmlsZS1zeXN0ZW0u
Y29tMQ8wDQYDVQQLDAZTL01JTUUxHjAcBgNVBAsMFVBlcnNvbmEgTm90IFZhbGlkYXRlZDEf
MB0GA1UECwwWU3ltYW50ZWMgVHJ1c3QgTmV0d29yazEdMBsGA1UECgwUU3ltYW50ZWMgQ29y
cG9yYXRpb24wggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQDSrqYUguvbguthxGNq
M15noPYGLMnpjRKT2VS88MxNAZ7RaplB8Azrk8vOH+q+IWnXCrap+BevY27PZW6UgNAPcETG
FTi/qdYAukHwnCV7fvjXXJEOw3jg+eK/06bhr0uThvmrjT+jWHlpzK3mSDPtEBSkgXDbLkL/
LQfYvay0Ia7n65l5Ry4zHlrg6uJ+UqvWJZwXazXjo2H4EksGCM4nrKHTeVoj5oSquvqs3tSf
BytXLGVqSOHqjXb+lri1gtlovX7AjMT2gdONRrjR3wun6tjHvoqjUNZ2mUs0XXh0vI0GyTKd
taz26xY+iKboxFO2atDbb1Gm8KdUXqO/UivlAgMBAAGjggLVMIIC0TAMBgNVHRMBAf8EAjAA
MA4GA1UdDwEB/wQEAwIFoDAgBgNVHSUBAf8EFjAUBggrBgEFBQcDBAYIKwYBBQUHAwIwHQYD
VR0OBBYEFMC1SMuRefXyNAwkZM+Lgu7iJ6nFMCcGA1UdEQQgMB6BHGphbHRtYW5AeW91ci1m
aWxlLXN5c3RlbS5jb20wHwYDVR0jBBgwFoAUrfnDk3IttbkoYeSk12DVxApeGgEwggErBggr
BgEFBQcBAQSCAR0wggEZMIIBFQYIKwYBBQUHMAKGggEHbGRhcDovL2RpcmVjdG9yeS52ZXJp
c2lnbi5jb20vQ04lMjAlM0QlMjBTeW1hbnRlYyUyMENsYXNzJTIwMSUyMEluZGl2aWR1YWwl
MjBTdWJzY3JpYmVyJTIwQ0ElMjAtJTIwRzQlMkMlMjBPVSUyMCUzRCUyMFBlcnNvbmElMjBO
b3QlMjBWYWxpZGF0ZWQlMkMlMjBPVSUyMCUzRCUyMFN5bWFudGVjJTIwVHJ1c3QlMjBOZXR3
b3JrJTJDJTIwTyUyMCUzRCUyMFN5bWFudGVjJTIwQ29ycG9yYXRpb24lMkMlMjBDJTIwJTNE
JTIwVVM/Y0FDZXJ0aWZpY2F0ZTtiaW5hcnkwXQYDVR0fBFYwVDBSoFCgToZMaHR0cDovL3Br
aS1jcmwuc3ltYXV0aC5jb20vY2FfNTYxYzEwMzY5MGM5N2E2OTI0N2EwZWYwNzFhYzgxYWYv
TGF0ZXN0Q1JMLmNybDBsBgNVHSAEZTBjMGEGC2CGSAGG+EUBBxcBMFIwJgYIKwYBBQUHAgEW
Gmh0dHA6Ly93d3cuc3ltYXV0aC5jb20vY3BzMCgGCCsGAQUFBwICMBwaGmh0dHA6Ly93d3cu
c3ltYXV0aC5jb20vcnBhMCoGCmCGSAGG+EUBEAMEHDAaBhFghkgBhvhFARABAgIEAYazFxYF
MTA5MjIwDQYJKoZIhvcNAQEFBQADggEBAEUyacJvoRfQdglYgnUwaTMsRRg0YeAljbnb8M5E
vBSo3u/LhvbXtvu+9uE8R6UOE4GvKH382I27vjuM28oHqfii04URAB1icmA8b7rxYQo9Ob2I
/NkkQRBwbA3HGLWXFjupODWbP5WylyySAAI7HxG2xbE4X+8+hMJVOKfxJb6J0SUOBlnmMkmg
nAxgOM4venSmli6U3o0nADHNLZEJjqym2QstkeYPhDZ6sSO3t/yv+JyYbfb01hiOdhGsDBif
oPTqcWRvA+lqbWMHJG3p9uL/kI4jbLj9/ZkMfdRDHpQNVAuGxxyj7b1pxM0jBuTP0Jmrcz3U
wUwT5kjCCDt2gGAxggRSMIIETgIBATCBuzCBpjELMAkGA1UEBhMCVVMxHTAbBgNVBAoTFFN5
bWFudGVjIENvcnBvcmF0aW9uMR8wHQYDVQQLExZTeW1hbnRlYyBUcnVzdCBOZXR3b3JrMR4w
HAYDVQQLExVQZXJzb25hIE5vdCBWYWxpZGF0ZWQxNzA1BgNVBAMTLlN5bWFudGVjIENsYXNz
IDEgSW5kaXZpZHVhbCBTdWJzY3JpYmVyIENBIC0gRzQCEHiwwaDblI88SsZkARSRB5MwCQYF
Kw4DAhoFAKCCAmswGAYJKoZIhvcNAQkDMQsGCSqGSIb3DQEHATAcBgkqhkiG9w0BCQUxDxcN
MTQwOTA0MTIwNzEzWjAjBgkqhkiG9w0BCQQxFgQUYTYYrWGf97ZCwfBHWAyosr89u7YwbAYJ
KoZIhvcNAQkPMV8wXTALBglghkgBZQMEASowCwYJYIZIAWUDBAECMAoGCCqGSIb3DQMHMA4G
CCqGSIb3DQMCAgIAgDANBggqhkiG9w0DAgIBQDAHBgUrDgMCBzANBggqhkiG9w0DAgIBKDCB
zAYJKwYBBAGCNxAEMYG+MIG7MIGmMQswCQYDVQQGEwJVUzEdMBsGA1UEChMUU3ltYW50ZWMg
Q29ycG9yYXRpb24xHzAdBgNVBAsTFlN5bWFudGVjIFRydXN0IE5ldHdvcmsxHjAcBgNVBAsT
FVBlcnNvbmEgTm90IFZhbGlkYXRlZDE3MDUGA1UEAxMuU3ltYW50ZWMgQ2xhc3MgMSBJbmRp
dmlkdWFsIFN1YnNjcmliZXIgQ0EgLSBHNAIQeLDBoNuUjzxKxmQBFJEHkzCBzgYLKoZIhvcN
AQkQAgsxgb6ggbswgaYxCzAJBgNVBAYTAlVTMR0wGwYDVQQKExRTeW1hbnRlYyBDb3Jwb3Jh
dGlvbjEfMB0GA1UECxMWU3ltYW50ZWMgVHJ1c3QgTmV0d29yazEeMBwGA1UECxMVUGVyc29u
YSBOb3QgVmFsaWRhdGVkMTcwNQYDVQQDEy5TeW1hbnRlYyBDbGFzcyAxIEluZGl2aWR1YWwg
U3Vic2NyaWJlciBDQSAtIEc0AhB4sMGg25SPPErGZAEUkQeTMA0GCSqGSIb3DQEBAQUABIIB
AI3OUsXpT7oAk5uNLLDurjQ4C+HI7gt+Wa0RkGzpredaL0yljPNpGSYU8qiaZqt5IU4schW7
pjva55X+ehEDCs420t+/wrBvveFgt9F1wjT7+2nMQltKCrOM3CJcnxlSNCR1myT3FksaHcDs
b7RCiFQooEKtBL4Qre18+ZoY6ePPkDkZM3yjrD3EVoTM3Lsvg6vIZ5mLGE7qUGK1Eny/Qn85
ZeIw9QcY0lGLrWmZqbLQ1EPO0iSpd2lN+1X6xh4Zs6vW8LZulV4/d8uCxH+Z/f9zAeJAgVCz
beTiYvZDeBkAe+JM+RYOvbrNw00c4ELhPkA+koHF+mBeep6kJXv1+CEAAAAAAAA=
--------------ms050508090200020105070401--