[OpenAFS-devel] Re: afscache on UFS+logging or ZFS?

Tom Keiser tkeiser@sinenomine.net
Thu, 5 Aug 2010 11:55:20 -0400


On Wed, Aug 4, 2010 at 6:50 PM, Robert Milkowski <milek@task.gda.pl> wrote:
> On 03/08/2010 23:28, Andrew Deason wrote:
>>
>> On Tue, 03 Aug 2010 22:36:31 +0100
>> Robert Milkowski <milek@task.gda.pl> wrote:
>>
>>
>>>
>>> Hi,
>>>
>>
>> Just by the way, these kinds of questions are more suited to
>> openafs-info than openafs-devel, I think.
>>
>>
>
> I think you're right. Sorry about that.
>
>
>>> Can AFS cache be placed on any local filesystem like ZFS, VxFS or
>>> UFS+logging?
>>>
>>
>> ZFS and UFS have been used (with and without logging). I'm not sure if
>> anyone has tried to use VxFS, but in theory I think it should work as
>> our cache I/O mechanisms are supposed to be FS-agnostic.
>>
>> Note that there is a known (unfixable) issue with ZFS caches that can
>> cause them to take up far more disk space than you have configured.
>>
>> <http://www.openafs.org/pipermail/openafs-devel/2009-September/017033.html>
>> has some information. Decreasing the ZFS recordsize makes it less severe,
>> though the issue is always there.
>>
>>
>
> This shouldn't be a big issue. You can always set recordsize to something

Except that it is a big issue, because the cache manager tries hard to
track cache filesystem usage internally rather than constantly calling
out to the VFS layer.  That technique works fine with old-school
filesystems, where most of the data structures have high degrees of
locality and low amounts of indirection.  Unfortunately, our estimation
algorithm breaks down completely on ZFS, for a number of reasons:

 - we assume truncate frees storage instantaneously;
 - variable blkptr_t DVA widths;
 - ZAP versus old-school inode and directory tables;
 - log-structured space maps rather than a static bitmap;
 - gang blocks;
 - the issue Mattias addressed;
 - the previously mentioned truncation issue, modulo known workarounds;
 - I also seem to recall that the complicated interplay between
   writes/truncates, uberblock rotation, and space maps was a potential
   issue;
 - when the cache ZFS filesystem is on its own pool of vdevs, there's
   also the issue of accounting for MOS metadata;
 - and when there are multiple DSL object sets in play, that can further
   complicate matters.

In essence, the tricks the CM plays to avoid calling out to the VFS
layer don't work very well on ZFS...
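
For illustration only, here is a minimal C sketch of the kind of
length-based internal accounting in question, compared against what the
filesystem actually reports via fstat().  This is not the actual OpenAFS
cache manager code; the file name and the fixed "fragment" size are made
up.  On ZFS, recordsize, compression, metadata overhead, and deferred
frees all make the two numbers diverge, and truncation widens the gap
because the internal estimate drops to zero immediately while the blocks
are released later.

    /*
     * Sketch: fixed-fragment internal accounting vs. what the
     * filesystem actually charges.  Not OpenAFS code.
     */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/stat.h>

    #define FRAG_SIZE 1024          /* hypothetical fixed accounting unit */

    /* Estimate disk usage the way an internal tracker might: from length. */
    static long long estimate_bytes(off_t length)
    {
        long long frags = ((long long)length + FRAG_SIZE - 1) / FRAG_SIZE;
        return frags * FRAG_SIZE;
    }

    /* Ask the filesystem what it actually charged for the file. */
    static long long actual_bytes(int fd)
    {
        struct stat st;
        if (fstat(fd, &st) < 0)
            return -1;
        return (long long)st.st_blocks * 512;  /* st_blocks is in 512-byte units */
    }

    int main(void)
    {
        static char buf[128 * 1024];
        int fd = open("cachefile.tmp", O_CREAT | O_RDWR | O_TRUNC, 0600);
        if (fd < 0) { perror("open"); return 1; }

        memset(buf, 'x', sizeof(buf));
        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
            perror("write");
            return 1;
        }
        fsync(fd);

        printf("internal estimate: %lld bytes\n", estimate_bytes(sizeof(buf)));
        printf("filesystem says:   %lld bytes\n", actual_bytes(fd));

        /*
         * Truncate: the internal estimate drops to zero at once, but the
         * filesystem may release the blocks asynchronously, so st_blocks
         * (and pool usage) can lag behind.
         */
        if (ftruncate(fd, 0) < 0)
            perror("ftruncate");
        printf("after truncate, estimate: 0 bytes, filesystem says: %lld bytes\n",
               actual_bytes(fd));

        close(fd);
        unlink("cachefile.tmp");
        return 0;
    }

On a traditional filesystem with a fixed block size, the two figures
track each other closely; on ZFS they need not, which is the point
above.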

-Tom