[OpenAFS] Performance issues with Git repositories (or in general
with many small files workloads)
Douglas E Engert
deengert@gmail.com
Wed, 16 Dec 2020 17:50:07 -0600
If you are just backing up, consider "git bundle", which creates a single
file that git clone can read directly.
https://stackoverflow.com/questions/5578270/fully-backup-a-git-repo
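
A minimal sketch of what I mean, not a tested recipe for your setup. The paths are hypothetical: the script builds a throwaway repository in a temporary directory so it can be run anywhere, and BACKUP_DIR is a made-up variable that you would point at a directory under /afs. Only `git bundle create`, `git bundle verify`, and `git clone` matter here.

```shell
#!/bin/sh
set -eu

# Hypothetical locations: in practice BACKUP_DIR would be a directory under /afs.
WORK_DIR=$(mktemp -d)
BACKUP_DIR=${BACKUP_DIR:-"$WORK_DIR/backup"}
mkdir -p "$BACKUP_DIR"

# A throwaway repository standing in for the one being backed up.
git init -q "$WORK_DIR/repo"
cd "$WORK_DIR/repo"
echo "hello" > file.txt
git add file.txt
git -c user.name="Example" -c user.email="example@example.org" \
    commit -q -m "initial commit"

# Pack all refs and the objects they reach into one file.
git bundle create "$BACKUP_DIR/repo.bundle" --all

# Check that the bundle is valid and needs no prerequisite commits.
git bundle verify "$BACKUP_DIR/repo.bundle"

# Restore: git clone reads the bundle directly.
git clone -q "$BACKUP_DIR/repo.bundle" "$WORK_DIR/restored"
```

The backup then reaches AFS as one sequential file write instead of many small object files.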
On 12/16/2020 4:21 PM, Ciprian Dorin Craciun wrote:
> Hello all!
>
> I'm trying to use AFS to back up various Git repositories. By "backup"
> I actually mean `git push --mirror /afs/.cell/some-path/repository.git`,
> which has the following behaviour: it writes many small files in the
> `.git/objects` folder fanned by the first two hex digits of the object
> hash.
>
> In fact this pattern can be found in many applications that handle
> lots of small files. For example `rsync`, build systems, etc.
> Moreover the pattern I'm describing is single-threaded, as in these
> files are not created concurrently by multiple threads / processes.
>
> Unfortunately the performance is abysmal: what should take perhaps
> 1-2 seconds on a normal drive takes up to a minute on AFS; for
> example, `git-push` reports a bandwidth of only ~20 KiB/s.
>
> Looking at the CPU usage, the `dafileserver` seems to be at ~95%,
> although the system has 4 cores and is lightly used.
>
> I can eliminate the following causes:
> * network issues (either bandwidth or latency), because this behaviour
> occurs even if I mount AFS on the same server where the file server
> lives, thus everything happens over loopback;
> * encryption -- it is off;
> * synchronous close -- I've tried to set `fs storebehind -allfiles
> 16384 -verbose`;
> * disks backing AFS cache -- it's an NVMe disk capable of ~3GiB/s;
> * disks backing AFS file server -- it's a RAID5 of 3 top-of-the-line
> (Gold) WD S-ATA drives;
> * I can achieve good throughput for large files, or if accessing
> medium sized files from multiple threads / processes;
>
> My OpenAFS deployment is on Linux 5.3.18, OpenSUSE Leap 15.2, and the
> following are the arguments of the file server and cache manager:
>
> ~~~~
> /usr/lib/openafs/dafileserver -syslog -sync onclose \
> -p 128 -b 524288 -l 524288 -s 1048576 -vc 4096 \
> -cb 1048576 -vhandle-max-cachesize 32768 \
> -udpsize 67108864 -sendsize 67108864 \
> -rxpck 4096 -rxmaxmtu 1400 -busyat 65536
> ~~~~
>
> ~~~~
> /usr/sbin/afsd -blocks 67108864 -chunksize 17 -files 524288 \
> -files_per_subdir 4096 -dcache 524288 \
> -stat 524288 -volumes 4096 \
> -splitcache 90/10 \
> -afsdb -dynroot-sparse -fakestat-all \
> -inumcalc md5 -backuptree \
> -daemons 8 -rxmaxfrags 8 -rxmaxmtu 1400 \
> -rxpck 4096 -nosettime
> ~~~~
>
> BTW, initially I was using the old `fileserver`-based setup; switching
> to `dafileserver` has not changed the performance.
>
> Thanks for the help,
> Ciprian.
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
>
--
Douglas E. Engert <DEEngert@gmail.com>