[OpenAFS] Re: Advice on a use case

Andrew Deason adeason@sinenomine.net
Fri, 9 Nov 2012 11:45:09 -0600

On Thu, 8 Nov 2012 22:48:56 -0800
Timothy Balcer <timothy@telmate.com> wrote:

> Well, unless I am missing something seriously obvious, for example it
> took 1.5 hours to rsync a subdirectory to an AFS volume that had not a
> lot of content, but many directories.

Creating lots of files is not fast. Due to the consistency guarantees of
AFS, you have to wait for at least a network RTT for every single file
you create. That is, a mkdir() call is going to take at least 50ms if
the server is 50ms away. Most/all recursive copying tools will wait for
that mkdir() to complete before doing anything else, so it's slow.
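To see why this adds up, here's a back-of-the-envelope sketch. The RTT is the 50ms from above, but the directory and file counts are made-up assumptions standing in for "not a lot of content, but many directories":

```python
# Rough estimate of the serial-copy floor imposed by one RTT per
# metadata operation. Counts here are illustrative assumptions,
# not measurements from any particular cell.
rtt_seconds = 0.050          # 50ms round trip to the fileserver
num_dirs = 10_000            # assumed "many directories"
num_files = 20_000           # assumed small files, one create each

# Each mkdir()/create waits at least one RTT before the tool moves on.
min_metadata_time = (num_dirs + num_files) * rtt_seconds
print(f"latency floor: {min_metadata_time:.0f} seconds "
      f"(~{min_metadata_time / 60:.0f} minutes)")
```

That's a lower bound from round trips alone, before any actual data transfer.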

Arguably we could maybe introduce something to 'fs storebehind' to do
these operations asynchronously to the fileserver, but that has issues
(as mentioned in the 'fs storebehind' manpage). And, well, it doesn't
exist right now anyway, so that doesn't help you :)

What can possibly make this faster, then, is copying/creating
files/directories in parallel. I'm not sure if there exists any tool to
copy files like that, but you could possibly script it or something. Or,
if the data is organized in such a way that you can run one rsync/'cp
-R'/etc per top-level directory, that could make it faster. That is, if
you have 4 top-level directories, running 4 recursive copies in parallel
could possibly make the whole thing faster.

(You can go higher than 4 transfers in parallel if you do some fiddling,
but I'm not going too deeply into that... let me know if you want to
know more.)
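A minimal sketch of that parallel approach, copying one top-level directory per worker (the paths and worker count are hypothetical, and shutil.copytree just stands in for whatever copy tool you prefer):

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def copy_tree_parallel(src_root, dst_root, workers=4):
    """Copy each top-level directory of src_root in its own worker,
    so several mkdir()/create round trips are in flight at once
    instead of one at a time."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    dst_root.mkdir(parents=True, exist_ok=True)
    tops = [d for d in src_root.iterdir() if d.is_dir()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(shutil.copytree, d, dst_root / d.name)
                   for d in tops]
        for f in futures:
            f.result()   # propagate any copy errors

# e.g. copy_tree_parallel("/local/staging", "/afs/example.com/data")
```

Note this sketch only parallelizes across top-level directories; files sitting directly in src_root are ignored, and one huge subtree can still dominate the total time.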

Also, I was assuming you're rsync'ing to an empty destination in AFS;
that is, just using rsync to copy stuff around. If you're actually
trying to synchronize a dir tree in AFS that's at least partially
populated, see Jeff's comments about stat caches and stuff.

> No, I am writing from a local audio/video server to a local repo,
> which needs to be very fast in order to service live streaming in
> parallel with write on a case by case basis.

It seems like you could just write to /foo during the stream capture, and
copy it to /afs/bar/baz when it's done. But if the union mount scheme
makes it easier for you, then okay :)

But I'm not sure I understand... above you describe directory trees
made up of a lot of directories and relatively small files. I
would've thought that video captures of a live stream would not
be particularly small... copying video to AFS sounds more like the
"small number of large files" use case, which is much more manageable.
Is this a lot of small video files or something?

> > To improve things, you can maybe try to reduce the number of volumes
> > that are changing. That is, if you are adding new data in batches, I
> > don't know if it's feasible for you to add that 'batch' of data by
> > creating a new volume instead of writing to existing volumes.
> That's feasible..... but what if, for example, vol1 is mounted at
> /afs/foo/home/bar and contains a thousand directories. The new
> content is a thousand more directories, but at the exact same level of
> the tree. How would I handle that? As far as I can tell, OpenAFS only
> allows a volume being mounted on its very own directory, and you can't
> nest them together like that.

Well, that's what I meant by "I don't know if it's feasible". If you
must add stuff to the same level of the dir hierarchy, instead of
putting it all under a new directory e.g. "foo_2012-11-09/", it's
harder. But, if you can create a new vol for each dir as you mention:

> How unfeasible would it be to create N volumes, where N >= 500 per
> shot? I would end up with many thousands of tiny volumes.. none of
> which I have trouble with, but would that be scalable? Let's assume I
> have spread out db and file servers in such a way to equalize load.

I'm not sure what scalability issues here you're expecting; making
volumes smaller but more in number is typically something you do to
improve scalability. We usually encourage more small volumes instead of
fewer big volumes.

What I would guess you may run into:

 - The speed of creating the volumes. I'm not actually sure how fast
   this goes, since creating a lot of volumes quickly isn't usually a
   concern... so you'll have to try it :)
 - Fileserver startup/shutdown time for non-DAFS is somewhat heavily
   influenced by the number of volumes on the server; this is a
   significant issue when you start to have tens or hundreds of
   thousands of volumes on a server.
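On the first point, if you want numbers before committing to N >= 500 volumes per batch, a small timing harness could tell you the creation rate on your own servers. The 'vos create' arguments below (server, partition, naming scheme) are hypothetical, and the commented-out part is the only piece that touches AFS:

```python
import subprocess
import time

def time_commands(cmds):
    """Run each command serially and report throughput.
    Returns (elapsed_seconds, commands_per_second)."""
    start = time.monotonic()
    for cmd in cmds:
        subprocess.run(cmd, check=True)
    elapsed = time.monotonic() - start
    return elapsed, len(cmds) / elapsed

# Hypothetical: time creating 500 volumes on one server/partition.
# cmds = [["vos", "create", "fs1.example.com", "a", f"batch.{i:04d}"]
#         for i in range(500)]
# elapsed, rate = time_commands(cmds)
# print(f"{rate:.1f} volumes/sec")
```

Serial timing gives you the baseline; whether parallel 'vos create' helps is something you'd also have to just try.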

That second point is addressed by DAFS, which can handle at least a
million or so volumes per server rather quickly (a few seconds for
startup). I'm not sure if you know what DAFS is, but converting to using
it should be straightforward. There is a section about DAFS and how to
convert to using it in appendix C of the Quick Start Guide.

Andrew Deason