[OpenAFS-devel] Large Caches: Implementation Discussion

Derek Atkins warlord@MIT.EDU
26 Jul 2001 20:19:29 -0400


Nathan Neulinger <nneul@umr.edu> writes:

> That also takes care of the problem with changing number of files?

Yes.  When you change files_per_subdir you will NOT lose cache files.
Cache files will get moved around, as necessary.  If you change the
number of cache files, well, then you _might_ lose cache files, but
that was already true.  Many thanks to Jim Rees for talking me through
the algorithms to do this.

<OVER_DETAILED_EXPLANATION>

(Feel free to skip this if you don't care about an in-depth
description of how the files are handled in the directories.)

One thing to note is that if you increase files_per_subdir (fpd)
without changing the number of cache files (cf) in such a way that
the number of directories does not change, the code will NOT
automatically rebalance the directories.  For example, if you have
fpd = 3 and cf = 9, you will get a hierarchy that looks like:

D0/   V0, V1, V2
D1/   V3, V4, V5
D2/   V6, V7, V8

If you change fpd to 4 without changing cf, the system won't make any
changes to this hierarchy.  The reasoning is subtle: the algorithm
only moves files around when a directory has too many files, or when
a directory is going away.  In this particular example, neither case
holds (you still need three directories, and no directory has more
than 4 entries), so it does nothing.
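
Roughly, the decision logic looks like this (a simplified sketch of
the idea only, NOT the actual patch code; the names are made up):

    #include <stdio.h>

    /* Sketch: a directory's contents are only redistributed if it
     * overflows the new files-per-dir limit, or if the directory
     * itself is no longer needed at all. */
    int
    dir_needs_work(int dirindex, int nfiles_in_dir, int fpd, int cf)
    {
        int ndirs = (cf + fpd - 1) / fpd;  /* ceil(cf/fpd) dirs needed */

        if (nfiles_in_dir > fpd)   /* overflow: excess files must move */
            return 1;
        if (dirindex >= ndirs)     /* directory is going away entirely */
            return 1;
        return 0;                  /* otherwise, leave it alone */
    }

    int
    main(void)
    {
        /* fpd = 4, cf = 9: D0..D2 still hold 3 files each from the
         * old fpd = 3 layout.  ceil(9/4) = 3 directories are still
         * needed and none holds more than 4 files, so nothing moves. */
        int i, files[3] = { 3, 3, 3 };
        for (i = 0; i < 3; i++)
            printf("D%d: %s\n", i,
                   dir_needs_work(i, files[i], 4, 9) ?
                   "rebalance" : "leave alone");
        return 0;
    }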

Another thing to note is that new files are created in the lowest
directory that has space.  So, if you ran 'rm D1/V5' to delete that
cache file and then ran afsd again with fpd=4/cf=9, the system would
put V5 in D0 (since it's the lowest directory with extra space), but
would not change the contents of D1 and D2.
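
Again in sketch form (made-up names, not the real code):

    #include <stdio.h>

    /* Sketch: a new cache file lands in the lowest-numbered
     * directory that still has room under the fpd limit. */
    int
    pick_dir(const int *counts, int ndirs, int fpd)
    {
        int i;
        for (i = 0; i < ndirs; i++)
            if (counts[i] < fpd)
                return i;       /* D0 is checked first */
        return -1;              /* everything is full */
    }

    int
    main(void)
    {
        /* After 'rm D1/V5', with fpd = 4: D0 holds 3, D1 holds 2,
         * D2 holds 3.  D0 is the lowest directory with space, so
         * the recreated V5 goes there, and D1/D2 are untouched. */
        int counts[3] = { 3, 2, 3 };
        printf("V5 goes in D%d\n", pick_dir(counts, 3, 4));
        return 0;
    }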

If you change cf to 8, it would then rebalance (since it only needs
two directories).
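
(In terms of the dir_needs_work() sketch above: with cf = 8,
ceil(8/4) = 2 directories suffice, so the check for D2 says it's
going away, and its files get redistributed into D0 and D1.)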

</OVER_DETAILED_EXPLANATION>

> > I have tested my patch, and it takes under 5 minutes to build a
> > 3-million-file cache hierarchy (for "3GB" of data) using the default
> > settings of 2048 files per directory.  This implies 147
> > subdirectories.
> 
> 1k chunks? Have you seen better cache performance with the smaller
> chunks? Or did you just change the algorithm for number of total files
> as part of this large cache patch?

Sorry, I'm off by an order of magnitude there; my math was wrong
(it's late).  I use the standard chunksize, so a 3GB cache creates
300K cachefiles.  All I did was change the way cachefiles are stored
on disk so that it doesn't take forever to create 300,000 files.
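
For the record, the corrected arithmetic (assuming, if I remember the
default right, afsd creates roughly one cache file per ten 1K blocks
when you don't specify -files):

    3GB cache            ~= 3,000,000 1K blocks
    3,000,000 / 10        = 300,000 cache files
    ceil(300,000 / 2048)  = 147 subdirectories

...which also matches the 147 subdirectories I quoted above.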

> > Anyways, I think I've got all the issues worked out, now.  Does anyone
> > else have any input before I submit my patches?
> 
> Not really, other than that you might as well wait, since Derrick is
> going to be gone for a few days still (I think next thursday, but he
> might have meant today). I'd send it to the list so some more people can
> test it... 

I just want to make sure there aren't other issues with it before I
send it out.  But I will send it out EoD tomorrow unless someone else
raises a concern.  That way others can play over the weekend while *I*
go away :)

> -- Nathan

-derek

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord@MIT.EDU                        PGP key available