[reiserfs-list] Re: [OpenAFS-devel] more on the 2.2.18pre17 SMPcpu hog/etc.

Andi Kleen ak@suse.de
Tue, 5 Dec 2000 00:34:37 +0100

On Mon, Dec 04, 2000 at 06:11:40PM -0500, Derek Atkins wrote:
> Harald Barth <haba@pdc.kth.se> writes:
> > > Unfortunately, the AFS cache does not meet those conditions.  The cache is
> > > a single directory that may have tens of thousands of files in it.  
> > 
> > This is a design miss of the AFS cache which will give problems for
> > almost all flavours of *i*x file systems. It can only win by a change.
> > A change can go in two directions: Order the files in some directory
> > tree with some smart structure or give up the files approach and use
> > one big file. When using -memcache, the whole cache is one big chunk
> > of data, too. Another pain is this bad habit of creating all the
> > inodes at boot. If I start with a really small cache and later
> > increase the cache size on the fly, inodes are created without
> > blocking cache operations for minutes. So why the wait at boot? This
> > feature seems to be wanted by more than a few.
> I'm not convinced this is a design miss.. The cache manager needs to
> walk the cache to see what's in it, and then it can random-access into
> the cache at a later time.  This implies that it has to walk the
> directory at the beginning, so it doesn't matter how long it takes to
> walk it.  Subsequently, it caches the inode number so it doesn't need
> to go through the directory for future accesses.  So, indeed, I think
> the current design is actually very clever.  The problem is that
> reiserfs doesn't use the inode (by itself) to key into the filesystem
> like pretty much every other FS in existence.

There are actually other file systems that need more than 32 bits of inode
for a lookup too, e.g. XFS with file systems >2TB or an NFSv3 client (not that
it would make sense to run a cache dir on an NFSv3 client, but it exists).

reiserfs is one of the file systems that does not care about big directories;
it was designed for them.  The cache files have regular names, so if
everything else fails you could just compute a unique number from the
cache file name in user space, pass it in as the inode, and have the kernel
convert it back to the filename and open it using filp_open().
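
A minimal user-space sketch of that name <-> number mapping, to make the idea
concrete. It assumes the cache files are named "V<decimal index>" (e.g.
"V1234"); that pattern and the function names cache_name_to_id() and
cache_id_to_path() are illustrative assumptions, not taken from the AFS
sources. In the kernel, the rebuilt path would be what gets handed to
filp_open().

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Assumption: cache files are named "V<decimal index>" with no leading
 * zeros, so every valid name maps to exactly one number and back.
 */

/* Map a cache file name to the number passed in place of an inode.
 * Returns 0 on success, -1 if the name does not fit the assumed pattern. */
int cache_name_to_id(const char *name, unsigned long *id)
{
    char *end;
    unsigned long v;

    if (name[0] != 'V' || name[1] < '0' || name[1] > '9')
        return -1;
    if (name[1] == '0' && name[2] != '\0')
        return -1;              /* "V007" would collide with "V7" */

    errno = 0;
    v = strtoul(name + 1, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;              /* overflow or trailing garbage */

    *id = v;
    return 0;
}

/* Inverse mapping: rebuild the path from the cache dir and the number.
 * Returns 0 on success, -1 if the buffer is too small. */
int cache_id_to_path(const char *cachedir, unsigned long id,
                     char *buf, size_t len)
{
    int n = snprintf(buf, len, "%s/V%lu", cachedir, id);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Rejecting names with leading zeros keeps the mapping one-to-one, which is
what makes the number usable as a stand-in for the inode.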

Looking at osi_UFSOpen(), this looks like it could be implemented without too
much pain (assuming there is a way to get the path name of the cachedir from
the kernel). Only two places would need to know about this -- osi_UFSOpen()
and the user space code doing the stats. At least in osi_UFSOpen() it
would even be cleaner than the hand hacking of the file structure that is
currently done @)

There still is no way to find out whether the cache dir is on reiserfs; that
would need some user option, or user space code would need to look into
/etc/fstab.
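
A rough sketch of what such an /etc/fstab lookup could look like in user
space, using the glibc getmntent(3) family (Linux-specific, not POSIX). The
function name fstab_fs_type() is made up for illustration; it picks the entry
with the longest matching mount point and does not resolve symlinks, so the
cache dir path would have to be canonical already.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if mount point `mnt` is a whole-component prefix of `path`. */
static int is_mount_prefix(const char *mnt, const char *path)
{
    size_t n = strlen(mnt);

    if (strncmp(mnt, path, n) != 0)
        return 0;
    if (n > 0 && mnt[n - 1] == '/')     /* "/" matches every absolute path */
        return 1;
    return path[n] == '\0' || path[n] == '/';
}

/*
 * Find the file system type for `path` in an fstab-format file.
 * The entry with the longest matching mount point wins.
 * Returns 0 and fills `type` (truncated if `len` is too small; `len`
 * must be > 0), or -1 if the file cannot be read or nothing matches.
 */
int fstab_fs_type(const char *fstab, const char *path,
                  char *type, size_t len)
{
    FILE *fp = setmntent(fstab, "r");
    struct mntent *m;
    size_t best = 0;
    int found = 0;

    if (fp == NULL)
        return -1;

    while ((m = getmntent(fp)) != NULL) {
        size_t n = strlen(m->mnt_dir);

        if (is_mount_prefix(m->mnt_dir, path) && (!found || n >= best)) {
            snprintf(type, len, "%s", m->mnt_type);
            best = n;
            found = 1;
        }
    }
    endmntent(fp);

    return found ? 0 : -1;
}
```

The startup code could then compare the result against "reiserfs" and select
the name-based open path only in that case.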

Actually, with the efficient dcache in 2.2+, I suspect that method could be
used unconditionally on Linux for all file systems without much, if any,
slowdown. As long as the dentries stay cached there is no directory walking
overhead, even on file systems with not-so-well scaling directories like ext2.

BTW, do you have plans to move to an external vnode for AFS? The
copied inode structure looks very fragile and has e.g. caused problems
with the suse kernel (which has two fields more than a normal linux 2.2