[reiserfs-list] Re: [OpenAFS-devel] more on the 2.2.18pre17 SMPcpu hog/etc.

Derek Atkins warlord@MIT.EDU
03 Dec 2000 18:28:10 -0500


Hans Reiser <reiser@namesys.com> writes:

> The AFS code needs to store the key (at least 64 bits of it), not
> the inode number, or performance is doomed when interacting with
> reiserfs.

By the time that the AFS kernel code gets far enough to even obtain
the "key", it's already opened the file.  Storing the key would only
help if there were a portable way to obtain it from user-space
(considering that we cannot presume that we're using Reiserfs).

Basically, what happens is that at start-time the user-space code
finds the inode number as I mentioned in the previous mail.  Then it
passes that to the kernel, and the kernel immediately opens the file
and pretty much keeps the file opened.  So, as long as the file inode
is cached between the time the user-space code stats the file and the
kernel opens it, we're fine.
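
A minimal sketch of the user-space half of that sequence, assuming a plain
stat() is all that is needed to learn the inode number.  The helper name
cache_file_inode() is hypothetical and is not taken from the AFS sources;
the call that hands the number to the kernel is deliberately not shown.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not from the AFS sources): stat() one cache file
 * and report its inode number.  Returns 0 on success, -1 on failure.
 * A successful stat() also means the kernel has just looked the inode up,
 * which is what the caching argument above relies on.
 */
int cache_file_inode(const char *path, unsigned long long *ino_out)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        perror(path);
        return -1;
    }
    *ino_out = (unsigned long long)st.st_ino;
    return 0;
}
```

The shorter the gap between this call and the kernel's open, the smaller
the window in which the inode could be evicted from the cache.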

> Chris Mason wrote:

> Good point, there isn't an easy way to get it in userspace right now.
>
> > Chris, would a stat() be sufficient to bring inode into the cache and
> > bypass the slow lookup?  (I'm going to assume that there is enough
> > kernel memory to cache all the cacheinodes).
> >
> 
> It will probably work.  But under load, there is still a window for the
> inode to leave the cache.  If userspace had the file open while the iget
> was going on, you would know for sure the inode was in cache.

All this happens at AFS start-time (which usually implies boot time).
I find it unlikely that a machine would be so loaded at boot time
that stat'ing that many files would fill the inode cache.  I
suppose I could change the user-space code to perform the stat just
before it passes the inode to the kernel.  But I don't really believe
that will be necessary.

Consider that struct inode is certainly smaller than 4k, so, to stat
35000 files you need less than 140M of ram (to store the inode cache),
and that is a loose upper bound since struct inode is well under 4k.
Keep in mind that the AFS kernel is going to require at least a couple
of MEG of ram in the first place, so I don't think that you'll run out
of in-memory inode data.  Also, this only has to happen once, and the
inode cache is certainly LRU.
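
To make the memory estimate concrete, here is a small sketch of the
arithmetic.  The function name is hypothetical, and the 512-byte figure
used in the note below is only an illustrative assumption, not a measured
size of struct inode.

```c
/*
 * Upper bound, in KB, on the memory needed to keep nfiles inodes cached,
 * given an assumed per-inode size in bytes.  The result is rounded up.
 */
unsigned long long inode_cache_bound_kb(unsigned long long nfiles,
                                        unsigned long long bytes_per_inode)
{
    return (nfiles * bytes_per_inode + 1023ULL) / 1024ULL;
}
```

With the 4k ceiling, inode_cache_bound_kb(35000, 4096) comes to 140000 KB,
roughly 137 MB.  With an assumed 512 bytes per inode it comes to 17500 KB,
roughly 17 MB.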

I'm sure you could come up with a degenerate case, but I think doing
it this way will in most cases be Good Enough (TM).

> If you could pass the filename to the kernel, and have the kernel use the
> name, it would probably be cleaner.  That would work on all the new
> filesystems (GFS and XFS can use 64 bit inode numbers as well).

Currently there is no interface to pass the cache-file filename to the
kernel, so we can't do that.  There is an interface to pass down a
64-bit inode number (although it's only used on the SGI), so I suspect
that we can eventually move over to using that.  However, keep in mind
that i_ino (in struct inode) is only an unsigned long (at least in
2.2).  Once that changes (and d_ino in struct dirent), we can consider
moving to 64-bit inode numbers.
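
The width problem can be sketched as a simple check: a 64-bit inode number
can only be carried in an unsigned long field if its high bits are zero on
that platform.  Both function names below are hypothetical and exist only
for illustration; this is not code from AFS or from the kernel.

```c
#include <limits.h>
#include <stdint.h>

/* Nonzero if ino can be represented in an unsigned field of `bits` bits. */
int ino_fits_in_bits(uint64_t ino, unsigned int bits)
{
    if (bits >= 64)
        return 1;               /* a 64-bit field holds any uint64_t */
    return (ino >> bits) == 0;  /* no bits set above the field width */
}

/* Nonzero if ino survives being stored in an unsigned long on this platform. */
int ino_fits_in_ulong(uint64_t ino)
{
    return ino_fits_in_bits(ino,
                            (unsigned int)(sizeof(unsigned long) * CHAR_BIT));
}
```

On a platform where unsigned long is 32 bits, any inode number of 2^32 or
above fails this check, which is why the 64-bit interface alone does not
help until i_ino itself is widened.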

Unfortunately AFS needs to continue to work on the dozens of platforms
(and file systems) that are already supported.  Adding something this
Linux-specific (all other platforms deal fine with lookups by inode
number) would probably not be accepted by the AFS Maintainers.  I'd
rather fix Linux to have a more consistent interface across
filesystems.

> -chris

-derek

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/      PP-ASEL      N1NWH
       warlord@MIT.EDU                        PGP key available