[OpenAFS-devel] Windows cache manager code

Lantzer, Ryan lantzer@umr.edu
Thu, 12 Dec 2002 16:00:22 -0600


There seem to be some performance issues with the Windows cache manager
code in OpenAFS, at least on Windows XP.

With a status cache size of 10000 and a directory tree (which crosses
multiple volumes) containing 185000 files and directories, OpenAFS 1.2.5
on Windows XP can easily be made to run very poorly. I wrote a perl
script that prints out all of the files and directories underneath the
common parent of this tree. For each entry it should be doing nothing
more than checking whether it is a file or a directory.

For the first ~10000 directory entries, the script progresses
reasonably, at a speed comparable to a fairly recent CVS build of
OpenAFS on RedHat Linux. After more than 10000 entries have been
processed, which is roughly where the 10000-entry status cache fills up
and entries have to start being recycled, it slows down quickly. It ends
up crawling compared to the same script running against OpenAFS on
Linux.

I added some logging to the status cache code on the Windows machine
(src/WINNT/afsd/cm_scache.c). It looks like the cm_GetNewSCache()
function takes longer and longer to reuse existing status cache
entries. The function has to traverse further and further through
cm_scacheLRULastp (I assume this is some kind of Least Recently Used
list) before it finds a status cache entry with a reference count of 0
that it can reuse. The number of entries skipped on each call to
cm_GetNewSCache() rises far more often than it falls. It reaches the
hundreds pretty quickly and climbs steadily to well over 2000 per call.
I did not have the patience to let the script run through all 185000
files and directories to find the maximum number of entries skipped per
call.

Once the script is obviously running slower than when it started, the
AFS client service is using ~100% of the CPU. I have not run the AFS
client under a debugger to confirm whether cm_GetNewSCache() is the only
thing consuming a lot of CPU. Still, it seems like a good place to start
looking for a problem with the cache manager code on Windows.

I was hoping to keep track of the increments and decrements of the
reference counts on the individual status cache entries, but the source
code did not seem to lend itself to that sort of thing. Does anyone know
how the Windows cache manager code is supposed to handle the reference
counts on status cache entries, and how status cache entries are
maintained in the LRU list?

I also noticed that the Windows cache manager code appears to be
completely separate from the mainstream cache manager code. Does anyone
have any thoughts about whether it would be a good idea to try patching
the Windows AFS client to use the mainstream cache manager code? I
realize that the Windows client has a very different data cache (and
possibly many more differences), but right now there does not seem to
be much work going into improving the Windows cache manager code.

Ryan Lantzer