[OpenAFS] Windows cache rehashed...
Rodney M Dyer
rmdyer@uncc.edu
Fri, 19 Dec 2003 15:11:31 -0500
Jeffrey,
I suppose I must apologize for sending any incorrect data in my original
post on 12/1. At that point in my problem diagnosis I thought it was my
physical RAM that was being used up causing lots of paging to occur. I was
using a machine with only 256 Meg physical. Based on your statements that
followed about the AFS cache implementation, I changed to using a smaller
cache. You would think this would have made the original problem of slow
starting apps totally disappear, but it did not. The machine certainly is
more responsive, due to less, or no paging going on, but the AFS cache
still seems to degrade application startup times. This was proven once we
started testing on a machine with a Gig of RAM where the Windows cache
didn't even enter into the equation. So I'm now working on another
problem, which definitely seems to be a bug in the AFS cache manager.
Ok, I've tried to simplify this as much as possible. My previous email
documents the exact method to produce the bug and only takes about 5
minutes to reproduce. You should see the same symptoms at your site. You
should be able to watch the handle count rise well above 256 handles. You
should be able to obtain the same results as I, more easily than I can
gather it all into a detailed report for you (see data at bottom). If you
are not seeing the same results, just let me know, I would be curious why.
Here is the method again, using correct units, with added text for clarity...
To reproduce the problem, use the following settings...
Windows XP SP1, 1 Gig RAM, P4 3.0 GHz, 100 MBit connectivity
OpenAFS 1.2.10
Cache size: 8192K ( 8 Meg cache )
Note: For those who need exacting definitions, this is an 8 Meg
cache, not 32 Meg, not 256 Meg, not 8 Gig...just a simple 8 Meg
cache. Units are checked. Based on your information, the current Windows
AFS cache implementation should handle this cache size easily without problems.
Chunk size: 32K
Status Entries: 1000
Background Threads: 6
Service Threads: 8
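As a quick sanity check on the units, the settings above can be worked out as follows. This is only a sketch in Python; the helper name is made up for illustration, and treating the cache as a simple array of fixed-size chunks is an assumption, not something taken from the AFS source.

```python
# Hypothetical helper: none of these names come from OpenAFS.
KIB_PER_MIB = 1024


def chunk_slots(cache_kib: int, chunk_kib: int) -> int:
    """Number of whole chunks that fit in the cache (assumes fixed-size chunks)."""
    if chunk_kib <= 0:
        raise ValueError("chunk size must be positive")
    return cache_kib // chunk_kib


cache_kib = 8192   # "Cache size: 8192K"
chunk_kib = 32     # "Chunk size: 32K"

slots = chunk_slots(cache_kib, chunk_kib)
cache_mib = cache_kib / KIB_PER_MIB
print(f"{cache_mib:g} Meg cache, {slots} chunks of {chunk_kib}K")
```

With these settings the cache holds 256 chunks of 32K. That happens to equal the 256-handle figure mentioned earlier; whether the two are actually related is not established here.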
1. Make a temporary local directory to copy some files to...
c:\>mkdir "c:\temp\test"
2. Change into the temporary folder...
c:\>cd "c:\temp\test"
3. Make sure you start with a fresh cache...
c:\>net stop "IBM AFS Client"
c:\>del "c:\afscache"
Note: It may take some time here before the AFS service lets
go of the cache; keep trying the delete until the file is gone. (I'm not
sure why it sometimes takes so long for AFS to shut down. It's probably the
same problem that manifests the handle leak.)
c:\>net start "IBM AFS Client"
4. Now bring up the task manager and select the columns for
"afsd_service.exe" handles, etc., using the view->select columns menu.
5. Now, in the default temporary directory at the command prompt, start a
recursive copy of a large tree of files out of your cell's AFS space. It
doesn't matter what files...any files will do.
c:\temp\test>xcopy "\\%computername%-afs\all\your-cell\dir1..." /s /e /f /c
The "/s /e /f /c" means...all subdirectories, even empty ones, show
the files as they are being copied, and continue on errors.
Again, any files will do. You may need to copy a large number of
files and/or some big files. At our site I just started the copy on a very
large tree and let it go. For example, the following should work fine...
c:\temp\test>xcopy "\\%computername%-afs\all\your-cell-name-here\*.*" /s /e /f /c
(Make sure you don't have any symbolic links in AFS that might create
a recursive loop in whatever tree of files you are copying. The xcopy.exe
program will follow them if you do.)
As the copy is progressing, as the handles start rising, keep
watching. After the count of handles rises into the thousands, I just
pressed CTRL+C, or CTRL+Break. Depending on your AFS permissions, you may
need a token to do the copy. Make sure the total size of the files being
copied is much larger than the cache size of 8192K.
Now, if you watch the Task Manager's "afsd_service.exe" handle count it
will start out ok, but soon rise out of control. Stopping the copy does not
bring the handle count back down.
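If watching Task Manager by hand is inconvenient, the handle count could also be sampled from a script. The following is a minimal sketch in Python, not something that was used for the numbers in this message. It assumes the `wmic` tool is available on the client and that its output is a header line followed by the count; the function names and the `run` parameter are hypothetical.

```python
import subprocess
import time
from typing import Callable, List, Optional, Tuple

# Assumed command: asks WMI for the HandleCount property of the AFS
# client service process. Adjust if wmic is not present on your system.
WMIC_CMD = [
    "wmic", "process", "where", "name='afsd_service.exe'",
    "get", "HandleCount",
]


def parse_handle_count(wmic_output: str) -> Optional[int]:
    """Return the first all-digit line of the output, or None if there is none."""
    for line in wmic_output.splitlines():
        text = line.strip()
        if text.isdigit():
            return int(text)
    return None


def sample_handles(
    interval_s: float = 5.0,
    samples: int = 10,
    run: Callable = subprocess.run,
) -> List[Tuple[float, int]]:
    """Collect (elapsed_seconds, handle_count) pairs at a fixed interval."""
    start = time.monotonic()
    readings = []
    for i in range(samples):
        result = run(WMIC_CMD, capture_output=True, text=True)
        count = parse_handle_count(result.stdout)
        if count is not None:
            readings.append((round(time.monotonic() - start, 1), count))
        if i < samples - 1:
            time.sleep(interval_s)
    return readings
```

Calling `sample_handles()` while the xcopy is running, and again after pressing CTRL+C, would give a list of readings to paste into a report instead of a screenshot.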
Using the above method I was able to easily obtain the following numbers...
Using the above config of 8 Meg cache with 32K chunks.
After about 987 Meg copied from AFS to the local "c:\temp\test" folder.
http://www.coe.uncc.edu/~rmdyer/test_8MB_afscache.jpg
Here's another, same scenario, just using the AFS client defaults for
cache and chunk...
http://www.coe.uncc.edu/~rmdyer/test_32MB_afscache.jpg
Is this enough information? When you say "Please add this data to the
Request (#2628)", how do I do this?
Happy Holidays! Sorry to be such a problem (an ass).
Rodney