[OpenAFS] Windows cache rehashed...

Rodney M Dyer rmdyer@uncc.edu
Thu, 18 Dec 2003 18:53:27 -0500


Jeffrey and others,

Today I found an easy way to reproduce the bug in the AFS Windows cache 
manager.  It shows up as a handle leak:  the number of handles rises out of 
control as files are copied from AFS to the local disk.  Once the handle 
count has risen beyond what is expected, an application run from AFS takes 
much longer than normal to start.  For example, our ProE application starts 
in about 40 seconds on average with an empty 8192 Meg cache, but after the 
bug is reproduced the startup time climbs to over 2 minutes.

To reproduce the problem, use the following settings...

      Windows XP SP1, 1 Gig RAM, P4 3.0 Gig, 100 MBit connectivity
      OpenAFS 1.2.10
      Cache size:  8192
      Chunk size:  32K
      Status Entries:  1000
      Background Threads: 6
      Service Threads: 8

*  Make a temporary directory to copy some files to...

      c:\>mkdir "c:\temp\test"

*  Change into the temporary folder...

      c:\>cd "c:\temp\test"

*  Make sure you start with a fresh cache...

      c:\>net stop "IBM AFS Client"
      c:\>del "c:\afscache"

           Note:  It may take some time here before the AFS service lets 
go of the cache; keep trying the delete until the file is gone.  (I'm not 
sure why it sometimes takes so long for AFS to shut down.  It's probably 
the same problem that shows up as the handle leak.)

      c:\>net start "IBM AFS Client"
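
For anyone who would rather script the "keep trying the delete" part of the 
note above, here is a minimal Python sketch.  It is not part of my original 
procedure.  The function name, the retry count and the delay are all made 
up for illustration; only the c:\afscache path comes from the steps above.  
It would be run between the "net stop" and the "net start".

```python
import os
import time


def delete_with_retry(path, attempts=30, delay_seconds=2.0):
    """Try to delete `path` until it is gone or the attempts run out.

    Returns True once the file no longer exists, False otherwise.
    The defaults are illustrative guesses, not measured values.
    """
    for _ in range(attempts):
        try:
            os.remove(path)
        except FileNotFoundError:
            return True  # already gone, nothing left to do
        except OSError:
            # Most likely the service still holds the cache file open;
            # wait a little and try again.
            time.sleep(delay_seconds)
        else:
            return True
    return not os.path.exists(path)
```

Usage would be something like delete_with_retry(r"c:\afscache"), checking 
the return value before restarting the service.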


*  Now bring up the Task Manager and use the View->Select Columns menu to 
show the handle count (and any other columns you want) for 
"afsd_service.exe".

*  Now, in the temporary directory at the command prompt, start a 
recursive copy of a large tree of files out of your cell's AFS space...

      c:\temp\test>xcopy "\\%computername%-afs\all\cell\dir1\dir2\dir3..."  /s /e /f /c

      The "/s /e /f /c" switches mean:  copy all subdirectories, including 
empty ones, show the file names as they are copied, and continue on errors.

      Any directory will do.  You may need to copy a large number of files 
and/or some big files.  I just started the copy on a very large tree and 
let it go.  When the handles started rising, I pressed CTRL+C or 
CTRL+Break.  Depending on your AFS permissions, you may need a token to do 
the copy.  Make sure the total size of the files being copied is well above 
the cache size of 8192 Meg.
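
To check how much data has actually been copied against the cache size, a 
small Python sketch like the following could be pointed at the local 
destination (c:\temp\test in the steps above).  The function names are 
hypothetical, and the constant assumes the 8192 cache size is in megabytes, 
as I described it above.  Running it against the AFS source path instead 
would itself generate AFS traffic, so I would stick to the local copy.

```python
import os

# Assumption: the cache size quoted in this message, read as megabytes.
CACHE_SIZE_BYTES = 8192 * 1024 * 1024


def tree_size_bytes(root):
    """Sum the sizes of the files found under `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def exceeds_cache(root, cache_bytes=CACHE_SIZE_BYTES):
    """True if the files under `root` add up to more than the cache size."""
    return tree_size_bytes(root) > cache_bytes
```

For example, exceeds_cache(r"c:\temp\test") tells you whether the copy so 
far has moved more data than the cache can hold.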

Now, if you watch the Task Manager's "afsd_service.exe" handle count, it 
will start out OK but soon rise out of control.  Stopping the copy does not 
bring the handle count back down.
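
If you would rather log the handle count than watch it in Task Manager, 
something like the Python sketch below might do.  It is only a sketch, and 
I did not use it for the results above.  It assumes a machine where both 
Python 3 and the "wmic" command are available, and it reads the HandleCount 
property of the process through wmic.  The quoting of the "where" clause 
may need adjusting on your system.  All function names and the default 
interval and sample count are made up for illustration.

```python
import subprocess
import time

PROCESS_NAME = "afsd_service.exe"  # process named in this message


def parse_handle_count(wmic_output):
    """Return the first all-digit line of wmic's output as an int, or None."""
    for line in wmic_output.splitlines():
        value = line.strip()
        if value.isdigit():
            return int(value)
    return None


def read_handle_count(process_name=PROCESS_NAME):
    """Ask wmic for the process's HandleCount (Windows only)."""
    result = subprocess.run(
        ["wmic", "process", "where", f"name='{process_name}'",
         "get", "HandleCount"],
        capture_output=True, text=True, check=False,
    )
    return parse_handle_count(result.stdout)


def summarize(samples):
    """Return (first, peak, last) for a non-empty list of samples."""
    return samples[0], max(samples), samples[-1]


def watch(interval_seconds=5.0, count=60):
    """Print one handle-count sample per interval and return them all."""
    samples = []
    for _ in range(count):
        value = read_handle_count()
        if value is not None:
            samples.append(value)
            print(time.strftime("%H:%M:%S"), value)
        time.sleep(interval_seconds)
    return samples
```

Start watch() before the xcopy and let it keep running after you press 
CTRL+C.  If the last value from summarize() stays close to the peak 
instead of dropping back toward the first value, you are seeing the same 
behavior I describe above.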

That is about it.

Rodney