[OpenAFS] afs.GCPAGs in current releases under Linux (RHEL4/5)

Eric.Hagberg@morganstanley.com Eric.Hagberg@morganstanley.com
Thu, 4 Mar 2010 20:20:24 -0500 (EST)


I've found that if you run a program that generates tokens and PAGs 
frequently (about once per second), system CPU time on the machine soon 
begins to eat into performance. It takes a little while to become 
noticeable, but if you keep it up long enough, the machine eventually 
grinds to a halt. This behavior started between OpenAFS 1.4.1 and 1.4.2, 
when keyring support was enabled. Some experimentation has shown that the 
problem is related to PAG garbage collection being effectively disabled 
when keyring support is compiled in.
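
For anyone who wants to watch the system-time growth while reproducing 
this, here is a minimal sketch (not part of OpenAFS) that samples the 
aggregate "cpu" line of /proc/stat and computes the share of time spent 
in system mode between two samples. The struct and function names are 
made up for illustration, and only the first four counters (user, nice, 
system, idle) are considered, so iowait and irq time are ignored.

```c
#include <stdio.h>
#include <string.h>

/* Jiffy counters taken from the aggregate "cpu" line of /proc/stat. */
struct cpu_sample {
    unsigned long long user, nice, system, idle;
};

/* Parse a line of the form "cpu  user nice system idle ...".
 * Returns 0 on success, -1 if this is not the aggregate cpu line. */
int parse_cpu_line(const char *line, struct cpu_sample *s)
{
    /* Reject per-CPU lines such as "cpu0 ..." */
    if (strncmp(line, "cpu ", 4) != 0)
        return -1;
    if (sscanf(line + 4, "%llu %llu %llu %llu",
               &s->user, &s->nice, &s->system, &s->idle) != 4)
        return -1;
    return 0;
}

/* Read the current aggregate counters; returns 0 on success. */
int read_cpu_sample(struct cpu_sample *s)
{
    char line[256];
    FILE *f = fopen("/proc/stat", "r");

    if (f == NULL)
        return -1;
    if (fgets(line, sizeof line, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_cpu_line(line, s);
}

/* Share of elapsed jiffies (user+nice+system+idle) spent in system
 * mode between an earlier sample a and a later sample b. */
double system_share(const struct cpu_sample *a, const struct cpu_sample *b)
{
    unsigned long long dsys = b->system - a->system;
    unsigned long long dtot = (b->user - a->user) + (b->nice - a->nice)
                            + dsys + (b->idle - a->idle);

    return dtot ? (double)dsys / (double)dtot : 0.0;
}
```

Calling read_cpu_sample() every minute or so while the token/PAG 
generator runs, and printing system_share() for consecutive samples, 
should show whether the share keeps climbing.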

Interestingly, just changing the code so that OpenAFS with keyring 
support still does PAG GC makes the problem go away: with afs.GCPAGs=1, 
system time no longer spikes or grows without bound, while switching to 
afs.GCPAGs=0 brings the problem back. So something in the keyring code 
isn't doing everything it should, if PAG GC can make things better.
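
When comparing runs, it helps to record which GCPAGs setting was active. 
A small sketch, assuming the setting shows up under the usual Linux 
sysctl mapping (dots become slashes under /proc/sys, so afs.GCPAGs would 
be /proc/sys/afs/GCPAGs). The helper names are hypothetical, and the read 
simply fails if the afs module is not loaded.

```c
#include <stdio.h>
#include <string.h>

/* Map a sysctl name such as "afs.GCPAGs" to its /proc/sys path.
 * Returns 0 on success, -1 if the output buffer is too small. */
int sysctl_path(const char *name, char *out, size_t outlen)
{
    char *p;
    int n = snprintf(out, outlen, "/proc/sys/%s", name);

    if (n < 0 || (size_t)n >= outlen)
        return -1;
    /* Only rewrite the part after the fixed prefix. */
    for (p = out + strlen("/proc/sys/"); *p != '\0'; p++) {
        if (*p == '.')
            *p = '/';
    }
    return 0;
}

/* Read an integer-valued sysctl; returns 0 and stores the value on
 * success, -1 if the file is missing or unreadable. */
int read_int_sysctl(const char *name, int *value)
{
    char path[256];
    FILE *f;
    int ok;

    if (sysctl_path(name, path, sizeof path) != 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

Logging the result of read_int_sysctl("afs.GCPAGs", &v) next to each 
system-time measurement keeps the two settings from getting mixed up.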

That patch is just:

--- src/afs/afs_osi.c.orig      2010-03-01 19:54:52.000000000 -0500
+++ src/afs/afs_osi.c   2010-03-01 19:55:00.000000000 -0500
@@ -841,7 +841,6 @@
  void
  afs_osi_TraverseProcTable()
  {
-#if !defined(LINUX_KEYRING_SUPPORT)
      struct task_struct *p;

  #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,18) && defined(EXPORTED_TASKLIST_LOCK)
@@ -888,7 +888,6 @@
  #endif /* EXPORTED_TASKLIST_LOCK && LINUX_VERSION_CODE < KERNEL_VERSION(2,6,18) */
         rcu_read_unlock();
  #endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,16) */
-#endif
  }
  #endif

Maybe this isn't the best fix, but it definitely points to a problem.
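
To make the effect of the removed guard concrete, here is a toy 
user-space illustration (not kernel code). traverse_proc_table() and 
fake_tasks are hypothetical stand-ins for afs_osi_TraverseProcTable() and 
the kernel task list. With LINUX_KEYRING_SUPPORT defined, the 
preprocessor drops the whole loop and the function visits nothing, which 
is how the GC traversal ends up as a no-op.

```c
#include <stdio.h>

/* Stand-in for the build-time define; remove this line to compile
 * the traversal back in. */
#define LINUX_KEYRING_SUPPORT 1

/* Hypothetical stand-in for the kernel's task list. */
const int fake_tasks[] = { 101, 102, 103 };

/* Returns the number of tasks visited. The guard has the same shape
 * as the one the patch removes. */
int traverse_proc_table(void)
{
    int visited = 0;

#if !defined(LINUX_KEYRING_SUPPORT)
    size_t i;

    for (i = 0; i < sizeof fake_tasks / sizeof fake_tasks[0]; i++) {
        (void)fake_tasks[i];   /* a real traversal would inspect the task */
        visited++;
    }
#endif
    return visited;
}
```

As written, traverse_proc_table() returns 0. Without the #define it 
walks all three entries.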

(I also noticed that compilation of 1.4.12pre{3,4} breaks because of what 
appears to be a misapplied patch: "crfee" is present in the code where 
"crfree" was probably intended.)