[OpenAFS] afs.GCPAGs in current releases under Linux (RHEL4/5)
Eric.Hagberg@morganstanley.com
Thu, 4 Mar 2010 20:20:24 -0500 (EST)
I've found that if you run a program that generates tokens and PAGs
frequently (about once per second), system CPU time on the machine
gradually climbs and starts to eat into performance. It takes a little
while to become observable, but if you keep it up long enough, the machine
will eventually grind to a halt. I found that this behavior started between
openafs 1.4.1 and 1.4.2, where keyring support got enabled. Some
experimentation has shown that the problem is related to PAG garbage
collection being effectively disabled when keyring support is compiled in.
Interestingly, just changing the bit of code so that openafs w/ keyring
support does PAG GC makes the problem go away: with afs.GCPAGs=1 you don't
get system time spikes or unbounded growth, but switching to
afs.GCPAGs=0 makes the problem come back. So something in the keyring code
isn't really doing everything it should be if PAG GC can make things
better.
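For anyone who wants to try to reproduce this, here is a rough sketch of a driver for the load pattern described above. It is not the program used for the observations in this message: it simply runs whatever command it is given once per interval and prints the cumulative system-mode jiffies from /proc/stat after each run. The function names (run_once, system_jiffies, repro_loop) are made up for illustration, and the command that creates a PAG and obtains tokens is left to the caller (something along the lines of "pagsh -c aklog" is an assumption, not something stated above).

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run cmd[0] with its arguments in a child process and wait for it.
 * Returns the child's exit status, 127 if exec failed, -1 on other errors. */
int run_once(char *const cmd[])
{
    assert(cmd != NULL && cmd[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(cmd[0], cmd);
        _exit(127);             /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Cumulative system-mode jiffies from the aggregate "cpu" line of
 * /proc/stat (Linux only). Returns -1 if the file cannot be parsed. */
long system_jiffies(void)
{
    unsigned long user, nice, sys;
    FILE *f = fopen("/proc/stat", "r");
    if (f == NULL)
        return -1;
    int n = fscanf(f, "cpu %lu %lu %lu", &user, &nice, &sys);
    fclose(f);
    return n == 3 ? (long)sys : -1;
}

/* Run cmd every interval_sec seconds; iterations == 0 means run forever.
 * Returns the number of runs that exited non-zero. */
int repro_loop(char *const cmd[], unsigned iterations, unsigned interval_sec)
{
    int failures = 0;
    for (unsigned i = 1; iterations == 0 || i <= iterations; i++) {
        int rc = run_once(cmd);
        if (rc != 0)
            failures++;
        printf("iteration %u: exit=%d system_jiffies=%ld\n",
               i, rc, system_jiffies());
        fflush(stdout);
        sleep(interval_sec);
    }
    return failures;
}
```

Called from a small main() as repro_loop(&argv[1], 0, 1), this gives the once-per-second pattern. If the symptom is present, the per-iteration increase in system_jiffies should keep growing when afs.GCPAGs=0 and stay roughly flat with afs.GCPAGs=1 on a patched build.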
The patch to let PAG GC run with keyring support is just:
--- src/afs/afs_osi.c.orig 2010-03-01 19:54:52.000000000 -0500
+++ src/afs/afs_osi.c 2010-03-01 19:55:00.000000000 -0500
@@ -841,7 +841,6 @@
 void
 afs_osi_TraverseProcTable()
 {
-#if !defined(LINUX_KEYRING_SUPPORT)
 struct task_struct *p;
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,18) && defined(EXPORTED_TASKLIST_LOCK)
@@ -888,7 +888,6 @@
 #endif /* EXPORTED_TASKLIST_LOCK && LINUX_VERSION_CODE < KERNEL_VERSION(2,6,18) */
 rcu_read_unlock();
 #endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,16) */
-#endif
 }
 #endif
Maybe this isn't the best fix, but it definitely points out a problem.
(I also noticed that compilation of 1.4.12pre{3,4} breaks due to what appears
to be a misapplied patch: "crfee" is present in the code where "crfree" was
probably intended.)