[OpenAFS] weird memory problem on i386_linux26 with 1.4.2
Christopher D. Clausen
Mon, 13 Nov 2006 14:25:02 -0600
Okay, so sleepless.acm.uiuc.edu hosts all websites on www.acm.uiuc.edu.
It's Debian sarge on x86, with apache2, mod_php5 (from backports.org),
and Trac running under mod_fastcgi or mod_fcgid depending on whether
it's SSL or not. It's a dual Xeon 2.0 GHz (hyperthreaded, and
hyperthreading is turned on, which might actually be the problem; I
don't have another HT box to test). The machine has 1GB of RAM and two
SCSI HDDs, one of them dedicated to the AFS cache.
Every 3 weeks or so, the machine ends up using so much non-pageable
memory that the OOM killer starts whacking processes and, in general,
bad things happen. Very little if any swap is in use (on the order of a
few MB). This can be solved by stopping everything that is accessing AFS
and restarting the AFS client. It's fine for another 3 weeks and the
We were running 1.4.1 and I just upgraded to 1.4.2 (about three weeks
ago) and it still has this problem.
I'm currently running the Debian 1.4.2-2 package (backported to sarge)
with the default afsd options for a 14GB cache. I have tried using a
smaller 5GB cache and reducing the afsd parameters, with no effect. The
standard Debian 2.6.8-3-686-smp kernel is in use. The cache partition is
ext3. I believe that is safe, right?
This same machine was working fine for over a year as a workstation with
a much smaller AFS cache (although an admittedly much smaller load as
well), so something about the current setup has broken things.
I'm mostly a Windows guy, so I'm not really sure how to debug this
further or otherwise figure out what is using RAM (although I'm pretty
sure it's somehow afsd). vmstat -m reports some rather large allocations
of certain block sizes, but that's about all I know. Well, that and the
fact that restarting the AFS client fixes the problem for another 3
weeks.
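
For anyone wanting to narrow down which of those vmstat -m allocations
are the large ones, here is a minimal sketch in Python. It assumes the
usual five-column vmstat -m layout (Cache, Num, Total, Size, Pages) and
estimates each cache's footprint as Total objects times object Size,
which is only an approximation. The function name rank_slab_caches and
the SAMPLE numbers are made up for illustration; they are not output
from the machine described above.

```python
def rank_slab_caches(vmstat_m_output, top=5):
    """Rank slab caches by approximate footprint (Total objects * Size)."""
    rows = []
    for line in vmstat_m_output.splitlines():
        fields = line.split()
        # Expect: name, Num, Total, Size, Pages. Skip the header line,
        # blank lines, and anything else that does not match that shape.
        if len(fields) != 5 or not all(f.isdigit() for f in fields[1:]):
            continue
        name, _num, total, size, _pages = fields
        rows.append((name, int(total) * int(size)))
    # Largest estimated footprint first.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


# Hypothetical sample in the assumed vmstat -m layout (invented numbers).
SAMPLE = """\
Cache                       Num  Total   Size  Pages
ext3_inode_cache          12000  12500    552      7
dentry_cache              30000  31000    140     28
size-4096                 90000  90010   4096      1
"""

if __name__ == "__main__":
    for name, approx_bytes in rank_slab_caches(SAMPLE):
        print(f"{name:24s} {approx_bytes / 2**20:8.1f} MiB")
```

Saving the real vmstat -m output to a file once a day and running it
through something like this would show which cache keeps growing between
AFS client restarts.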
Anyone have any tips on tracking this down? Or think it might be the
Christopher D. Clausen