[OpenAFS-devel] Linux dirty vm-buffers not flushed periodically

Alexander Bergolth leo@strike.wu-wien.ac.at
Tue, 30 May 2006 19:16:49 +0200


Hi!

On most of our boxes, running Fedora kernels (different versions, see
below) and OpenAFS 1.3.82 to 1.4.1, we are seeing the problem that,
after some uptime, dirty filesystem buffers are no longer flushed
periodically. I see this problem only on machines running either
OpenAFS or a Software-Suspend2 enabled kernel. (Maybe there is a
relation?) Boxes without those extensions behave normally.

On machines that show the problem, buffers are flushed only when the
amount of dirty memory exceeds the thresholds (dirty_ratio and
dirty_background_ratio) or when running "sync"; they are no longer
flushed periodically. On one box, even "sync" flushes only a fraction
of the dirty buffers [2].

I am not using VMware, and neither laptop_mode nor custom
/proc/sys/vm/dirty* settings are involved.
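
To double-check that no writeback tunable has been changed, the
/proc/sys/vm/dirty* values can be dumped and compared between a good
and a bad box. The following is only a minimal Python sketch, not part
of my test script; the function name read_dirty_settings is made up
for illustration. On 2.6 kernels the periodic flush is governed by
dirty_writeback_centisecs and dirty_expire_centisecs (as far as I
know the defaults are 500 and 3000, i.e. 5 and 30 seconds).

```python
import glob
import os


def read_dirty_settings(vm_dir="/proc/sys/vm"):
    """Return {name: value} for every dirty* tunable found in vm_dir.

    Values are returned as stripped strings; entries that cannot be
    read (e.g. due to permissions) are reported as None.
    """
    settings = {}
    for path in sorted(glob.glob(os.path.join(vm_dir, "dirty*"))):
        name = os.path.basename(path)
        try:
            with open(path) as f:
                settings[name] = f.read().strip()
        except OSError:
            settings[name] = None
    return settings
```

Printing the returned dictionary on two machines and diffing the
output should show quickly whether any setting differs.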

It looks like the problem arises after a few days of uptime; a freshly
booted system doesn't show the symptoms. (At least it doesn't show them
when I'm looking out for them. ;))

I've written a small test-script to visualize the behavior. [3]

The script creates a 200MB file, monitors nr_dirty in /proc/vmstat and
executes sync after some time.
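
The monitoring part boils down to reading the nr_dirty counter from
/proc/vmstat at a fixed interval. The following Python sketch shows
the idea; it is not the shell script from [3], and the names
parse_nr_dirty, read_nr_dirty and monitor are hypothetical. The
5-second default interval matches the spacing of the output below.

```python
import time


def parse_nr_dirty(vmstat_text):
    """Extract the nr_dirty value from the text of /proc/vmstat."""
    for line in vmstat_text.splitlines():
        fields = line.split()
        # Match the key exactly so that counters with a longer name
        # starting with "nr_dirty" are not picked up by mistake.
        if len(fields) == 2 and fields[0] == "nr_dirty":
            return int(fields[1])
    raise ValueError("nr_dirty not found")


def read_nr_dirty(path="/proc/vmstat"):
    """Read the current number of dirty pages from the kernel."""
    with open(path) as f:
        return parse_nr_dirty(f.read())


def monitor(samples, interval=5.0, read=read_nr_dirty, sleep=time.sleep):
    """Print and return `samples` readings, `interval` seconds apart."""
    values = []
    for i in range(samples):
        if i:
            sleep(interval)
        value = read()
        print(time.strftime("%H:%M:%S"), "# nr_dirty", value)
        values.append(value)
    return values
```

Calling monitor(18) after writing the test file, then running sync and
calling monitor(10) again, would roughly reproduce the sequence shown
in the output below.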

The output looks like this:

-------------------- snip! bad: --------------------
Linux slime.wu-wien.ac.at 2.6.14-1.1653_FC4smp #1 SMP Tue Dec 13 21:46:01 EST 2005 i686 i686 i386 GNU/Linux
 12:22:46 up 10 days, 18:12, 37 users,  load average: 0.17, 0.15, 0.09
12:22:46 start: head -c 200000000 /dev/zero >/var/tmp/dirty-buffers.EFYFF13399 # nr_dirty 1076
12:22:46 # nr_dirty 1805
12:22:47 end: head -c 200000000 /dev/zero >/var/tmp/dirty-buffers.EFYFF13399 # nr_dirty 31061
12:22:51 # nr_dirty 25671
12:22:56 # nr_dirty 25724
12:23:01 # nr_dirty 25724
12:23:06 # nr_dirty 25724
12:23:11 # nr_dirty 25724
12:23:16 # nr_dirty 25724
12:23:21 # nr_dirty 25724
12:23:26 # nr_dirty 25724
12:23:31 # nr_dirty 25724
12:23:36 # nr_dirty 25724
12:23:41 # nr_dirty 25724
12:23:47 # nr_dirty 25725
12:23:52 # nr_dirty 25726
12:23:57 # nr_dirty 25728
12:24:02 # nr_dirty 25728
12:24:07 # nr_dirty 25728
12:24:12 # nr_dirty 25728
12:24:12 # nr_dirty 25728
12:24:12 start: sync # nr_dirty 25728
12:24:12 end: sync # nr_dirty 23566
12:24:17 # nr_dirty 23566
12:24:22 # nr_dirty 23582
12:24:27 # nr_dirty 23583
12:24:32 # nr_dirty 23583
12:24:37 # nr_dirty 23583
12:24:42 # nr_dirty 23583
12:24:47 # nr_dirty 23583
12:24:52 # nr_dirty 23583
12:24:57 # nr_dirty 23583
-------------------- snip! --------------------

Some buffers are flushed while the temp file is written and shortly
afterwards (31061->25671). But once writing has completed, about 25000
buffers remain dirty. I would expect them to be flushed after 30 secs,
but they are not. The sync only reduces the dirty buffers from 25728
to 23566, whereas I'd expect it to bring them down to near 0.
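
To put these numbers into perspective: nr_dirty counts pages, so,
assuming the usual 4 KiB page size on i686, the 25728 pages left dirty
are about 100.5 MiB, roughly half of the 200000000-byte test file
(about 48829 pages), and the sync flushed only 2162 pages (about
8.4 MiB). A small Python sketch of the conversion, with hypothetical
helper names:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual size on i686


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count (e.g. nr_dirty) to MiB."""
    return pages * page_size / (1024 * 1024)


def bytes_to_pages(nbytes, page_size=PAGE_SIZE):
    """Number of pages needed to hold nbytes (rounded up)."""
    return -(-nbytes // page_size)


if __name__ == "__main__":
    print("test file:       %d pages" % bytes_to_pages(200000000))
    print("dirty before sync: %.1f MiB" % pages_to_mib(25728))
    print("flushed by sync:   %.1f MiB" % pages_to_mib(25728 - 23566))
```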

Here is the output of another system with a lower uptime that doesn't
show that behavior yet:

-------------------- snip! good: --------------------
Linux roaster.wu-wien.ac.at 2.6.16-1.2111_1.99.rhfc5.cubbi_suspend2_8ksmp #1 SMP Mon May 15 09:47:41 CEST 2006 i686 i686 i386 GNU/Linux
 18:56:50 up 23:31,  5 users,  load average: 0.01, 0.03, 0.06
18:56:50 # nr_dirty 4
18:56:50 start: head -c 200000000 /dev/zero >/tmp/dirty-buffers.wxtSuY5341 # nr_dirty 4
18:56:50 end: head -c 200000000 /dev/zero >/tmp/dirty-buffers.wxtSuY5341 # nr_dirty 25256
18:56:55 # nr_dirty 25256
18:57:00 # nr_dirty 25257
18:57:05 # nr_dirty 25257
18:57:10 # nr_dirty 25257
18:57:15 # nr_dirty 25257
18:57:20 # nr_dirty 25257
18:57:25 # nr_dirty 24232
18:57:30 # nr_dirty 1
18:57:35 # nr_dirty 1
18:57:40 # nr_dirty 1
18:57:45 # nr_dirty 1
18:57:50 # nr_dirty 1
18:57:55 # nr_dirty 1
18:58:00 # nr_dirty 1
18:58:05 # nr_dirty 0
18:58:10 # nr_dirty 0
18:58:15 # nr_dirty 1
18:58:15 # nr_dirty 1
18:58:15 start: sync # nr_dirty 1
18:58:15 end: sync # nr_dirty 0
18:58:20 # nr_dirty 0
18:58:25 # nr_dirty 0
18:58:30 # nr_dirty 0
18:58:35 # nr_dirty 0
18:58:40 # nr_dirty 0
18:58:45 # nr_dirty 0
18:58:50 # nr_dirty 0
18:58:55 # nr_dirty 0
18:59:00 # nr_dirty 0
-------------------- snip! --------------------
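
To compare such logs from many machines without reading them by eye,
the time between the nr_dirty peak and the first sample near zero can
be computed automatically. The following Python sketch assumes the
line format shown above ("HH:MM:SS ... # nr_dirty N"); the function
names and the threshold of 100 pages are arbitrary choices for
illustration, and lines that lack either the timestamp or the counter
are simply skipped.

```python
import re
from datetime import datetime

# A sample line starts with HH:MM:SS and ends with "# nr_dirty <number>".
SAMPLE_RE = re.compile(r"^\s*(\d{2}:\d{2}:\d{2})\b.*# nr_dirty (\d+)\s*$")


def parse_samples(lines):
    """Return a list of (time, nr_dirty) tuples from the log lines."""
    samples = []
    for line in lines:
        m = SAMPLE_RE.match(line)
        if m:
            t = datetime.strptime(m.group(1), "%H:%M:%S")
            samples.append((t, int(m.group(2))))
    return samples


def seconds_until_flushed(lines, threshold=100):
    """Seconds from the nr_dirty peak to the first later sample at or
    below `threshold`, or None if that never happens in the log.

    Note: logs that span midnight are not handled.
    """
    samples = parse_samples(lines)
    if not samples:
        return None
    peak_index = max(range(len(samples)), key=lambda i: samples[i][1])
    peak_time = samples[peak_index][0]
    for t, value in samples[peak_index + 1:]:
        if value <= threshold:
            return int((t - peak_time).total_seconds())
    return None
```

A result of None for a log that covers more than 30 seconds after the
peak would indicate that the machine shows the problem.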

As I'm seeing this error on many Fedora-based systems (FC3, FC4 and
FC5) with different OpenAFS versions, I guess there must be others who
have the same problem (but maybe haven't noticed it yet).

Any help would be greatly appreciated.
Cheers,
--leo

P.S.: I already reported it to LKML but didn't get a response. :(

[1] Test results:
http://leo.kloburg.at/tmp/dirty-buffers/

[2] sync flushes only a fraction of the dirty buffers:
http://leo.kloburg.at/tmp/dirty-buffers/bad_slime-2.6.14-1.1653_FC4smp.txt

[3] Test-script to visualize the error:
http://leo.kloburg.at/tmp/dirty-buffers/dirty-buffers.sh

-- 
-----------------------------------------------------------------------
Alexander.Bergolth@wu-wien.ac.at                Fax: +43-1-31336-906050
Zentrum fuer Informatikdienste - Wirtschaftsuniversitaet Wien - Austria