Bug#594812: noflushd: Noflushd causes flush- processes to eat all CPU

Xavier Roche Sun, 29 Aug 2010 11:27:29 -0700

Package: noflushd
Version: 2.8-1
Severity: important

I think the problem might still be there when some monitored disks go idle
automatically (or through "hdparm -S242").


Note that the disks apparently do not need to have pending writes for the
problem to be reproducible.

I managed to reproduce the issue after a clean reboot (and after
removing some potentially new options from the grsecurity kernel - to be
sure that this was not a possible cause) on a fresh 2.6.34.4 kernel.

I started noflushd, waited for some time, and the problem appeared again.
The monitored disks are all configured to go idle after a while (using
"hdparm -S242 /dev/.." at startup).

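As a side note on the -S242 value: according to the hdparm(8) man page, it is not a number of seconds. Values 1 to 240 are multiples of 5 seconds and values 241 to 251 are multiples of 30 minutes, so 242 should correspond to a one-hour standby timeout. Below is a minimal sketch of that decoding; the function name is hypothetical and not part of hdparm or noflushd.

```python
def hdparm_standby_seconds(value):
    """Decode an `hdparm -S` value into seconds, following hdparm(8).

    Returns None for 0 (timeout disabled) and for the special or
    vendor-defined values 252-255, which this sketch does not decode.
    """
    if not 0 <= value <= 255:
        raise ValueError("hdparm -S takes a value between 0 and 255")
    if value == 0:
        return None
    if value <= 240:
        # 1..240: units of 5 seconds (5 s up to 20 min)
        return value * 5
    if value <= 251:
        # 241..251: units of 30 minutes (30 min up to 5.5 h)
        return (value - 240) * 30 * 60
    return None


print(hdparm_standby_seconds(242))  # seconds before the drive enters standby
```
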
In this state, the noflushd daemon is still running (and not consuming
CPU), but the flush-* processes are:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
13604 root      20   0     0    0    0 R 48.1  0.0   8:19.70 flush-34:0
13605 root      20   0     0    0    0 R 48.1  0.0   5:42.76 flush-8:0

After a while, more flush- processes appear, and the load increases.
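The suffix of each flush- thread name is the major:minor number of the backing device, so the names in the top output can be matched against /proc/partitions. The sketch below shows one way to do that lookup. The helper names are hypothetical, and the sample table is illustrative only: the device numbers follow the standard Linux numbering (3:64 is hdb, 34:0 is hdg, 8:0 is sda), but the block counts are made up.

```python
import re


def parse_partitions(text):
    """Map (major, minor) -> device name from /proc/partitions content."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like: major minor #blocks name
        if len(fields) == 4 and fields[0].isdigit() and fields[1].isdigit():
            table[(int(fields[0]), int(fields[1]))] = fields[3]
    return table


def flush_thread_device(comm, table):
    """Resolve a thread name such as 'flush-34:0' to a device name, or None."""
    match = re.fullmatch(r"flush-(\d+):(\d+)", comm)
    if match is None:
        return None
    return table.get((int(match.group(1)), int(match.group(2))))


# Illustrative sample only (assumption): block counts are invented.
SAMPLE = """major minor  #blocks  name

   3    64  100000 hdb
  34     0  100000 hdg
   8     0  100000 sda
"""

table = parse_partitions(SAMPLE)
for name in ("flush-34:0", "flush-8:0"):
    print(name, "->", flush_thread_device(name, table))
```

On a live system the text would come from reading /proc/partitions instead of SAMPLE.
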

The noflushd daemon appears to be still running (it is NOT stuck, even if
the flush-* kernel threads are stuck), and every 5 seconds it attempts a
round of fsync() calls:

nanosleep({5, 0}, {5, 0})               = 0
time(NULL)                              = 1283100653
_llseek(5, 0, [0], SEEK_SET)            = 0
read(5, "   3      64 hdb 98217 251654 278"..., 1024) = 1024
read(5, "0 0 0 0 0 0 0 0 0 0\n"..., 1024) = 20
read(5, ""..., 1024)                    = 0
time(NULL)                              = 1283100653
_llseek(3, 0, [0], SEEK_SET)            = 0
read(3, "major minor  #blocks  name\n\n   3 "..., 1024) = 354
fsync(6)                                = 0
fsync(7)                                = 0
fsync(10)                               = 0
fsync(11)                               = 0
fsync(12)                               = 0
fsync(13)                               = 0
fsync(14)                               = 0
fsync(15)                               = 0
read(3, ""..., 1024)                    = 0
fsync(16)                               = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({5, 0}, ..
(repeated endlessly, i.e. it does not wait 60 seconds as it used to do
before)

(/proc/<pid-of-flush-process>/wchan gives 0)
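The wchan check can be repeated for every flush- thread at once. The sketch below is one possible way to do it, not something noflushd provides: it reads the thread name from the parenthesised field of /proc/<pid>/stat and then the content of /proc/<pid>/wchan. The function name and the proc_root parameter are hypothetical (the parameter only exists so the function can be pointed at a test directory). A wchan of 0 would be consistent with the threads being runnable (state R in the top output) rather than sleeping in the kernel.

```python
import os


def flush_thread_wchans(proc_root="/proc"):
    """Return {pid: (name, wchan)} for every task whose name starts with flush-."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                stat = f.read()
            # The task name is the parenthesised second field of stat.
            name = stat[stat.index("(") + 1:stat.rindex(")")]
            if not name.startswith("flush-"):
                continue
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip()
        except (OSError, ValueError):
            continue  # task exited meanwhile, or file not readable
        result[int(entry)] = (name, wchan)
    return result


if os.path.isdir("/proc"):
    for pid, (name, wchan) in sorted(flush_thread_wchans().items()):
        print(pid, name, wchan)
```
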

No I/O activity on the disks, but the load keeps increasing as flush-
processes appear.

After touching the mounted directory corresponding to the idle disk to force
a disk spin-up (an "ls" takes several seconds until the disk is back to
normal), the load goes back to zero, and the stuck system sync processes
return.

The noflushd process then goes back to a 60 second loop:

time(NULL)                              = 1283100976
_llseek(5, 0, [0], SEEK_SET)            = 0
read(5, "   3      64 hdb 98222 251654 278"..., 1024) = 1024
read(5, "0 0 0 0 0 0 0 0 0 0\n"..., 1024) = 20
read(5, ""..., 1024)                    = 0
time(NULL)                              = 1283100976
_llseek(3, 0, [0], SEEK_SET)            = 0
read(3, "major minor  #blocks  name\n\n   3 "..., 1024) = 354
fsync(6)                                = 0
fsync(7)                                = 0
fsync(10)                               = 0
fsync(11)                               = 0
fsync(12)                               = 0
fsync(13)                               = 0
fsync(14)                               = 0
fsync(15)                               = 0
read(3, ""..., 1024)                    = 0
fsync(16)                               = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({5, 0}, {5, 0})               = 0
time(NULL)                              = 1283100981
_llseek(5, 0, [0], SEEK_SET)            = 0
read(5, "   3      64 hdb 98222 251654 278"..., 1024) = 1024
read(5, "0 0 0 0 0 0 0 0 0 0\n"..., 1024) = 20
read(5, ""..., 1024)                    = 0
time(NULL)                              = 1283100981
time(NULL)                              = 1283100981
_llseek(3, 0, [0], SEEK_SET)            = 0
read(3, "major minor  #blocks  name\n\n   3 "..., 1024) = 354
fsync(8)                                = 0
fsync(9)                                = 0
_llseek(4, 0, [0], SEEK_SET)            = 0
write(4, "500\n"..., 4)                 = 4
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({60, 0},
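To compare the two states without reading the traces by eye, the requested sleep durations can be pulled out of a saved strace log. The sketch below is only an illustration: the function name is hypothetical, and TRACE is an abridged excerpt of the traces above, not a complete log.

```python
import re

# Matches the start of a nanosleep call, including truncated lines such as
# "nanosleep({60, 0}," where the call has not returned yet.
NANOSLEEP = re.compile(r"^nanosleep\(\{(\d+),\s*(\d+)\}")


def sleep_intervals(strace_text):
    """Return the requested nanosleep durations, in seconds, in order."""
    intervals = []
    for line in strace_text.splitlines():
        match = NANOSLEEP.match(line.strip())
        if match:
            seconds, nanoseconds = int(match.group(1)), int(match.group(2))
            intervals.append(seconds + nanoseconds / 1e9)
    return intervals


# Abridged excerpt of the traces quoted in this report.
TRACE = """\
nanosleep({5, 0}, {5, 0})               = 0
time(NULL)                              = 1283100653
fsync(6)                                = 0
nanosleep({5, 0}, ..
time(NULL)                              = 1283100981
fsync(8)                                = 0
nanosleep({60, 0},
"""

print(sleep_intervals(TRACE))
```

A run of 5-second entries would correspond to the busy state described above, and 60-second entries to the normal loop.
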

At this time, I think that the suspend mode might be the root of all
evil; I don't know how it can impact noflushd, though. Setting up the disks
to enter standby mode automatically (hdparm -S242 /dev/hd${dev}) appears to
be the cause.

Using noflushd 2.8-1 ; Linux kernel 2.6.34.4.

I'm available to do more tests if necessary.


-- System Information:
Debian Release: 5.0.5
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.34.4-grsec (SMP w/1 CPU core)
Locale: lang=fr_fr.ut...@euro, lc_ctype=fr_fr.ut...@euro (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages noflushd depends on:
ii  debconf [debconf-2.0]         1.5.24     Debian configuration management sy
ii  ed                            0.7-3      The classic unix line editor
ii  libc6                         2.11.2-2   Embedded GNU C Library: Shared lib

noflushd recommends no packages.

noflushd suggests no packages.

-- debconf information:
  noflushd/expert: false
* noflushd/disks: /dev/hdb /dev/hde /dev/hdg
  noflushd/params:
* noflushd/timeout: 60


