I've been putting a little effort into debugging this.  It turns out
that every now and then, the dd process feeding klogd dies.  Klogd
then loops at 100% CPU.

The question then is, why does dd die?  I wrote a replacement dd that
printed out what it was doing.  dd dies because it gets a zero-byte
read (i.e., the kernel is signalling EOF).
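
For illustration, a verbose copy loop along the following lines shows
the same thing.  This is only a minimal sketch, not the replacement dd
used for this report: the name copy_verbose, the buffer size and the
log format are all assumptions.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the replacement dd from this report): copy
 * in_fd to out_fd, reporting the result of every read() on `log`.
 * Returns 0 after a zero-byte read (EOF), -1 on a read or write error.
 */
static int copy_verbose(int in_fd, int out_fd, FILE *log)
{
    char buf[4096];

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            fprintf(log, "read error: %s\n", strerror(errno));
            return -1;
        }
        fprintf(log, "read returned %zd bytes\n", n);
        if (n == 0)
            return 0;           /* zero-byte read: treated as EOF */

        /* write() may be partial, so loop until the chunk is flushed */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));

            if (w < 0) {
                if (errno == EINTR)
                    continue;
                fprintf(log, "write error: %s\n", strerror(errno));
                return -1;
            }
            off += w;
        }
    }
}
```

Called with a descriptor opened on /proc/kmsg as in_fd and stderr as
the log, a final "read returned 0 bytes" line just before the function
returns would correspond to the EOF described above.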

With kernel 3.0.0 this seems to happen a lot: on a busy system, dd
only lasts for a few seconds after being restarted.

I suspect a race between emit_log_char() and
do_syslog(SYSLOG_ACTION_READ...) in the kernel, when the kernel is
logging a LOT of data.

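My attempt at a workaround is the kernel patch below, which goes back
and waits again when the read would otherwise return zero bytes.  I
can't quite see the race itself, though, so this is not a proper fix.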

diff --git a/kernel/printk.c b/kernel/printk.c
index 37dff34..0e44138 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -358,6 +358,7 @@ int do_syslog(int type, char __user *buf, int len, bool from_file)
                        error = -EFAULT;
                        goto out;
                }
+       again:
                error = wait_event_interruptible(log_wait,
                                                        (log_start - log_end));
                if (error)
@@ -377,6 +378,8 @@ int do_syslog(int type, char __user *buf, int len, bool from_file)
                spin_unlock_irq(&logbuf_lock);
                if (!error)
                        error = i;
+               if (i == 0)
+                       goto again;
                break;
        /* Read/clear last kernel messages */
        case SYSLOG_ACTION_READ_CLEAR:
--
Dr Peter Chubb  http://www.gelato.unsw.edu.au  peterc AT gelato.unsw.edu.au
http://www.ertos.nicta.com.au           ERTOS within National ICT Australia


