I've been putting a little effort into debugging this. It turns out that every now and then, the dd process feeding klogd dies. klogd then loops at 100% CPU.
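One way to see why a reader such as dd exits is to swap in a copy loop that logs the result of every read(). The following is only a minimal sketch of that idea, not the program used for this report: the name copy_logged, the DDLOG_STANDALONE switch and the default input path /proc/kmsg are all assumptions made for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Copy in_fd to out_fd one read() at a time, logging every result.
 * Returns the total bytes copied once a zero-byte read (EOF) is seen,
 * or -1 on error. copy_logged is a hypothetical name, not part of dd
 * or klogd.
 */
static long copy_logged(int in_fd, int out_fd, FILE *log)
{
	char buf[4096];
	long total = 0;

	for (;;) {
		ssize_t n = read(in_fd, buf, sizeof(buf));

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			fprintf(log, "read error: %s\n", strerror(errno));
			return -1;
		}
		if (n == 0) {
			/* This is the case that makes a plain dd exit. */
			fprintf(log, "zero-byte read after %ld bytes\n", total);
			return total;
		}
		fprintf(log, "read %zd bytes\n", n);

		/* write() may be partial, so loop until the chunk is out. */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(out_fd, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				fprintf(log, "write error: %s\n", strerror(errno));
				return -1;
			}
			off += w;
		}
		total += n;
	}
}

#ifdef DDLOG_STANDALONE
/* Build with -DDDLOG_STANDALONE. The default path is an assumption. */
int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "/proc/kmsg";
	int fd = open(path, O_RDONLY);

	if (fd < 0) {
		perror(path);
		return 1;
	}
	return copy_logged(fd, STDOUT_FILENO, stderr) < 0 ? 1 : 0;
}
#endif
```

Run in place of dd with stderr redirected to a file, the log shows whether the reader stopped on an error or on a zero-byte read, and how much data it had copied by then.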
The question then is, why does dd die? I wrote a replacement dd that printed out what it was doing. dd dies because it gets a zero-byte read (i.e., the kernel is signalling EOF).

With kernel 3.0.0 this seems to happen a lot: on a busy system, dd lasts only a few seconds after being restarted. I suspect a race between emit_log_char() and do_syslog(SYSLOG_ACTION_READ...) in the kernel when the kernel is logging a LOT of data.

My attempt at a fix is the kernel patch below, which goes back and waits again whenever the read would return zero bytes. But I can't quite see the race, so I can't fix it properly.

diff --git a/kernel/printk.c b/kernel/printk.c
index 37dff34..0e44138 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -358,6 +358,7 @@ int do_syslog(int type, char __user *buf, int len, bool from_file)
 			error = -EFAULT;
 			goto out;
 		}
+	again:
 		error = wait_event_interruptible(log_wait,
 							(log_start - log_end));
 		if (error)
@@ -377,6 +378,8 @@ int do_syslog(int type, char __user *buf, int len, bool from_file)
 		spin_unlock_irq(&logbuf_lock);
 		if (!error)
 			error = i;
+		if (i == 0)
+			goto again;
 		break;
 	/* Read/clear last kernel messages */
 	case SYSLOG_ACTION_READ_CLEAR:

--
Dr Peter Chubb  http://www.gelato.unsw.edu.au  peterc AT gelato.unsw.edu.au
http://www.ertos.nicta.com.au           ERTOS within National ICT Australia