Hi there,

Is there any solution yet for this issue? I do not see any lockups in the logs, but the system still crashes. I run OpenVZ kernels on two machines with the same kernel (2.6.24-27-openvz) and the same hardware, both on hardy, and both are affected. Every first Sunday of the month the systems crash during the array check.
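For anyone wondering why it is always the first Sunday: the cron command quoted in the logs in this message only runs checkarray when `date +%d` is at most 7. Below is a minimal Python sketch of that guard. It assumes the cron entry itself is scheduled weekly on Sundays (the schedule is not shown in the logs) and that the year is 2010 (the log lines carry no year); `checkarray_would_run` is a hypothetical helper name, not part of mdadm.

```python
from datetime import date


def checkarray_would_run(day: date) -> bool:
    """Mirror the guard `[ $(date +%d) -le 7 ]` from the cron command.

    Assumption: cron fires the entry only on Sundays, so the day-of-month
    test narrows that down to the first Sunday of each month.
    """
    is_sunday = day.weekday() == 6   # Monday is 0, Sunday is 6
    in_first_week = day.day <= 7     # same test as `date +%d` -le 7
    return is_sunday and in_first_week


if __name__ == "__main__":
    # Dates taken from the log excerpts; the year 2010 is an assumption.
    for d in (date(2010, 4, 4), date(2010, 5, 2), date(2010, 5, 9)):
        print(d.isoformat(), checkarray_would_run(d))
```

Under those assumptions the check would fire on both crash dates (Apr 4 and May 2) and be skipped on the following Sundays.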
I tried downgrading the kernel on one of the two machines (to 2.6.24-26-openvz). On the next first Sunday of the month, the machine with the older kernel crashed, while the other machine had a RAID array sync failure that somehow kept it alive. I received this email about the array sync failure on the working machine:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md3 : active raid0 md1[0] md2[1]
      898643712 blocks 64k chunks

md2 : active raid1 sdc2[0] sdd2[1]
      449321920 blocks [2/2] [UU]

md1 : active raid1 sda2[2](F) sdb2[1]
      449321920 blocks [2/1] [_U]
      [===================>.]  check = 99.9% (448939328/449321920) finish=0.5min speed=10895K/sec

md0 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      39061952 blocks [4/4] [UUUU]

unused devices: <none>

And in /var/log/debug (on the machine that is still alive) we have:

May 2 01:06:01 verus /USR/SBIN/CRON[8509]: (root) CMD ([ -x /usr/share/mdadm/checkarray ] && [ $(date +%d) -le 7 ] && /usr/share/mdadm/checkarray --cron --all --quiet)
May 2 01:06:01 verus kernel: [2200235.996209] md: data-check of RAID array md0
May 2 01:06:01 verus kernel: [2200235.996218] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
May 2 01:06:01 verus kernel: [2200235.996224] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
May 2 01:06:01 verus kernel: [2200235.996235] md: using 128k window, over a total of 39061952 blocks.
May 2 01:06:01 verus kernel: [2200235.998255] md: delaying data-check of md1 until md0 has finished (they share one or more physical units)
May 2 01:06:01 verus kernel: [2200235.998875] md: delaying data-check of md2 until md0 has finished (they share one or more physical units)
May 2 01:06:01 verus mdadm: RebuildStarted event detected on md device /dev/md0
May 2 01:08:01 verus mdadm: Rebuild20 event detected on md device /dev/md0

The same machine, the month before, did not have any array sync failure.
It crashed and the message was the same:

Apr 4 01:06:01 verus /USR/SBIN/CRON[7539]: (root) CMD ([ -x /usr/share/mdadm/checkarray ] && [ $(date +%d) -le 7 ] && /usr/share/mdadm/checkarray --cron --all --quiet)
Apr 4 01:06:01 verus kernel: [1248149.935860] md: data-check of RAID array md0
Apr 4 01:06:01 verus kernel: [1248149.935868] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Apr 4 01:06:01 verus kernel: [1248149.935872] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Apr 4 01:06:01 verus kernel: [1248149.935883] md: using 128k window, over a total of 39061952 blocks.
Apr 4 01:06:01 verus kernel: [1248149.939456] md: delaying data-check of md1 until md0 has finished (they share one or more physical units)
Apr 4 01:06:01 verus mdadm: RebuildStarted event detected on md device /dev/md0
Apr 4 01:06:01 verus kernel: [1248149.939653] md: delaying data-check of md2 until md0 has finished (they share one or more physical units)
Apr 4 01:15:24 verus kernel: [1248711.999078] md: md0: data-check done.
Apr 4 01:15:24 verus kernel: [1248712.069924] md: data-check of RAID array md2
Apr 4 01:15:24 verus kernel: [1248712.069933] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Apr 4 01:15:24 verus kernel: [1248712.069937] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Apr 4 01:15:24 verus kernel: [1248712.069946] md: using 128k window, over a total of 449321920 blocks.
Apr 4 01:15:24 verus kernel: [1248712.072029] md: data-check of RAID array md1
Apr 4 01:15:24 verus kernel: [1248712.072032] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Apr 4 01:15:24 verus kernel: [1248712.072036] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Apr 4 01:15:24 verus kernel: [1248712.072044] md: using 128k window, over a total of 449321920 blocks.
Apr 4 01:15:24 verus kernel: [1248712.133862] RAID1 conf printout:
Apr 4 01:15:24 verus kernel: [1248712.133876] --- wd:4 rd:4
Apr 4 01:15:24 verus kernel: [1248712.133881] disk 0, wo:0, o:1, dev:sda1
Apr 4 01:15:24 verus kernel: [1248712.133884] disk 1, wo:0, o:1, dev:sdb1
Apr 4 01:15:24 verus kernel: [1248712.133888] disk 2, wo:0, o:1, dev:sdc1
Apr 4 01:15:24 verus mdadm: RebuildStarted event detected on md device /dev/md2
Apr 4 01:15:24 verus kernel: [1248712.133891] disk 3, wo:0, o:1, dev:sdd1
Apr 4 01:15:24 verus mdadm: RebuildFinished event detected on md device /dev/md0
Apr 4 01:15:24 verus mdadm: RebuildStarted event detected on md device /dev/md1
Apr 4 01:42:25 verus mdadm: Rebuild20 event detected on md device /dev/md2
Apr 4 01:43:25 verus mdadm: Rebuild20 event detected on md device /dev/md1
Apr 4 02:10:25 verus mdadm: Rebuild40 event detected on md device /dev/md2
Apr 4 02:12:25 verus mdadm: Rebuild40 event detected on md device /dev/md1
Apr 4 02:37:25 verus mdadm: Rebuild60 event detected on md device /dev/md2
Apr 4 02:41:25 verus mdadm: Rebuild60 event detected on md device /dev/md1
(now time change for end of DST)
Apr 4 02:19:25 verus mdadm: Rebuild80 event detected on md device /dev/md2
Apr 4 02:20:02 verus /USR/SBIN/CRON[8328]: (root) CMD (/usr/share/vzctl/scripts/vpsreboot)
Apr 4 02:20:06 verus /USR/SBIN/CRON[8331]: (root) CMD (/usr/share/vzctl/scripts/vpsnetclean)

And then the system crashes. Even downgrading the kernel did not stop this.

If you need more info, let me know.

Thanks

--
RAID1 data-checks cause CPU soft lockups
https://bugs.launchpad.net/bugs/212684