Hi there,

Is there a solution for this issue yet?
I do not see any lockups in the logs, but the system still crashes.
I run OpenVZ kernels on two machines with the same kernel (2.6.24-27-openvz)
and the same hardware, both running Hardy, and both are affected.
On the first Sunday of every month, the systems crash during the array check.

I tried downgrading the kernel on one of the two machines (to
2.6.24-26-openvz).
On the next first Sunday of the month, the machine with the older kernel
crashed, while the other machine had a RAID array sync failure that somehow
kept it alive.
I received this email about the array sync failure on the machine that stayed up:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md3 : active raid0 md1[0] md2[1]
     898643712 blocks 64k chunks

md2 : active raid1 sdc2[0] sdd2[1]
     449321920 blocks [2/2] [UU]

md1 : active raid1 sda2[2](F) sdb2[1]
     449321920 blocks [2/1] [_U]
     [===================>.]  check = 99.9% (448939328/449321920) finish=0.5min speed=10895K/sec

md0 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
     39061952 blocks [4/4] [UUUU]

unused devices: <none>
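
For anyone who wants to spot this state automatically, here is a minimal sketch that scans mdstat-style text for arrays reporting fewer active members than expected (the `[2/1]` on md1 above). The function name `degraded_arrays` and the `SAMPLE` string are only illustrative. The parsing assumes the two-line layout shown in this email and is not a complete /proc/mdstat parser.

```python
import re

def degraded_arrays(mdstat_text):
    """Return {array_name: (expected, active)} for arrays missing members."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        # Array header lines look like "md1 : active raid1 sda2[2](F) sdb2[1]".
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status lines look like "449321920 blocks [2/1] [_U]".
        status = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and status:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                degraded[current] = (expected, active)
            current = None
    return degraded

# Excerpt copied from the mdstat output in this email.
SAMPLE = """\
md2 : active raid1 sdc2[0] sdd2[1]
     449321920 blocks [2/2] [UU]

md1 : active raid1 sda2[2](F) sdb2[1]
     449321920 blocks [2/1] [_U]
"""

print(degraded_arrays(SAMPLE))
```

On the excerpt above this reports md1 only, with 2 members expected and 1 active.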


In /var/log/debug (on the machine that is still alive) we have:

May  2 01:06:01 verus /USR/SBIN/CRON[8509]: (root) CMD ([ -x /usr/share/mdadm/checkarray ] && [ $(date +%d) -le 7 ] && /usr/share/mdadm/checkarray --cron --all --quiet)
May  2 01:06:01 verus kernel: [2200235.996209] md: data-check of RAID array md0
May  2 01:06:01 verus kernel: [2200235.996218] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
May  2 01:06:01 verus kernel: [2200235.996224] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
May  2 01:06:01 verus kernel: [2200235.996235] md: using 128k window, over a total of 39061952 blocks.
May  2 01:06:01 verus kernel: [2200235.998255] md: delaying data-check of md1 until md0 has finished (they share one or more physical units)
May  2 01:06:01 verus kernel: [2200235.998875] md: delaying data-check of md2 until md0 has finished (they share one or more physical units)
May  2 01:06:01 verus mdadm: RebuildStarted event detected on md device /dev/md0
May  2 01:08:01 verus mdadm: Rebuild20 event detected on md device /dev/md0
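
The cron command in the log explains why this only happens on the first Sunday: `[ $(date +%d) -le 7 ]` lets checkarray run only when the day of the month is 7 or less. Combined with a weekly Sunday schedule (an assumption here, since the log does not show the cron time fields), that selects exactly one day per month. A minimal sketch of the same test, with `is_first_sunday` as an illustrative name:

```python
from datetime import date

def is_first_sunday(day):
    """True when `day` is a Sunday falling on day 1..7 of its month."""
    # date.weekday() returns 6 for Sunday. The day-of-month test mirrors
    # the `[ $(date +%d) -le 7 ]` guard in the cron command.
    return day.weekday() == 6 and day.day <= 7

# The year is an assumption: syslog timestamps do not include it.
print(is_first_sunday(date(2010, 5, 2)))
print(is_first_sunday(date(2010, 5, 9)))
```

The first call is for a Sunday within the first seven days and the second is for the following Sunday, which the guard would skip.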


The month before, the same machine did not have any array sync failure. It
crashed, and the log messages were the same:

Apr  4 01:06:01 verus /USR/SBIN/CRON[7539]: (root) CMD ([ -x /usr/share/mdadm/checkarray ] && [ $(date +%d) -le 7 ] && /usr/share/mdadm/checkarray --cron --all --quiet)
Apr  4 01:06:01 verus kernel: [1248149.935860] md: data-check of RAID array md0
Apr  4 01:06:01 verus kernel: [1248149.935868] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Apr  4 01:06:01 verus kernel: [1248149.935872] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Apr  4 01:06:01 verus kernel: [1248149.935883] md: using 128k window, over a total of 39061952 blocks.
Apr  4 01:06:01 verus kernel: [1248149.939456] md: delaying data-check of md1 until md0 has finished (they share one or more physical units)
Apr  4 01:06:01 verus mdadm: RebuildStarted event detected on md device /dev/md0
Apr  4 01:06:01 verus kernel: [1248149.939653] md: delaying data-check of md2 until md0 has finished (they share one or more physical units)


Apr  4 01:15:24 verus kernel: [1248711.999078] md: md0: data-check done.
Apr  4 01:15:24 verus kernel: [1248712.069924] md: data-check of RAID array md2
Apr  4 01:15:24 verus kernel: [1248712.069933] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Apr  4 01:15:24 verus kernel: [1248712.069937] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Apr  4 01:15:24 verus kernel: [1248712.069946] md: using 128k window, over a total of 449321920 blocks.
Apr  4 01:15:24 verus kernel: [1248712.072029] md: data-check of RAID array md1
Apr  4 01:15:24 verus kernel: [1248712.072032] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Apr  4 01:15:24 verus kernel: [1248712.072036] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Apr  4 01:15:24 verus kernel: [1248712.072044] md: using 128k window, over a total of 449321920 blocks.
Apr  4 01:15:24 verus kernel: [1248712.133862] RAID1 conf printout:
Apr  4 01:15:24 verus kernel: [1248712.133876]  --- wd:4 rd:4
Apr  4 01:15:24 verus kernel: [1248712.133881]  disk 0, wo:0, o:1, dev:sda1
Apr  4 01:15:24 verus kernel: [1248712.133884]  disk 1, wo:0, o:1, dev:sdb1
Apr  4 01:15:24 verus kernel: [1248712.133888]  disk 2, wo:0, o:1, dev:sdc1
Apr  4 01:15:24 verus mdadm: RebuildStarted event detected on md device /dev/md2
Apr  4 01:15:24 verus kernel: [1248712.133891]  disk 3, wo:0, o:1, dev:sdd1
Apr  4 01:15:24 verus mdadm: RebuildFinished event detected on md device /dev/md0
Apr  4 01:15:24 verus mdadm: RebuildStarted event detected on md device /dev/md1


Apr  4 01:42:25 verus mdadm: Rebuild20 event detected on md device /dev/md2
Apr  4 01:43:25 verus mdadm: Rebuild20 event detected on md device /dev/md1
Apr  4 02:10:25 verus mdadm: Rebuild40 event detected on md device /dev/md2
Apr  4 02:12:25 verus mdadm: Rebuild40 event detected on md device /dev/md1
Apr  4 02:37:25 verus mdadm: Rebuild60 event detected on md device /dev/md2
Apr  4 02:41:25 verus mdadm: Rebuild60 event detected on md device /dev/md1

(the clock goes back here for the end of DST)

Apr  4 02:19:25 verus mdadm: Rebuild80 event detected on md device /dev/md2
Apr  4 02:20:02 verus /USR/SBIN/CRON[8328]: (root) CMD (/usr/share/vzctl/scripts/vpsreboot)
Apr  4 02:20:06 verus /USR/SBIN/CRON[8331]: (root) CMD (/usr/share/vzctl/scripts/vpsnetclean)
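
To see how far each check had got before the crash, the mdadm progress events can be pulled out of the log. This is only a sketch: the names `EVENT`, `last_progress` and `LINES` are illustrative, and the pattern assumes the message wording shown in the log lines above.

```python
import re

# Matches progress events such as "Rebuild60 event detected on md device /dev/md1".
# "RebuildStarted" and "RebuildFinished" do not match because \d+ needs digits.
EVENT = re.compile(r"mdadm: Rebuild(\d+) event detected on md device (/dev/md\d+)")

def last_progress(log_lines):
    """Return {device: last reported percentage} in log order."""
    progress = {}
    for line in log_lines:
        match = EVENT.search(line)
        if match:
            progress[match.group(2)] = int(match.group(1))
    return progress

# Lines copied from the April log in this email.
LINES = [
    "Apr  4 02:37:25 verus mdadm: Rebuild60 event detected on md device /dev/md2",
    "Apr  4 02:41:25 verus mdadm: Rebuild60 event detected on md device /dev/md1",
    "Apr  4 02:19:25 verus mdadm: Rebuild80 event detected on md device /dev/md2",
]

print(last_progress(LINES))
```

For the April log this gives 80 for /dev/md2 and 60 for /dev/md1, so both checks were still running when the machine went down.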

At that point the system crashes.

Even downgrading the kernel did not stop this.

If you need more info, let me know.

Thanks

-- 
RAID1 data-checks cause CPU soft lockups
https://bugs.launchpad.net/bugs/212684

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs