On Tuesday June 20, [EMAIL PROTECTED] wrote:
> Nigel J. Terry wrote:
>
> Well good news and bad news I'm afraid...
>
> Well I would like to be able to tell you that the time calculation now
> works, but I can't. Here's why: When I rebooted with the newly built
> kernel, it decided to hit the magic 21 reboots and hence decided to
> check the array for clean. This normally takes about 5-10 mins, but this
> time took several hours, so I went to bed! I suspect that it was doing
> the full reshape or something similar at boot time.
>
What "magic 21 reboots"?? md has no mechanism to automatically check
the array after N reboots or anything like that. Or are you thinking
of the 'fsck' that does a full check every so often?
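
For reference, one way to see whether a count-based fsck is about to fire
on an ext2/ext3 filesystem is to compare the "Mount count" and "Maximum
mount count" fields printed by `tune2fs -l`. The sketch below only parses
that text. The sample values are made up for illustration and are not
taken from the system discussed here, and the rule in
check_due_on_next_mount (check once the count has reached the maximum,
with 0 or -1 meaning "disabled") is an assumption about how e2fsck
decides.

```python
# Illustrative excerpt in the style of `tune2fs -l` output.
# These numbers are hypothetical, not from the poster's array.
SAMPLE_TUNE2FS_OUTPUT = """\
Filesystem state:         clean
Mount count:              21
Maximum mount count:      21
Check interval:           15552000 (6 months)
"""


def parse_mount_counts(tune2fs_output):
    """Return (mount_count, max_mount_count) from `tune2fs -l` text."""
    fields = {}
    for line in tune2fs_output.splitlines():
        # Split on the first colon only; values may contain colons too.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return int(fields["Mount count"]), int(fields["Maximum mount count"])


def check_due_on_next_mount(mount_count, max_mount_count):
    """Assumed rule: a full check is forced once the count reaches the max."""
    if max_mount_count <= 0:
        # 0 or -1 disables count-based checking.
        return False
    return mount_count >= max_mount_count


if __name__ == "__main__":
    count, maximum = parse_mount_counts(SAMPLE_TUNE2FS_OUTPUT)
    print(count, maximum, check_due_on_next_mount(count, maximum))
```

On a real machine the text would come from running `tune2fs -l` against
the device that holds the filesystem.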
> Now I am not sure that this makes good sense in a normal environment.
> This could keep a server down for hours or days. I might suggest that if
> such work was required, the clean check is postponed till next boot and
> the reshape allowed to continue in the background.
An fsck cannot tell if there is a reshape happening, but the reshape
should notice the fsck and slow down to a crawl so the fsck can complete...
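
The bounds on that slow-down are the two figures md prints in the log
quoted further down: a guaranteed minimum and a maximum reconstruction
speed. On Linux they are exposed as speed_limit_min and speed_limit_max
under /proc/sys/dev/raid. A minimal sketch for reading them, plus a rough
worst-case duration if the reshape were held at the minimum the whole
time, could look like this. The helper names are hypothetical, and
treating the "blocks" figure in the log as 1 KB units is an assumption.

```python
import os


def read_md_speed_limits(raid_dir="/proc/sys/dev/raid"):
    """Read md's resync/reshape speed limits (KB/sec) from raid_dir."""
    limits = {}
    for name in ("speed_limit_min", "speed_limit_max"):
        with open(os.path.join(raid_dir, name)) as f:
            limits[name] = int(f.read().strip())
    return limits


def worst_case_hours(total_kb, min_kb_per_sec):
    """Hours needed if the sync ran at the guaranteed minimum throughout."""
    return total_kb / min_kb_per_sec / 3600.0


if __name__ == "__main__":
    # Figures from the log below: 245111552 blocks (assumed to be 1 KB
    # each) and a minimum guaranteed speed of 1000 KB/sec.
    print(round(worst_case_hours(245111552, 1000), 1), "hours at minimum")
    if os.path.isdir("/proc/sys/dev/raid"):
        print(read_md_speed_limits())
```

Under those assumptions, a reshape pinned at the minimum by a competing
fsck would take days rather than hours, so "several hours" is consistent
with the two running at once.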
>
> Anyway the good news is that this morning, all is well, the array is
> clean and grown as can be seen below. However, if you look further below
> you will see the section from dmesg which still shows RIP errors, so I
> guess there is still something wrong, even though it looks like it is
> working. Let me know if I can provide any more information.
>
> Once again, many thanks. All I need to do now is grow the ext3 filesystem...
.....
> ...ok start reshape thread
> md: syncing RAID array md0
> md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
> md: using maximum available idle IO bandwidth (but not more than 200000
> KB/sec) for reconstruction.
> md: using 128k window, over a total of 245111552 blocks.
> Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
> <0000000000000000>{stext+2145382632}
> PGD 7c3f9067 PUD 7cb9e067 PMD 0
....
> Process md0_reshape (pid: 1432, threadinfo ffff81007aa42000, task
> ffff810037f497b0)
> Stack: ffffffff803dce42 0000000000000000 000000001d383600 0000000000000000
> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000
> Call Trace: <ffffffff803dce42>{md_do_sync+1307}
> <ffffffff802640c0>{thread_return+0}
> <ffffffff8026411e>{thread_return+94}
> <ffffffff8029925d>{keventd_create_kthread+0}
> <ffffffff803dd3d9>{md_thread+248}
That looks very much like the bug that I already sent you a patch for!
Are you sure that the new kernel still had this patch?
I'm a bit confused by this....
NeilBrown