On 12.02.2019 16:08, steve wrote: > Hi There, > > Here is what I get on my up to date stretch box: > > Feb 10 12:19:12 box kernel: [ 7734.562060] INFO: task md0_raid1:461 > blocked for more than 120 seconds. > Feb 10 12:19:12 box kernel: [ 7734.562068] Tainted: P > OE 4.19.0-0.bpo.1-amd64 #1 Debian 4.19.12-1~bpo9+1 > Feb 10 12:19:12 box kernel: [ 7734.562070] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > Feb 10 12:19:12 box kernel: [ 7734.562074] md0_raid1 D 0 > 461 2 0x80000000 > Feb 10 12:19:12 box kernel: [ 7734.562078] Call Trace: > Feb 10 12:19:12 box kernel: [ 7734.562093] ? __schedule+0x3f5/0x880 > Feb 10 12:19:12 box kernel: [ 7734.562098] schedule+0x32/0x80 > Feb 10 12:19:12 box kernel: [ 7734.562112] md_super_wait+0x6e/0xa0 > [md_mod] > Feb 10 12:19:12 box kernel: [ 7734.562120] ? remove_wait_queue+0x60/0x60 > Feb 10 12:19:12 box kernel: [ 7734.562128] > md_update_sb.part.61+0x4af/0x910 [md_mod] > Feb 10 12:19:12 box kernel: [ 7734.562136] > md_check_recovery+0x312/0x530 [md_mod] > Feb 10 12:19:12 box kernel: [ 7734.562143] raid1d+0x60/0x8c0 [raid1] > Feb 10 12:19:12 box kernel: [ 7734.562149] ? schedule+0x32/0x80 > Feb 10 12:19:12 box kernel: [ 7734.562152] ? > schedule_timeout+0x1e5/0x350 > Feb 10 12:19:12 box kernel: [ 7734.562159] ? md_thread+0x125/0x170 > [md_mod] > Feb 10 12:19:12 box kernel: [ 7734.562165] md_thread+0x125/0x170 > [md_mod] > Feb 10 12:19:12 box kernel: [ 7734.562170] ? remove_wait_queue+0x60/0x60 > Feb 10 12:19:12 box kernel: [ 7734.562174] kthread+0xf8/0x130 > Feb 10 12:19:12 box kernel: [ 7734.562180] ? md_rdev_init+0xc0/0xc0 > [md_mod] > Feb 10 12:19:12 box kernel: [ 7734.562183] ? > kthread_create_worker_on_cpu+0x70/0x70 > Feb 10 12:19:12 box kernel: [ 7734.562186] ret_from_fork+0x35/0x40 > Feb 10 12:19:12 box kernel: [ 7734.562192] INFO: task md1_raid1:468 > blocked for more than 120 seconds. > Feb 10 12:19:12 box kernel: [ 7734.562196] Tainted: P > OE 4.19.0-0.bpo.1-amd64 #1 Debian 4.19.12-1~bpo9+1 > Feb 10 12:19:12 box kernel: [ 7734.562198] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > Feb 10 12:19:12 box kernel: [ 7734.562200] md1_raid1 D 0 > 468 2 0x80000000 > Feb 10 12:19:12 box kernel: [ 7734.562203] Call Trace: > Feb 10 12:19:12 box kernel: [ 7734.562209] ? __schedule+0x3f5/0x880 > Feb 10 12:19:12 box kernel: [ 7734.562213] schedule+0x32/0x80 > Feb 10 12:19:12 box kernel: [ 7734.562219] md_super_wait+0x6e/0xa0 > [md_mod] > Feb 10 12:19:12 box kernel: [ 7734.562224] ? remove_wait_queue+0x60/0x60 > Feb 10 12:19:12 box kernel: [ 7734.562230] > md_update_sb.part.61+0x4af/0x910 [md_mod] > Feb 10 12:19:12 box kernel: [ 7734.562237] > md_check_recovery+0x312/0x530 [md_mod] > Feb 10 12:19:12 box kernel: [ 7734.562241] raid1d+0x60/0x8c0 [raid1] > Feb 10 12:19:12 box kernel: [ 7734.562246] ? schedule+0x32/0x80 > Feb 10 12:19:12 box kernel: [ 7734.562248] ? > schedule_timeout+0x1e5/0x350 > Feb 10 12:19:12 box kernel: [ 7734.562255] ? md_thread+0x125/0x170 > [md_mod] > Feb 10 12:19:12 box kernel: [ 7734.562260] md_thread+0x125/0x170 > [md_mod] > Feb 10 12:19:12 box kernel: [ 7734.562265] ? remove_wait_queue+0x60/0x60 > Feb 10 12:19:12 box kernel: [ 7734.562268] kthread+0xf8/0x130 > Feb 10 12:19:12 box kernel: [ 7734.562274] ? md_rdev_init+0xc0/0xc0 > [md_mod] > Feb 10 12:19:12 box kernel: [ 7734.562276] ? > kthread_create_worker_on_cpu+0x70/0x70 > Feb 10 12:19:12 box kernel: [ 7734.562279] ret_from_fork+0x35/0x40 > Feb 10 12:19:12 box kernel: [ 7734.562366] INFO: task fetchmail:3277 > blocked for more than 120 seconds. > Feb 10 12:19:12 box kernel: [ 7734.562369] Tainted: P > OE 4.19.0-0.bpo.1-amd64 #1 Debian 4.19.12-1~bpo9+1 > Feb 10 12:19:12 box kernel: [ 7734.562371] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > Feb 10 12:19:12 box kernel: [ 7734.562373] fetchmail D 0 > 3277 1 0x00000080 > Feb 10 12:19:12 box kernel: [ 7734.562376] Call Trace: > Feb 10 12:19:12 box kernel: [ 7734.562381] ? __schedule+0x3f5/0x880 > Feb 10 12:19:12 box kernel: [ 7734.562412] ? > __ext4_handle_dirty_metadata+0x51/0x1a0 [ext4] > Feb 10 12:19:12 box kernel: [ 7734.562418] schedule+0x32/0x80 > Feb 10 12:19:12 box kernel: [ 7734.562426] md_write_start+0x13c/0x210 > [md_mod] > Feb 10 12:19:12 box kernel: [ 7734.562432] ? remove_wait_queue+0x60/0x60 > Feb 10 12:19:12 box kernel: [ 7734.562437] > raid1_make_request+0x63/0xbc0 [raid1] > Feb 10 12:19:12 box kernel: [ 7734.562442] ? mempool_alloc+0x69/0x190 > Feb 10 12:19:12 box kernel: [ 7734.562446] ? bvec_alloc+0x86/0xe0 > Feb 10 12:19:12 box kernel: [ 7734.562453] > md_handle_request+0x116/0x190 [md_mod] > Feb 10 12:19:12 box kernel: [ 7734.562460] md_make_request+0x72/0x170 > [md_mod] > Feb 10 12:19:12 box kernel: [ 7734.562465] > generic_make_request+0x1e7/0x410 > Feb 10 12:19:12 box kernel: [ 7734.562471] ? submit_bio+0x6c/0x140 > Feb 10 12:19:12 box kernel: [ 7734.562474] submit_bio+0x6c/0x140 > Feb 10 12:19:12 box kernel: [ 7734.562500] ext4_io_submit+0x49/0x60 > [ext4] > Feb 10 12:19:12 box kernel: [ 7734.562519] > ext4_writepages+0x621/0xe00 [ext4] > Feb 10 12:19:12 box kernel: [ 7734.562526] ? do_writepages+0x1a/0x60 > Feb 10 12:19:12 box kernel: [ 7734.562529] do_writepages+0x1a/0x60 > Feb 10 12:19:12 box kernel: [ 7734.562533] > __filemap_fdatawrite_range+0xc8/0x100 > Feb 10 12:19:12 box kernel: [ 7734.562552] ext4_rename+0x676/0x900 > [ext4] > > > and so one. > > The system blocks for about 3 minutes and then I get back a hand on it. > > I'm no specialist so I don't know what to do. > Any help would appreciated. > > Best, > Steve > Seems like this behavior is explained in this article, including the solution. [1] Not a bug, but something that requires an optimization settings. *But*, I'd first checked the disks for hardware problems and md array for consistency, after complete data backup is made and verified, of course, like David already suggested.
Depending on the drives they could be "slow" because of active APM putting them to sleep state and causing them to spin down and park heads, or they could be "slow" because of bad blocks are found and drive's firmware trying to reallocate them. Bottom line is, you have to ensure your hard drives are ok. To investigate further, more information is required. Get output of this command for each device: $ sudo smartctl --all /dev/sda Also check current state of your md array: $ sudo mdadm --detail --scan --verbose $ cat /proc/mdstat [1] https://globalroot.wordpress.com/2015/02/13/linux-error-message-task-blocked-for-more-than-120-seconds/ -- With kindest regards, Alexander. ⢀⣴⠾⠻⢶⣦⠀ ⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system ⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org ⠈⠳⣄⠀⠀⠀⠀