After a little less than two weeks running on 4.8, the second board finally crashed during an “aptitude update” with:
1062229.158740] mmc0: Card stuck in programming state! mmc_do_erase [1062229.933728] sunxi-mmc 1c0f000.mmc: fatal err update clk timeout [1062230.708711] sunxi-mmc 1c0f000.mmc: fatal err update clk timeout [1062231.463699] sunxi-mmc 1c0f000.mmc: fatal err update clk timeout [1062231.470938] mmc0: tried to reset card, got error -5 [1062231.476059] blk_update_request: I/O error, dev mmcblk0, sector 21346248 [1062231.483160] mmcblk0: error -5 sending status command, retrying [1062231.489265] mmcblk0: error -5 sending status command, retrying [1062231.495340] mmcblk0: error -5 sending status command, aborting [1062232.268669] sunxi-mmc 1c0f000.mmc: fatal err update clk timeout [1062233.043649] sunxi-mmc 1c0f000.mmc: fatal err update clk timeout [1062233.798631] sunxi-mmc 1c0f000.mmc: fatal err update clk timeout [1062233.805874] mmc0: tried to reset card, got error -5 [1062233.810996] blk_update_request: I/O error, dev mmcblk0, sector 2205936 [1062233.817793] blk_update_request: I/O error, dev mmcblk0, sector 2205944 [1062233.824550] blk_update_request: I/O error, dev mmcblk0, sector 2205952 [1062233.831305] blk_update_request: I/O error, dev mmcblk0, sector 2205960 [1062233.838047] blk_update_request: I/O error, dev mmcblk0, sector 2205968 [1062233.844802] blk_update_request: I/O error, dev mmcblk0, sector 2205976 [1062233.851541] blk_update_request: I/O error, dev mmcblk0, sector 2205984 [1062233.858288] blk_update_request: I/O error, dev mmcblk0, sector 2205992 [1062233.865045] blk_update_request: I/O error, dev mmcblk0, sector 2206000 Note that I'm not sure I have done a cold reboot when switching from 4.9 to 4.8. It has now been running (on 4.8) for nearly a month without any issue. On the other hand, my main board running on linux 4.4 crashed twice. Not being physically present when it crashed today, I could not figure out if it crashed for the same reason, but it needed a cold reboot anyway. As for the previous crash, I was actually logged on the machine and could get some debug output. While the error handling changed from 4.4 to 4.9, it is much probably the same issue: any task waiting on the MMC ended up being stuck: [147837.706454] INFO: task logrotate:31041 blocked for more than 120 seconds. [147837.713536] Not tainted 4.4.0-1-armmp #1 [147837.718211] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [147837.726313] logrotate D c0663814 0 31041 31040 0x00000000 [147837.733020] [<c0663814>] (__schedule) from [<c0663d44>] (schedule+0x50/0xa8) [147837.740386] [<c0663d44>] (schedule) from [<c0666e84>] (schedule_timeout+0x208/0x290) [147837.748418] [<c0666e84>] (schedule_timeout) from [<c0663550>] (io_schedule_timeout+0xc0/0x138) [147837.757274] [<c0663550>] (io_schedule_timeout) from [<c0664688>] (bit_wait_io+0x20/0x6c) [147837.765611] [<c0664688>] (bit_wait_io) from [<c0664234>] (__wait_on_bit+0x94/0xd0) [147837.773430] [<c0664234>] (__wait_on_bit) from [<c015812c>] (wait_on_page_bit+0xdc/0xf8) [147837.781770] [<c015812c>] (wait_on_page_bit) from [<c0158298>] (__filemap_fdatawait_range+0xec/0x144) [147837.791191] [<c0158298>] (__filemap_fdatawait_range) from [<c0158318>] (filemap_fdatawait_range+0x28/0x44) [147837.801206] [<c0158318>] (filemap_fdatawait_range) from [<c0159fe4>] (filemap_write_and_wait_range+0x68/0x90) [147837.811835] [<c0159fe4>] (filemap_write_and_wait_range) from [<bf10f314>] (ext4_sync_file+0x12c/0x378 [ext4]) [147837.822372] [<bf10f314>] (ext4_sync_file [ext4]) from [<c01eb0e4>] (vfs_fsync_range+0x68/0xc8) [147837.831421] [<c01eb0e4>] (vfs_fsync_range) from [<c01eb1cc>] (do_fsync+0x4c/0x74) [147837.839142] [<c01eb1cc>] (do_fsync) from [<c01eb4c0>] (SyS_fsync+0x1c/0x20) [147837.846318] [<c01eb4c0>] (SyS_fsync) from [<c0011d60>] (ret_fast_syscall+0x0/0x3c) [147957.848353] INFO: task jbd2/mmcblk0p1-:107 blocked for more than 120 seconds. [147957.855792] Not tainted 4.4.0-1-armmp #1 [147957.860475] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [147957.868642] jbd2/mmcblk0p1- D c0663814 0 107 2 0x00000000 [147957.875442] [<c0663814>] (__schedule) from [<c0663d44>] (schedule+0x50/0xa8) [147957.882780] [<c0663d44>] (schedule) from [<c0666e84>] (schedule_timeout+0x208/0x290) [147957.890846] [<c0666e84>] (schedule_timeout) from [<c0663550>] (io_schedule_timeout+0xc0/0x138) [147957.899757] [<c0663550>] (io_schedule_timeout) from [<c0664688>] (bit_wait_io+0x20/0x6c) [147957.908153] [<c0664688>] (bit_wait_io) from [<c0664234>] (__wait_on_bit+0x94/0xd0) [147957.916038] [<c0664234>] (__wait_on_bit) from [<c06642fc>] (out_of_line_wait_on_bit+0x8c/0xa8) [147957.924934] [<c06642fc>] (out_of_line_wait_on_bit) from [<c01ed508>] (__wait_on_buffer+0x40/0x44) [147957.934222] [<c01ed508>] (__wait_on_buffer) from [<bf0dea70>] (jbd2_journal_commit_transaction+0xcec/0x1760 [jbd2]) [147957.945065] [<bf0dea70>] (jbd2_journal_commit_transaction [jbd2]) from [<bf0e53a8>] (kjournald2+0x108/0x2d0 [jbd2]) [147957.955886] [<bf0e53a8>] (kjournald2 [jbd2]) from [<c0072928>] (kthread+0xfc/0x114) [147957.963900] [<c0072928>] (kthread) from [<c0011e18>] (ret_from_fork+0x14/0x3c) [147957.971453] INFO: task openvpn:795 blocked for more than 120 seconds. [147957.978144] Not tainted 4.4.0-1-armmp #1 [147957.982813] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [147957.990889] openvpn D c0663814 0 795 1 0x00000000 [147957.997620] [<c0663814>] (__schedule) from [<c0663d44>] (schedule+0x50/0xa8) [147958.004923] [<c0663d44>] (schedule) from [<c0666e84>] (schedule_timeout+0x208/0x290) [147958.012963] [<c0666e84>] (schedule_timeout) from [<c0663550>] (io_schedule_timeout+0xc0/0x138) [147958.021855] [<c0663550>] (io_schedule_timeout) from [<c0664688>] (bit_wait_io+0x20/0x6c) [147958.030227] [<c0664688>] (bit_wait_io) from [<c0664234>] (__wait_on_bit+0x94/0xd0) [147958.038079] [<c0664234>] (__wait_on_bit) from [<c06642fc>] (out_of_line_wait_on_bit+0x8c/0xa8) [147958.047116] [<c06642fc>] (out_of_line_wait_on_bit) from [<bf0dc0f0>] (do_get_write_access+0x26c/0x4ac [jbd2]) [147958.057467] [<bf0dc0f0>] (do_get_write_access [jbd2]) from [<bf0dc378>] (jbd2_journal_get_write_access+0x48/0x74 [jbd2]) [147958.069078] [<bf0dc378>] (jbd2_journal_get_write_access [jbd2]) from [<bf14a034>] (__ext4_journal_get_write_access+0x44/0x84 [ext4]) [147958.081860] [<bf14a034>] (__ext4_journal_get_write_access [ext4]) from [<bf1243a0>] (ext4_orphan_add+0xc8/0x238 [ext4]) [147958.093434] [<bf1243a0>] (ext4_orphan_add [ext4]) from [<bf11a198>] (ext4_truncate+0x228/0x488 [ext4]) [147958.103565] [<bf11a198>] (ext4_truncate [ext4]) from [<bf11c354>] (ext4_setattr+0x5a0/0x854 [ext4]) [147958.113174] [<bf11c354>] (ext4_setattr [ext4]) from [<c01d30e4>] (notify_change+0x1e4/0x360) [147958.121972] [<c01d30e4>] (notify_change) from [<c01b2810>] (do_truncate+0xa0/0xe0) [147958.129710] [<c01b2810>] (do_truncate) from [<c01b2c5c>] (do_sys_ftruncate+0x138/0x1b0) [147958.137877] [<c01b2c5c>] (do_sys_ftruncate) from [<c01b3880>] (SyS_ftruncate+0x2c/0x34) [147958.146034] [<c01b3880>] (SyS_ftruncate) from [<c0011d60>] (ret_fast_syscall+0x0/0x3c) [147958.154141] INFO: task uwsgi:1116 blocked for more than 120 seconds. [147958.160631] Not tainted 4.4.0-1-armmp #1 [147958.165206] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [147958.173166] uwsgi D c0663814 0 1116 1 0x00000000 [147958.179698] [<c0663814>] (__schedule) from [<c0663d44>] (schedule+0x50/0xa8) [147958.186880] [<c0663d44>] (schedule) from [<c06640d0>] (schedule_preempt_disabled+0x18/0x1c) [147958.195382] [<c06640d0>] (schedule_preempt_disabled) from [<c0665968>] (__mutex_lock_slowpath+0xc4/0x17c) [147958.205089] [<c0665968>] (__mutex_lock_slowpath) from [<c0665a80>] (mutex_lock+0x60/0x64) [147958.213428] [<c0665a80>] (mutex_lock) from [<c01d4bc8>] (__fdget_pos+0x4c/0x54) [147958.220890] [<c01d4bc8>] (__fdget_pos) from [<c01b52d0>] (SyS_llseek+0x3c/0x134) [147958.228446] [<c01b52d0>] (SyS_llseek) from [<c0011d60>] (ret_fast_syscall+0x0/0x3c) [147958.236244] INFO: task uwsgi:1216 blocked for more than 120 seconds. [147958.242736] Not tainted 4.4.0-1-armmp #1 [147958.247302] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [147958.255267] uwsgi D c0663814 0 1216 1116 0x00000000 [147958.261807] [<c0663814>] (__schedule) from [<c0663d44>] (schedule+0x50/0xa8) [147958.268985] [<c0663d44>] (schedule) from [<c0666e84>] (schedule_timeout+0x208/0x290) [147958.276878] [<c0666e84>] (schedule_timeout) from [<c0663550>] (io_schedule_timeout+0xc0/0x138) [147958.285626] [<c0663550>] (io_schedule_timeout) from [<c0664688>] (bit_wait_io+0x20/0x6c) [147958.293864] [<c0664688>] (bit_wait_io) from [<c0664234>] (__wait_on_bit+0x94/0xd0) [147958.301572] [<c0664234>] (__wait_on_bit) from [<c06642fc>] (out_of_line_wait_on_bit+0x8c/0xa8) [147958.310375] [<c06642fc>] (out_of_line_wait_on_bit) from [<bf0dc0f0>] (do_get_write_access+0x26c/0x4ac [jbd2]) [147958.320470] [<bf0dc0f0>] (do_get_write_access [jbd2]) from [<bf0dc378>] (jbd2_journal_get_write_access+0x48/0x74 [jbd2]) [147958.331702] [<bf0dc378>] (jbd2_journal_get_write_access [jbd2]) from [<bf14a034>] (__ext4_journal_get_write_access+0x44/0x84 [ext4]) [147958.343951] [<bf14a034>] (__ext4_journal_get_write_access [ext4]) from [<bf11881c>] (ext4_reserve_inode_write+0x70/0x98 [ext4]) [147958.355780] [<bf11881c>] (ext4_reserve_inode_write [ext4]) from [<bf118898>] (ext4_mark_inode_dirty+0x54/0x248 [ext4]) [147958.366804] [<bf118898>] (ext4_mark_inode_dirty [ext4]) from [<bf11c65c>] (ext4_dirty_inode+0x54/0x70 [ext4]) [147958.377001] [<bf11c65c>] (ext4_dirty_inode [ext4]) from [<c01e56b8>] (__mark_inode_dirty+0x54/0x418) [147958.386305] [<c01e56b8>] (__mark_inode_dirty) from [<c01d104c>] (generic_update_time+0x84/0xc0) [147958.395172] [<c01d104c>] (generic_update_time) from [<c01d11f4>] (file_update_time+0xe8/0x140) [147958.403949] [<c01d11f4>] (file_update_time) from [<c015a93c>] (__generic_file_write_iter+0xa0/0x1fc) [147958.413388] [<c015a93c>] (__generic_file_write_iter) from [<bf10ee4c>] (ext4_file_write_iter+0x2b8/0x598 [ext4]) [147958.423828] [<bf10ee4c>] (ext4_file_write_iter [ext4]) from [<c01b5958>] (do_readv_writev+0x154/0x2ec) [147958.433319] [<c01b5958>] (do_readv_writev) from [<c01b5ba0>] (vfs_writev+0x4c/0x64) [147958.441133] [<c01b5ba0>] (vfs_writev) from [<c01b65d8>] (SyS_writev+0x74/0x118) [147958.448598] [<c01b65d8>] (SyS_writev) from [<c0011d60>] (ret_fast_syscall+0x0/0x3c) [147958.456414] INFO: task kworker/u4:0:6029 blocked for more than 120 seconds. [147958.463545] Not tainted 4.4.0-1-armmp #1 [147958.468124] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [147958.476097] kworker/u4:0 D c0663814 0 6029 2 0x00000000 [147958.482643] Workqueue: writeback wb_workfn (flush-179:0) [147958.488129] [<c0663814>] (__schedule) from [<c0663d44>] (schedule+0x50/0xa8) [147958.495328] [<c0663d44>] (schedule) from [<c0666e84>] (schedule_timeout+0x208/0x290) [147958.503242] [<c0666e84>] (schedule_timeout) from [<c0663550>] (io_schedule_timeout+0xc0/0x138) [147958.512000] [<c0663550>] (io_schedule_timeout) from [<c0664688>] (bit_wait_io+0x20/0x6c) [147958.520247] [<c0664688>] (bit_wait_io) from [<c0664234>] (__wait_on_bit+0x94/0xd0) [147958.527956] [<c0664234>] (__wait_on_bit) from [<c06642fc>] (out_of_line_wait_on_bit+0x8c/0xa8) [147958.536757] [<c06642fc>] (out_of_line_wait_on_bit) from [<bf0dc0f0>] (do_get_write_access+0x26c/0x4ac [jbd2]) [147958.546850] [<bf0dc0f0>] (do_get_write_access [jbd2]) from [<bf0dc378>] (jbd2_journal_get_write_access+0x48/0x74 [jbd2]) [147958.558061] [<bf0dc378>] (jbd2_journal_get_write_access [jbd2]) from [<bf14a034>] (__ext4_journal_get_write_access+0x44/0x84 [ext4]) [147958.570319] [<bf14a034>] (__ext4_journal_get_write_access [ext4]) from [<bf151f5c>] (ext4_mb_mark_diskspace_used+0xf4/0x50c [ext4]) [147958.582498] [<bf151f5c>] (ext4_mb_mark_diskspace_used [ext4]) from [<bf153674>] (ext4_mb_new_blocks+0x3a0/0x6b0 [ext4]) [147958.593603] [<bf153674>] (ext4_mb_new_blocks [ext4]) from [<bf146a70>] (ext4_ext_map_blocks+0x8d0/0x13c8 [ext4]) [147958.604130] [<bf146a70>] (ext4_ext_map_blocks [ext4]) from [<bf115a34>] (ext4_map_blocks+0x13c/0x4f4 [ext4]) [147958.614302] [<bf115a34>] (ext4_map_blocks [ext4]) from [<bf118b90>] (mpage_map_and_submit_extent+0x104/0x870 [ext4]) [147958.625172] [<bf118b90>] (mpage_map_and_submit_extent [ext4]) from [<bf1198b8>] (ext4_writepages+0x5bc/0x774 [ext4]) [147958.635941] [<bf1198b8>] (ext4_writepages [ext4]) from [<c0166228>] (do_writepages+0x34/0x48) [147958.644698] [<c0166228>] (do_writepages) from [<c01e5abc>] (__writeback_single_inode+0x40/0x418) [147958.653645] [<c01e5abc>] (__writeback_single_inode) from [<c01e63a4>] (writeback_sb_inodes+0x234/0x4bc) [147958.663200] [<c01e63a4>] (writeback_sb_inodes) from [<c01e66bc>] (__writeback_inodes_wb+0x90/0xd0) [147958.672304] [<c01e66bc>] (__writeback_inodes_wb) from [<c01e69a0>] (wb_writeback+0x2a4/0x390) [147958.680976] [<c01e69a0>] (wb_writeback) from [<c01e74d8>] (wb_workfn+0x41c/0x594) [147958.688601] [<c01e74d8>] (wb_workfn) from [<c006c590>] (process_one_work+0x14c/0x450) [147958.696593] [<c006c590>] (process_one_work) from [<c006c8e8>] (worker_thread+0x54/0x4f8) [147958.704862] [<c006c8e8>] (worker_thread) from [<c0072928>] (kthread+0xfc/0x114) [147958.712348] [<c0072928>] (kthread) from [<c0011e18>] (ret_from_fork+0x14/0x3c) Again, I have not figured any reliable way to trigger the issue, and some I am not sure all the freezes/crashes I experienced are attributable to this single issue. The fact I had issues with 4.4 and 4.8 seem to suggest it is a u-boot issue rather than a kernel one.