Package: linux-image-armmp
Version: 5.9.15-1

Dear Maintainer,
since upgrading to the 5.9.0-x-armmp armhf kernels I am seeing
processes getting getting randomly stuck in D-state.
I observed this issue with 5.9.0-1-armmp and 5.9.0-4-armmp.
I did not observe the issue with 5.8.0-3-armmp or any previous kernel.

Caveat: the system is running a debian kernel, but userspace is ubuntu focal. I thought you might want to know anyway, since it looks like a kernel problem.

--

Hardware:
Three HDDs connected via JMB575 SataIII port multiplier to i.MX6q's SataII port. Probe log: ata1.15: Port Multiplier 1.2, 0x197b:0x5755 r0, 5 ports, feat 0x5/0xf

Steps to reproduce: Start a bigger filecopy operation from one disk to another. I can reproduce this problem copying multi-GB files either via cp or rsync. Copy gets blocked in D-state in few hours and a cpu core is filled by io-wait. I've seen rsync getting blocked both reading the source and writing destination.

Details:
As I said, JMB575 PMP is connected to the i.MX6q on-chip SATA controller. All it's ports get configured "SATA link up 3.0 Gbps (SStatus 123 SControl 320)" There are three HDDs connected to it - I am going to call them sda, sdb, sdc. Disks "sda" and "sdb" are partitioned and used for regular ext4 filesystems. Disk "sdc" is backing a dm-integrity device. I am going to call it "dm-sdc".
DM device "dm-sdc" is used for a ext4 filesystem.


Generally the blocked processes arrive via io_schedule or io_schedule_timeout: [<c0c9c5f0>] (io_schedule_timeout) from [<c0c9e20c>] (wait_for_completion_io+0xb4/0x134)
 [<c0c9d05c>] (schedule) from [<c0c9d1d8>] (io_schedule+0x50/0x70)
[<c0c9d188>] (io_schedule) from [<bf640a34>] (dm_integrity_map_continue+0xb28/0xbc0 [dm_integrity]) [<c0c9d188>] (io_schedule) from [<c04eb868>] (wait_on_page_bit_common+0x18c/0x420)
 [<c0c9d188>] (io_schedule) from [<c0704924>] (rq_qos_wait+0x120/0x18c)
 [<c0c9d188>] (io_schedule) from [<c0c9da68>] (bit_wait_io+0x1c/0x68)
[<c0ca11c8>] (schedule_timeout) from [<c0c9c648>] (io_schedule_timeout+0x58/0x78)

I have many full stacktraces, but I am not sure if they are safe to share.
I'd really like to avoid disclosing user data or crypto keys.
I managed to find one that should be safe:
task:cronjob1l.sh state:D stack: 0 pid:28125 ppid: 28124 flags:0x00000000
Backtrace:
[<c0c9c668>] (__schedule) from [<c0c9d0d4>] (schedule+0x78/0xf4)
r10:000000c3 r9:00000002 r8:00000002 r7:c0c9da4c r6:e5a27ae8 r5:e5a26000
 r4:e598be00
[<c0c9d05c>] (schedule) from [<c0c9d1d8>] (io_schedule+0x50/0x70)
 r5:c1304a84 r4:00000000
[<c0c9d188>] (io_schedule) from [<c0c9da68>] (bit_wait_io+0x1c/0x68)
 r5:c1304a84 r4:00000002
[<c0c9da4c>] (bit_wait_io) from [<c0c9d5f8>] (__wait_on_bit+0x70/0xc8)
 r5:c1304a84 r4:e5a27adc
[<c0c9d588>] (__wait_on_bit) from [<c0c9d6f4>] (out_of_line_wait_on_bit+0xa4/0xcc)
 r9:e5a27c30 r8:bf1562a8 r7:00000001 r6:00000000 r5:c5fe43a8 r4:e5a27af4
[<c0c9d650>] (out_of_line_wait_on_bit) from [<c05c6f30>] (__wait_on_buffer+0x40/0x44)
 r4:c00b3b80
[<c05c6ef0>] (__wait_on_buffer) from [<bf109de4>] (ext4_bread+0x124/0x130 [ext4])
 r5:c5fe43a8 r4:c00b3b80
[<bf109cc0>] (ext4_bread [ext4]) from [<bf123f84>] (__ext4_read_dirblock+0x3c/0x45c [ext4])
 r4:e5a27cc0
[<bf123f48>] (__ext4_read_dirblock [ext4]) from [<bf1243f8>] (dx_probe+0x54/0x6d8 [ext4]) r10:000000c3 r9:e5a27c30 r8:00000000 r7:e5a27cc0 r6:d4a5024c r5:c5fe43a8
 r4:e5a27cc0
[<bf1243a4>] (dx_probe [ext4]) from [<bf1262e8>] (__ext4_find_entry+0x418/0x620 [ext4]) r10:000000c3 r9:00000000 r8:00000009 r7:c5fe43a8 r6:d4a5024c r5:e3092800
 r4:e5a27cc0
[<bf125ed0>] (__ext4_find_entry [ext4]) from [<bf126c24>] (ext4_lookup+0xe4/0x2ec [ext4]) r10:000000c3 r9:00000000 r8:00000009 r7:c5fe43a8 r6:c5fe43a8 r5:d4a50220
 r4:e5a27ca0
[<bf126b40>] (ext4_lookup [ext4]) from [<c058d328>] (__lookup_slow+0x98/0x164)
 r9:00000000 r8:00000009 r7:c5fe43a8 r6:e5a27db8 r5:e40df990 r4:d4a50220
[<c058d290>] (__lookup_slow) from [<c05900b8>] (walk_component+0x158/0x1cc)
 r8:00000009 r7:c5fe4430 r6:00000001 r5:e5a27db0 r4:e40df990
[<c058ff60>] (walk_component) from [<c05905a8>] (path_lookupat+0x80/0x1c0)
 r9:00000000 r8:e5a27db0 r7:e5a27ea4 r6:00000001 r5:e5a27db0 r4:00000000
[<c0590528>] (path_lookupat) from [<c0593454>] (filename_lookup+0xb0/0x1ac)
 r7:e5a27ea4 r6:00000001 r5:e5509000 r4:00000001
[<c05933a4>] (filename_lookup) from [<c059369c>] (user_path_at_empty+0x7c/0x98)
 r8:00000800 r7:014c04c4 r6:e5a27ea4 r5:ffffff9c r4:00000001
[<c0593620>] (user_path_at_empty) from [<c0585718>] (vfs_statx+0x7c/0x13c)
 r7:014c04c4 r6:ffffff9c r5:00000001 r4:e5a27ee8
[<c058569c>] (vfs_statx) from [<c0585cf4>] (__do_sys_stat64+0x40/0x80)
r10:000000c3 r9:e5a26000 r8:c03002c4 r7:000000c3 r6:014c051c r5:004c2384
 r4:be94d758
[<c0585cb4>] (__do_sys_stat64) from [<c058610c>] (sys_stat64+0x18/0x1c)
 r4:00000004
[<c05860f4>] (sys_stat64) from [<c03000c0>] (ret_fast_syscall+0x0/0x54)
Exception stack(0xe5a27fa8 to 0xe5a27ff0)
7fa0: 00000004 004c2384 014c04c4 be94d758 be94d758 00000003 7fc0: 00000004 004c2384 014c051c 000000c3 014c0518 00000003 004b0204 004c24ec
7fe0: 000000c3 be94d6e4 b6ee69db b6e6dbe6


What information would be helpful to collect next time this happens?
My attempt to document these mis-behavior follows:

1st occurence:  5.9.0-1-armmp #1 Debian 5.9.1-1
        One recursive copy (cp -ar) reading from dm-sdc, writing to sdb3.
        uninterruptibly sleeping / D-state'd processes:
                cp -ar                                  # no info
                rest unknown                            # no info
        kernel messages:
                one for each hung task
        remediation attempts:
                ls /mnt/dm-sdc                          # un-blocked the system
                note: sdc had to spin up because it was parked

2nd occurence:  5.9.0-1-armmp #1 Debian 5.9.1-1
        One recursive copy (cp -ar) reading from dm-sdc, writing to sdb3.
        uninterruptibly sleeping / D-state'd processes:
                cp -ar                                  # no info
                rest unknown                            # no info
        kernel messages:
                NONE
        remediation attempts:
                ls /home/user                           # !! entered D-state
                ls /                                            # no problem
                ls /opt                                 # no problem
                mplayer --help                          # no problem
                mount                                           # no problem
                /usr/games/caesar                               # no problem
                ls /tmp                                 # no problem
                ls /mnt/sdb3                            # !! entered D-state
                ls /mnt/sda2                            # !! entered D-state
                ls /mnt/dm-sdc                          # un-blocked the system
                note: sdc had to spin up because it was parked

3rd occurence:  5.9.0-1-armmp #1 Debian 5.9.1-1
        One rsync reading from sda2, writing to dm-sdc.
        uninterruptibly sleeping / D-state'd processes:
                rsync(s)                                        # one or both 
rsyncs?
                rest unknown                            # no info
        kernel messages:
                NONE
        remediation attempts:
                ls /proc                                        # no problem
                ls /sys                                 # no problem
                ls /dev                                 # no problem
                ls /run                                 # [tmpfs] no problem
                ls /mnt/sdb3                            # no problem
                ls /mnt/sda2                            # no problem
                ls /home/user                           # no problem
                ls /mnt/dm-sdc                          # un-blocked the system
                note: sdc had to spin up because it was parked

4th occurence:  5.9.0-1-armmp #1 Debian 5.9.1-1
        One rsync reading from sda2, writing to dm-sdc.
        uninterruptibly sleeping / D-state'd processes:
                kworker/3:13+dm-integrity-writer        # dm-sdc
                rsync                                           # only one, but 
which one?
                kworker/u8:0+dm-integrity-wait  # dm-sdc
                kworker/0:0+dm-integrity-commit # dm-sdc
                jbd2/dm-0-8                                     # dm-sdc
                /lib/systemd/systemd-journald           # no info
        kernel messages:
                NONE
        remediation attempts:
                ls /mnt/dm-sdc                          # no problem
                ls /mnt/dm-sdc/folder                   # un-blocked the system
                note: sdc had to spin up because it was parked

<kernel upgraded, system rebooted>

5th occurence:  5.9.0-4-armmp #1 Debian 5.9.11-1
        One rsync -c --dry-run reading from all three disks. No writing!
        uninterruptibly sleeping / D-state'd processes:
                rsync(s)                                        # no info
                kworker/3:1+dm-integrity-offload        # dm-sdc
                kworker/3:7+dm-integrity-offload        # dm-sdc
                rest unknown                            # no futher info
        kernel messages:
                one for each hung task
        remediation attempts:
                ls /mnt/dm-sdc                          # !! entered D-state
        uninterruptibly sleeping / D-state'd processes:
                rsync(s)                                        # no info
                kworker/3:1+dm-integrity-offload        # dm-sdc
                kworker/3:7+dm-integrity-offload        # dm-sdc
                kworker/0:0+dm-integrity-offload        # dm-sdc, !! NEW
                rest unknown                            # no info
        remediation attempts:
                ls /home/user                           # no problem
                sha256sum /dev/sdc                      # !! no help, killed by 
Ctrl+C
                dd if=/dev/sdc                          # !! entered D-state
                smartctl /dev/sdc                               # !! entered 
D-state
                umount /mnt/sdb3                                # failed, rsync 
has a file open
                hdparm -Y /dev/sda                      # un-blocked the system
                NOTE: sdc did NOT seem parked this time

6th occurence:  5.9.0-4-armmp #1 Debian 5.9.11-1
One rsync -c --dry-run thoroughly verifying previous copies - no writing!
        uninterruptibly sleeping / D-state'd processes:
                rsync - reading side                    # sdb3, reading 400M 
file
                rsync - comparing side                  # dm-sdc,reading 900M 
file
                kworker/2:16+dm-integrity-offload       # dm-sdc
                kworker/2:0+dm-integrity-offload        # dm-sdc
                jbd2/sda2-8                                     # sda2
                /bin/sh writing to file on sda2 # sda2
                /bin/sh creating a file on sda2 # sda2
        kernel messages:
                "failed to read SCR 1" for all PMP ports
                IO errors on sda
        remediation attempts:
                df -h                                           # no problem
                ls /mnt/dm-sdc                          # !! entered D-state
                ls /home/user                           # un-blocked some, but 
not all!?
        uninterruptibly sleeping / D-state'd processes:
                rsync - comparing side                  # dm-sdc, reading 900M 
file
                kworker/2:16+dm-integrity-offload       # dm-sdc
                kworker/2:0+dm-integrity-offload        # dm-sdc
                kworker/3:0+dm-integrity-offload        # dm-sdc
                ls /mnt/dm-sdc                          # dm-sdc
        remediation attempts:
                ls /mnt/sdb3/path1                      # no problem
                ls /mnt/sdb3/path2                      # no problem
                md5sum /mnt/sdb3/file1                  # no problem
                md5sum /mnt/sdb3/file2                  # no problem
                echo x >/proc/sysrq-trigger          # no problem
        kernel messages:
                one for each hung task
        remediation attempts:
                # none, system unblocked itself after a while!
        kernel messages:
                "failed to read SCR 1" for all PMP ports
                IO errors on sda

<system rebooted>

7th occurence:  5.9.0-4-armmp #1 Debian 5.9.11-1
        One rsync reading from sda2, writing to dm-sdc.
        uninterruptibly sleeping / D-state'd processes:
                jbd2/dm-0-8                                     # dm-sdc
                rsync - writing side                    # dm-sdc, writing >100G 
file
                kworker/2:0+dm-integrity-writer # dm-sdc
                kworker/0:0+dm-integrity-commit # dm-sdc
                kworker/u8:2+flush-254:0                # dm-0 -> dm-sdc
                kworker/u8:1+dm-integrity-wait  # dm-sdc
                kworker/3:1+dm_bufio_cache              # dm-sdc
                kworker/0:3+dm_bufio_cache              # dm-sdc
        kernel messages:
                one for each hung task
        remediation attempts:
                df -h                                           # no problem
                ls /mnt/dm-sdc                          # un-blocked the system
                note: sdc had to spin up because it was parked

8th occurence:  5.9.0-4-armmp #1 Debian 5.9.11-1
        Happened just minutes after "solving" #7
        uninterruptibly sleeping / D-state'd processes:
                jbd2/dm-0-8                                     # dm-sdc
                631/rsync - writing side                # dm-sdc
                kworker/1:3+dm-integrity-writer # dm-sdc
                kworker/3:2+dm-integrity-commit # dm-sdc
                kworker/u8:2+flush-254:0                # dm-0 -> dm-sdc
                kworker/u8:1+dm-integrity-wait  # dm-sdc
        kernel messages:
                NONE
        remediation attempts:
                df -h                                           # no problem
                ls /mnt/dm-sdc                          # no problem
                ls /mnt/dm-sdc/dir1                     # no problem
                ls /mnt/dm-sdc/dir1/d2/                 # un-blocked the system
                note: not sure if sdc was spinning up or not

9th occurence:  5.9.0-4-armmp #1 Debian 5.9.11-1
Happened within tens of minutes after solving #8 (rsyncing the same file!)
        uninterruptibly sleeping / D-state'd processes:
                629/rsync - reading side                # sda2
                kworker/1:0+dm-integrity-commit # dm-sdc
                kworker/1:1+dm-integrity-writer # dm-sdc
                jbd2/dm-0-8                                     # dm-sdc
                kworker/0:0+dm_bufio_cache              # dm-sdc
                /bin/sh - cronjob1 writing a file       # ?
                /bin/sh - cronjob2 writing a file       # ?
        kernel messages:
                NONE
        remediation attempts:
                df -h                                           # no problem
                ls /mnt/dm-sdc                          # no problem
                ls /mnt/dm-sdc/dir1                     # no problem
                ls /mnt/dm-sdc/dir1/d2/                 # no problem
                ls /mnt/dm-sdc/dir1/d2/d3/d4/           # un-blocked the system
                note: not sure if sdc was spinning up or not

10th, 11th, 12th occurence: 5.9.0-4-armmp #1 Debian 5.9.11-1
        Still rsyncing that the same >100G file
always remediated by 'find /mnt/sda & find /mnt/sdb & find /mnt/dm-sdc &'

13th occurence: 5.9.0-4-armmp #1 Debian 5.9.11-1
        Still rsyncing that one >100G file
        uninterruptibly sleeping / D-state'd processes:
                629/rsync - reading side                # sda2
                jbd2/dm-0-8                                     # dm-sdc
                kworker/2:15+dm-integrity-commit        # dm-sdc
                kworker/2:8+dm-integrity-writer # dm-sdc
                jbd2/sda2-8                                     # sda2
                kworker/u8:1+flush-8:0                  # ?
                /bin/sh - cronjob0 writing a file       # started from sda2
                /bin/sh - cronjob0 writing a file       # started from sda2
        The blocked shellscripts are odd. Their open FDs look like this:
                0 -> pipe:[7750528]
                1 -> /tmp/#713 (deleted)
                10 -> /mnt/sda2/dir/cronjob0.sh
                2 -> /tmp/#713 (deleted)
        kernel messages:
                one for each hung task
        remediation attempts:
                find /mnt/sda2                          # !! entered D-state
                find /mnt/sdb3                          # !! entered D-state
                find /mnt/dm-sdc                                # un-blocked 
the system
                note: sdc had to spin up because it was parked

13th occurence: 5.9.0-4-armmp #1 Debian 5.9.11-1
        Still rsyncing that one >100G file
        uninterruptibly sleeping / D-state'd processes:
                629/rsync - reading side                # sda2
                jbd2/dm-0-8                                     # dm-sdc
                kworker/2:0+dm-integrity-commit # dm-sdc
                kworker/3:3+dm-integrity-writer # dm-sdc
        kernel messages:
                one for each hung task
        remediation attempts:
                find /mnt/sda2                          # !! entered D-state
                find /mnt/sdb3                          # !! entered D-state
                find /mnt/dm-sdc                                # un-blocked 
the system
                note: not sure if sdc was spinning up or not

Cheers,
n-b-f

Reply via email to