Package: linux-image-armmp
Version: 5.9.15-1
Dear Maintainer,
Since upgrading to the 5.9.0-x-armmp armhf kernels I am seeing
processes randomly getting stuck in D-state.
I observed this issue with 5.9.0-1-armmp and 5.9.0-4-armmp.
I did not observe the issue with 5.8.0-3-armmp or any previous kernel.
Caveat: the system is running a Debian kernel, but userspace is Ubuntu
focal.
I thought you might want to know anyway, since it looks like a kernel
problem.
--
Hardware:
Three HDDs connected via a JMB575 SATA III port multiplier to the i.MX6q's
SATA II port.
Probe log: ata1.15: Port Multiplier 1.2, 0x197b:0x5755 r0, 5 ports, feat 0x5/0xf
Steps to reproduce: Start a larger file-copy operation from one disk to
another.
I can reproduce this problem copying multi-GB files with either cp or
rsync.
The copy gets blocked in D-state within a few hours and one CPU core is
filled with I/O wait.
I've seen rsync getting blocked both reading the source and writing the
destination.
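For concreteness, the failing runs boil down to something like this
(paths are examples from my setup; big_copy is just a wrapper name for
this report, not a real tool):

```shell
# big_copy: copy a tree from one mount to another, the way the failing runs did.
# Any multi-GB payload triggers it within hours on my box.
big_copy() {
    src=$1
    dst=$2
    cp -ar "$src" "$dst"
}

# On the affected machine, e.g.:
# big_copy /mnt/dm-sdc/somedir /mnt/sdb3/somedir
```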
Details:
As I said, the JMB575 PMP is connected to the i.MX6q on-chip SATA
controller.
All its ports come up as "SATA link up 3.0 Gbps (SStatus 123
SControl 320)".
There are three HDDs connected to it - I am going to call them sda, sdb,
sdc.
Disks "sda" and "sdb" are partitioned and used for regular ext4
filesystems.
Disk "sdc" is backing a dm-integrity device. I am going to call it
"dm-sdc".
DM device "dm-sdc" is used for an ext4 filesystem.
Generally the blocked processes arrive via io_schedule or
io_schedule_timeout:
[<c0c9c5f0>] (io_schedule_timeout) from [<c0c9e20c>] (wait_for_completion_io+0xb4/0x134)
[<c0c9d05c>] (schedule) from [<c0c9d1d8>] (io_schedule+0x50/0x70)
[<c0c9d188>] (io_schedule) from [<bf640a34>] (dm_integrity_map_continue+0xb28/0xbc0 [dm_integrity])
[<c0c9d188>] (io_schedule) from [<c04eb868>] (wait_on_page_bit_common+0x18c/0x420)
[<c0c9d188>] (io_schedule) from [<c0704924>] (rq_qos_wait+0x120/0x18c)
[<c0c9d188>] (io_schedule) from [<c0c9da68>] (bit_wait_io+0x1c/0x68)
[<c0ca11c8>] (schedule_timeout) from [<c0c9c648>] (io_schedule_timeout+0x58/0x78)
I have many full stacktraces, but I am not sure if they are safe to
share.
I'd really like to avoid disclosing user data or crypto keys.
I managed to find one that should be safe:
task:cronjob1l.sh state:D stack: 0 pid:28125 ppid: 28124 flags:0x00000000
Backtrace:
[<c0c9c668>] (__schedule) from [<c0c9d0d4>] (schedule+0x78/0xf4)
 r10:000000c3 r9:00000002 r8:00000002 r7:c0c9da4c r6:e5a27ae8 r5:e5a26000
 r4:e598be00
[<c0c9d05c>] (schedule) from [<c0c9d1d8>] (io_schedule+0x50/0x70)
 r5:c1304a84 r4:00000000
[<c0c9d188>] (io_schedule) from [<c0c9da68>] (bit_wait_io+0x1c/0x68)
 r5:c1304a84 r4:00000002
[<c0c9da4c>] (bit_wait_io) from [<c0c9d5f8>] (__wait_on_bit+0x70/0xc8)
 r5:c1304a84 r4:e5a27adc
[<c0c9d588>] (__wait_on_bit) from [<c0c9d6f4>] (out_of_line_wait_on_bit+0xa4/0xcc)
 r9:e5a27c30 r8:bf1562a8 r7:00000001 r6:00000000 r5:c5fe43a8 r4:e5a27af4
[<c0c9d650>] (out_of_line_wait_on_bit) from [<c05c6f30>] (__wait_on_buffer+0x40/0x44)
 r4:c00b3b80
[<c05c6ef0>] (__wait_on_buffer) from [<bf109de4>] (ext4_bread+0x124/0x130 [ext4])
 r5:c5fe43a8 r4:c00b3b80
[<bf109cc0>] (ext4_bread [ext4]) from [<bf123f84>] (__ext4_read_dirblock+0x3c/0x45c [ext4])
 r4:e5a27cc0
[<bf123f48>] (__ext4_read_dirblock [ext4]) from [<bf1243f8>] (dx_probe+0x54/0x6d8 [ext4])
 r10:000000c3 r9:e5a27c30 r8:00000000 r7:e5a27cc0 r6:d4a5024c r5:c5fe43a8
 r4:e5a27cc0
[<bf1243a4>] (dx_probe [ext4]) from [<bf1262e8>] (__ext4_find_entry+0x418/0x620 [ext4])
 r10:000000c3 r9:00000000 r8:00000009 r7:c5fe43a8 r6:d4a5024c r5:e3092800
 r4:e5a27cc0
[<bf125ed0>] (__ext4_find_entry [ext4]) from [<bf126c24>] (ext4_lookup+0xe4/0x2ec [ext4])
 r10:000000c3 r9:00000000 r8:00000009 r7:c5fe43a8 r6:c5fe43a8 r5:d4a50220
 r4:e5a27ca0
[<bf126b40>] (ext4_lookup [ext4]) from [<c058d328>] (__lookup_slow+0x98/0x164)
 r9:00000000 r8:00000009 r7:c5fe43a8 r6:e5a27db8 r5:e40df990 r4:d4a50220
[<c058d290>] (__lookup_slow) from [<c05900b8>] (walk_component+0x158/0x1cc)
 r8:00000009 r7:c5fe4430 r6:00000001 r5:e5a27db0 r4:e40df990
[<c058ff60>] (walk_component) from [<c05905a8>] (path_lookupat+0x80/0x1c0)
 r9:00000000 r8:e5a27db0 r7:e5a27ea4 r6:00000001 r5:e5a27db0 r4:00000000
[<c0590528>] (path_lookupat) from [<c0593454>] (filename_lookup+0xb0/0x1ac)
 r7:e5a27ea4 r6:00000001 r5:e5509000 r4:00000001
[<c05933a4>] (filename_lookup) from [<c059369c>] (user_path_at_empty+0x7c/0x98)
 r8:00000800 r7:014c04c4 r6:e5a27ea4 r5:ffffff9c r4:00000001
[<c0593620>] (user_path_at_empty) from [<c0585718>] (vfs_statx+0x7c/0x13c)
 r7:014c04c4 r6:ffffff9c r5:00000001 r4:e5a27ee8
[<c058569c>] (vfs_statx) from [<c0585cf4>] (__do_sys_stat64+0x40/0x80)
 r10:000000c3 r9:e5a26000 r8:c03002c4 r7:000000c3 r6:014c051c r5:004c2384
 r4:be94d758
[<c0585cb4>] (__do_sys_stat64) from [<c058610c>] (sys_stat64+0x18/0x1c)
 r4:00000004
[<c05860f4>] (sys_stat64) from [<c03000c0>] (ret_fast_syscall+0x0/0x54)
Exception stack(0xe5a27fa8 to 0xe5a27ff0)
7fa0: 00000004 004c2384 014c04c4 be94d758 be94d758 00000003
7fc0: 00000004 004c2384 014c051c 000000c3 014c0518 00000003 004b0204 004c24ec
7fe0: 000000c3 be94d6e4 b6ee69db b6e6dbe6
What information would be helpful to collect next time this happens?
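Next time I will try to grab a snapshot like this before touching the
mounts (just a sketch of my plan; the SysRq write needs root and
kernel.sysrq enabled, and /proc/<pid>/stack needs root too):

```shell
# List uninterruptibly-sleeping (D-state) tasks with their kernel wait channel.
ps -eo pid,stat,wchan:32,comm --no-headers | awk '$2 ~ /^D/'

# Per-task kernel stacks without the register noise (root required):
# cat /proc/<pid>/stack

# Dump stacks of all blocked tasks to dmesg (SysRq-W; skipped when not root):
[ -w /proc/sysrq-trigger ] && echo w > /proc/sysrq-trigger || true
```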
My attempt to document these misbehaviors follows:
1st occurrence: 5.9.0-1-armmp #1 Debian 5.9.1-1
One recursive copy (cp -ar) reading from dm-sdc, writing to sdb3.
uninterruptibly sleeping / D-state'd processes:
cp -ar # no info
rest unknown # no info
kernel messages:
one for each hung task
remediation attempts:
ls /mnt/dm-sdc # un-blocked the system
note: sdc had to spin up because it was parked
2nd occurrence: 5.9.0-1-armmp #1 Debian 5.9.1-1
One recursive copy (cp -ar) reading from dm-sdc, writing to sdb3.
uninterruptibly sleeping / D-state'd processes:
cp -ar # no info
rest unknown # no info
kernel messages:
NONE
remediation attempts:
ls /home/user # !! entered D-state
ls / # no problem
ls /opt # no problem
mplayer --help # no problem
mount # no problem
/usr/games/caesar # no problem
ls /tmp # no problem
ls /mnt/sdb3 # !! entered D-state
ls /mnt/sda2 # !! entered D-state
ls /mnt/dm-sdc # un-blocked the system
note: sdc had to spin up because it was parked
3rd occurrence: 5.9.0-1-armmp #1 Debian 5.9.1-1
One rsync reading from sda2, writing to dm-sdc.
uninterruptibly sleeping / D-state'd processes:
rsync(s) # one or both rsyncs?
rest unknown # no info
kernel messages:
NONE
remediation attempts:
ls /proc # no problem
ls /sys # no problem
ls /dev # no problem
ls /run # [tmpfs] no problem
ls /mnt/sdb3 # no problem
ls /mnt/sda2 # no problem
ls /home/user # no problem
ls /mnt/dm-sdc # un-blocked the system
note: sdc had to spin up because it was parked
4th occurrence: 5.9.0-1-armmp #1 Debian 5.9.1-1
One rsync reading from sda2, writing to dm-sdc.
uninterruptibly sleeping / D-state'd processes:
kworker/3:13+dm-integrity-writer # dm-sdc
rsync # only one, but which one?
kworker/u8:0+dm-integrity-wait # dm-sdc
kworker/0:0+dm-integrity-commit # dm-sdc
jbd2/dm-0-8 # dm-sdc
/lib/systemd/systemd-journald # no info
kernel messages:
NONE
remediation attempts:
ls /mnt/dm-sdc # no problem
ls /mnt/dm-sdc/folder # un-blocked the system
note: sdc had to spin up because it was parked
<kernel upgraded, system rebooted>
5th occurrence: 5.9.0-4-armmp #1 Debian 5.9.11-1
One rsync -c --dry-run reading from all three disks. No writing!
uninterruptibly sleeping / D-state'd processes:
rsync(s) # no info
kworker/3:1+dm-integrity-offload # dm-sdc
kworker/3:7+dm-integrity-offload # dm-sdc
rest unknown # no further info
kernel messages:
one for each hung task
remediation attempts:
ls /mnt/dm-sdc # !! entered D-state
uninterruptibly sleeping / D-state'd processes:
rsync(s) # no info
kworker/3:1+dm-integrity-offload # dm-sdc
kworker/3:7+dm-integrity-offload # dm-sdc
kworker/0:0+dm-integrity-offload # dm-sdc, !! NEW
rest unknown # no info
remediation attempts:
ls /home/user # no problem
sha256sum /dev/sdc # !! no help, killed by Ctrl+C
dd if=/dev/sdc # !! entered D-state
smartctl /dev/sdc # !! entered D-state
umount /mnt/sdb3 # failed, rsync has a file open
hdparm -Y /dev/sda # un-blocked the system
NOTE: sdc did NOT seem parked this time
6th occurrence: 5.9.0-4-armmp #1 Debian 5.9.11-1
One rsync -c --dry-run thoroughly verifying previous copies - no writing!
uninterruptibly sleeping / D-state'd processes:
rsync - reading side # sdb3, reading 400M file
rsync - comparing side # dm-sdc, reading 900M file
kworker/2:16+dm-integrity-offload # dm-sdc
kworker/2:0+dm-integrity-offload # dm-sdc
jbd2/sda2-8 # sda2
/bin/sh writing to file on sda2 # sda2
/bin/sh creating a file on sda2 # sda2
kernel messages:
"failed to read SCR 1" for all PMP ports
IO errors on sda
remediation attempts:
df -h # no problem
ls /mnt/dm-sdc # !! entered D-state
ls /home/user # un-blocked some, but not all!?
uninterruptibly sleeping / D-state'd processes:
rsync - comparing side # dm-sdc, reading 900M file
kworker/2:16+dm-integrity-offload # dm-sdc
kworker/2:0+dm-integrity-offload # dm-sdc
kworker/3:0+dm-integrity-offload # dm-sdc
ls /mnt/dm-sdc # dm-sdc
remediation attempts:
ls /mnt/sdb3/path1 # no problem
ls /mnt/sdb3/path2 # no problem
md5sum /mnt/sdb3/file1 # no problem
md5sum /mnt/sdb3/file2 # no problem
echo x >/proc/sysrq-trigger # no problem
kernel messages:
one for each hung task
remediation attempts:
# none, system unblocked itself after a while!
kernel messages:
"failed to read SCR 1" for all PMP ports
IO errors on sda
<system rebooted>
7th occurrence: 5.9.0-4-armmp #1 Debian 5.9.11-1
One rsync reading from sda2, writing to dm-sdc.
uninterruptibly sleeping / D-state'd processes:
jbd2/dm-0-8 # dm-sdc
rsync - writing side # dm-sdc, writing >100G file
kworker/2:0+dm-integrity-writer # dm-sdc
kworker/0:0+dm-integrity-commit # dm-sdc
kworker/u8:2+flush-254:0 # dm-0 -> dm-sdc
kworker/u8:1+dm-integrity-wait # dm-sdc
kworker/3:1+dm_bufio_cache # dm-sdc
kworker/0:3+dm_bufio_cache # dm-sdc
kernel messages:
one for each hung task
remediation attempts:
df -h # no problem
ls /mnt/dm-sdc # un-blocked the system
note: sdc had to spin up because it was parked
8th occurrence: 5.9.0-4-armmp #1 Debian 5.9.11-1
Happened just minutes after "solving" #7
uninterruptibly sleeping / D-state'd processes:
jbd2/dm-0-8 # dm-sdc
631/rsync - writing side # dm-sdc
kworker/1:3+dm-integrity-writer # dm-sdc
kworker/3:2+dm-integrity-commit # dm-sdc
kworker/u8:2+flush-254:0 # dm-0 -> dm-sdc
kworker/u8:1+dm-integrity-wait # dm-sdc
kernel messages:
NONE
remediation attempts:
df -h # no problem
ls /mnt/dm-sdc # no problem
ls /mnt/dm-sdc/dir1 # no problem
ls /mnt/dm-sdc/dir1/d2/ # un-blocked the system
note: not sure if sdc was spinning up or not
9th occurrence: 5.9.0-4-armmp #1 Debian 5.9.11-1
Happened within tens of minutes after solving #8 (rsyncing the same file!)
uninterruptibly sleeping / D-state'd processes:
629/rsync - reading side # sda2
kworker/1:0+dm-integrity-commit # dm-sdc
kworker/1:1+dm-integrity-writer # dm-sdc
jbd2/dm-0-8 # dm-sdc
kworker/0:0+dm_bufio_cache # dm-sdc
/bin/sh - cronjob1 writing a file # ?
/bin/sh - cronjob2 writing a file # ?
kernel messages:
NONE
remediation attempts:
df -h # no problem
ls /mnt/dm-sdc # no problem
ls /mnt/dm-sdc/dir1 # no problem
ls /mnt/dm-sdc/dir1/d2/ # no problem
ls /mnt/dm-sdc/dir1/d2/d3/d4/ # un-blocked the system
note: not sure if sdc was spinning up or not
10th, 11th, 12th occurrence: 5.9.0-4-armmp #1 Debian 5.9.11-1
Still rsyncing that same >100G file
always remediated by 'find /mnt/sda & find /mnt/sdb & find /mnt/dm-sdc &'
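For reference, that workaround can be wrapped up like this (kick_mounts
is just a name I made up for this report; the paths are this box's
mountpoints):

```shell
# kick_mounts: issue a shallow background find on each given mountpoint,
# then wait. A readdir on the "right" filesystem un-wedges the others,
# so kicking all of them at once is the lazy remediation.
kick_mounts() {
    for m in "$@"; do
        find "$m" -maxdepth 2 >/dev/null 2>&1 &
    done
    wait
}

# On the affected box:
# kick_mounts /mnt/sda2 /mnt/sdb3 /mnt/dm-sdc
```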
13th occurrence: 5.9.0-4-armmp #1 Debian 5.9.11-1
Still rsyncing that one >100G file
uninterruptibly sleeping / D-state'd processes:
629/rsync - reading side # sda2
jbd2/dm-0-8 # dm-sdc
kworker/2:15+dm-integrity-commit # dm-sdc
kworker/2:8+dm-integrity-writer # dm-sdc
jbd2/sda2-8 # sda2
kworker/u8:1+flush-8:0 # ?
/bin/sh - cronjob0 writing a file # started from sda2
/bin/sh - cronjob0 writing a file # started from sda2
The blocked shellscripts are odd. Their open FDs look like this:
0 -> pipe:[7750528]
1 -> /tmp/#713 (deleted)
10 -> /mnt/sda2/dir/cronjob0.sh
2 -> /tmp/#713 (deleted)
kernel messages:
one for each hung task
remediation attempts:
find /mnt/sda2 # !! entered D-state
find /mnt/sdb3 # !! entered D-state
find /mnt/dm-sdc # un-blocked the system
note: sdc had to spin up because it was parked
14th occurrence: 5.9.0-4-armmp #1 Debian 5.9.11-1
Still rsyncing that one >100G file
uninterruptibly sleeping / D-state'd processes:
629/rsync - reading side # sda2
jbd2/dm-0-8 # dm-sdc
kworker/2:0+dm-integrity-commit # dm-sdc
kworker/3:3+dm-integrity-writer # dm-sdc
kernel messages:
one for each hung task
remediation attempts:
find /mnt/sda2 # !! entered D-state
find /mnt/sdb3 # !! entered D-state
find /mnt/dm-sdc # un-blocked the system
note: not sure if sdc was spinning up or not
Cheers,
n-b-f