On 2025-05-05 18:02:37, Salvatore Bonaccorso wrote:
> On Mon, May 05, 2025 at 04:00:31PM +0200, Salvatore Bonaccorso wrote:
>> Hi Moritz,
>> 
>> On Mon, May 05, 2025 at 01:47:15PM +0200, Moritz Mühlenhoff wrote:
>> > Am Wed, Apr 30, 2025 at 05:55:20PM +0200 schrieb Salvatore Bonaccorso:
>> > > Hi
>> > > 
>> > > We got a regression report in Debian after the update from 6.1.133 to
>> > > 6.1.135. Melvin is reporting that discard/trim through a RAID10 array
>> > > stalls indefinitely. The full report is inlined below and originates
>> > > from https://bugs.debian.org/1104460 .
>> > 
>> > JFTR, we ran into the same problem with a few Wikimedia servers running
>> > 6.1.135 and RAID 10: The servers started to lock up once fstrim.service
>> > got started. Full oops messages are available at
>> > https://phabricator.wikimedia.org/P75746
>> 
>> Thanks for these additional data points. Assuming you won't be able to
>> test the other stable series where the commit d05af90d6218
>> ("md/raid10: fix missing discard IO accounting") went in, might you at
>> least be able to test the 6.1.y branch with the commit reverted again
>> and manually trigger the issue?
>> 
>> If needed I can provide a test Debian package of 6.1.135 (or 6.1.137)
>> with the patch reverted. 
>
> So one additional data point, as several Debian users were reporting
> back being affected: One user did upgrade to 6.12.25 (where the
> commit was backported as well) and is not able to reproduce the issue
> there.

That would be me.

I can reproduce the issue as outlined by Moritz above fairly reliably in
6.1.135 (Debian package 6.1.0-34-amd64). The reproducer is simple, on a
RAID-10 host:

 1. reboot
 2. systemctl start fstrim.service
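
As a rough sketch of the "on a RAID-10 host" precondition for the steps above: the helper name `raid10_arrays` is hypothetical (it is not part of any tool), and it assumes the usual `/proc/mdstat` layout, where each array line looks like `md1 : active raid10 sdb1[1] sda1[0]`.

```shell
#!/bin/sh
# Hypothetical helper, not from the original report: print the names of
# md arrays whose RAID level is raid10.
# Reads mdstat-formatted text from the file given as $1, defaulting to
# /proc/mdstat (assumption: standard mdstat layout, "mdN : <state> <level> ...").
raid10_arrays() {
    awk '$2 == ":" {
        # Scan the fields after the colon for the bare word "raid10".
        # The "Personalities" line lists levels in brackets ("[raid10]"),
        # so it never matches.
        for (i = 3; i <= NF; i++)
            if ($i == "raid10") { print $1; break }
    }' "${1:-/proc/mdstat}"
}
```

If `raid10_arrays` prints at least one array after a fresh reboot, `systemctl start fstrim.service` is the step that triggers the hang here on 6.1.135.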

We're tracking the issue internally in:

https://gitlab.torproject.org/tpo/tpa/team/-/issues/42146

I've managed to work around the issue by upgrading to the Debian package
from testing/unstable (6.12.25), as Salvatore indicated above. There,
fstrim doesn't cause any crash and completes successfully. In stable, it
just hangs forever. The kernel doesn't completely panic and the
machine is otherwise still somewhat functional: my existing SSH
connection keeps working, for example, but new ones fail. And an `apt
install` of another kernel hangs forever.

> This indicates we might miss some pre-requisites in the 6.1.y series?
>
> The user is now trying 6.1.135 with the patch reverted as well.

I am embarrassed to say I couldn't figure out how to build a Debian
package of the Linux kernel at the moment. I would be happy to test a
built package, that said. I got stuck on various snags: the
`debian/bin/test-patches` script seems to require a flavor (worked around
with `-f amd64`) and in the end the build failed with:

[...]

  ld -r -m elf_x86_64 -z noexecstack --no-warn-rwx-segments --build-id=sha1  -T scripts/module.lds -o virt/lib/irqbypass.ko virt/lib/irqbypass.o virt/lib/irqbypass.mod.o;  true
debian/bin/buildcheck.py debian/build/build_amd64_none_amd64 amd64 none amd64
Can't read ABI reference.  ABI not checked!
make[2]: Leaving directory '/home/anarcat/dist/linux-6.1.135'
/usr/bin/make -f debian/rules.real build_kbuild ABINAME='6.1.0-0.a.test' ARCH='amd64' DESTDIR='/home/anarcat/dist/linux-6.1.135/debian/linux-kbuild-6.1' DH_OPTIONS='-plinux-kbuild-6.1' KERNEL_ARCH='x86' PACKAGE_NAME='linux-kbuild-6.1' SOURCEVERSION='6.1.135-1a~test' SOURCE_BASENAME='linux' SOURCE_SUFFIX='' UPSTREAMVERSION='6.1' VERSION='6.1'
make[2]: Entering directory '/home/anarcat/dist/linux-6.1.135'
mkdir -p debian/build/build-tools/headers-tools
/usr/bin/make ARCH=x86 O=debian/build/build-tools/headers-tools \
        INSTALL_HDR_PATH=/home/anarcat/dist/linux-6.1.135/debian/build/build-tools \
        headers_install
make[3]: Entering directory '/home/anarcat/dist/linux-6.1.135'
***
*** Configuration file ".config" not found!
***
*** Please run some configurator (e.g. "make oldconfig" or
*** "make menuconfig" or "make xconfig").
***
/home/anarcat/dist/linux-6.1.135/Makefile:792: include/config/auto.conf.cmd: No such file or directory
make[4]: *** [/home/anarcat/dist/linux-6.1.135/Makefile:801: .config] Error 1
make[3]: *** [Makefile:250: __sub-make] Error 2
make[3]: Leaving directory '/home/anarcat/dist/linux-6.1.135'
make[2]: *** [debian/rules.real:530: debian/stamps/build-tools-headers] Error 2
make[2]: Leaving directory '/home/anarcat/dist/linux-6.1.135'
make[1]: *** [debian/rules.gen:1471: build-arch_amd64_real_kbuild] Error 2
make[1]: Leaving directory '/home/anarcat/dist/linux-6.1.135'
make: *** [debian/rules:40: build-arch] Error 2
dpkg-buildpackage: error: debian/rules binary subprocess returned exit status 2

It's been a while since I compiled Linux, amazingly... It might be
because I'm trying to compile the Debian 12 kernel on Debian 13. Here
are the steps I took:

curl -o 4a05f7ae33716d996c5ce56478a36a3ede1d76f2.patch \
    'https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/patch/?id=4a05f7ae33716d996c5ce56478a36a3ede1d76f2'
# (reverse the patch)
sudo apt-get build-dep linux
apt source -t bookworm-security linux
cd linux-6.1.135
./debian/bin/test-patches -f amd64 \
    ../4a05f7ae33716d996c5ce56478a36a3ede1d76f2.patch
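
For the "reverse the patch" step, here is one possible sketch, not necessarily what was done here. It assumes GNU `patch` and `diff`, a source tree that already contains the fix, and a `-p1` style patch; the function name `make_revert_patch` and the `.reverted` scratch directory are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: given a tree that already contains a fix and the
# patch that introduced it, write a -p1 patch that undoes the fix.
# Usage: make_revert_patch TREE FIX_PATCH OUT_PATCH
make_revert_patch() {
    tree=${1%/} fix=$2 out=$3
    # Work on a copy so the original tree stays untouched.
    rm -rf "$tree.reverted" &&
    cp -a "$tree" "$tree.reverted" &&
    # Un-apply the fix in the copy: -R reverses, -p1 strips the a/ b/ prefix.
    # --no-backup-if-mismatch (GNU patch) avoids stray .orig files that
    # would otherwise end up in the generated diff.
    patch -R -p1 -s --no-backup-if-mismatch -d "$tree.reverted" < "$fix" ||
        return 1
    # Diff from the parent directory so both paths have exactly one leading
    # component, which keeps the result applicable with -p1.
    # diff exits 1 when the trees differ, which is the expected case.
    ( cd "$(dirname "$tree")" &&
      diff -Naur "$(basename "$tree")" "$(basename "$tree").reverted" ) > "$out" ||
        [ $? -eq 1 ]
}
```

The resulting file could then be handed to `debian/bin/test-patches` in place of the upstream patch. Copying a full kernel tree is slow, so this is only meant to show the idea.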

a.

-- 
Life is like riding a bicycle. To keep your balance you must keep moving.
                       - Albert Einstein
