Package: qemu-user-static
Version: 1:7.2+dfsg-7+deb12u7
Severity: important

Dear Maintainer,

After upgrading my system from linux 6.1.99 to linux 6.1.112, I started
seeing segfaults in aarch64 containers running software builds.

The following reproduces a segfault almost every time on two of my machines:
---
$ docker run -ti --rm --platform linux/arm64/v8 docker.io/arm64v8/alpine:3.20 sh
/ # apk add bash-completion-dev dbus-dev elogind-dev gobject-introspection-dev gtk-doc libgudev-dev libmbim-dev libqmi-dev linux-headers meson vala clang abuild alpine-sdk curl
/ # curl -O https://gitlab.freedesktop.org/mobile-broadband/ModemManager/-/archive/1.22.0/ModemManager-1.22.0.tar.gz
/ # tar xf ModemManager-1.22.0.tar.gz
/ # cd ModemManager-1.22.0/
/ # abuild-meson \
        -Db_lto=true \
        -Dsystemdsystemunitdir=no \
        -Ddbus_policy_dir=/usr/share/dbus-1/system.d \
        -Dgtk_doc=true \
        -Dsystemd_journal=false \
        -Dsystemd_suspend_resume=true \
        -Dvapi=true \
        -Dpolkit=no \
        . output
/ # meson compile -C output
---

The command that segfaults isn't always the same. Re-running a command
that segfaulted in a loop sometimes reproduces the crash, but it takes a
while (5+ minutes); building the whole repo usually triggers it faster
(possibly just because of parallelism).

The segfaults are always at the same instruction address in
qemu-aarch64-static:
[719674.876060] cc1[2697164]: segfault at 4e853c0 ip 00000000006278d0 sp 00007ffdd65f4bf8 error 4 in qemu-aarch64-static[401000+46a000] likely on CPU 2 (core 2, socket 0)
[720879.232743] cc1[2726993]: segfault at 50193c0 ip 00000000006278d0 sp 00007ffc9649e558 error 4 in qemu-aarch64-static[401000+46a000] likely on CPU 5 (core 5, socket 0)
[722165.581114] cc[2747678]: segfault at 30343c0 ip 00000000006278d0 sp 00007ffca971f958 error 4 in qemu-aarch64-static[401000+46a000] likely on CPU 28 (core 12, socket 0)

The backtrace is also always similar:
(gdb) bt
#0  have_mmap_lock () at ../../linux-user/mmap.c:46
#1  0x000000000061c335 in page_set_flags (start=4194304, end=39538688, flags=2248) at ../../accel/tcg/translate-all.c:1383
#2  0x0000000000628307 in target_mmap (start=start@entry=4194304, len=<optimized out>, len@entry=35342816, target_prot=target_prot@entry=0, flags=<optimized out>, fd=fd@entry=-1, offset=offset@entry=0) at ../../linux-user/mmap.c:648
#3  0x0000000000623e1d in load_elf_image (image_name=0x7ffca9720bf1 "/usr/bin/cc", image_fd=5, info=info@entry=0x7ffca971fe80, pinterp_name=pinterp_name@entry=0x7ffca971fbd0, bprm_buf=bprm_buf@entry=0x7ffca97200b0 "\177ELF\002\001\001") at ../../linux-user/elfload.c:3099
#4  0x00000000006245d9 in load_elf_binary (bprm=bprm@entry=0x7ffca97200b0, info=info@entry=0x7ffca971fe80) at ../../linux-user/elfload.c:3513
#5  0x0000000000626bab in loader_exec (fdexec=fdexec@entry=5, filename=<optimized out>, argv=argv@entry=0x30a8630, envp=envp@entry=0x304ebc0, regs=regs@entry=0x7ffca971ffa0, infop=infop@entry=0x7ffca971fe80, bprm=<optimized out>) at ../../linux-user/linuxload.c:155
#6  0x000000000040216f in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../../linux-user/main.c:880

What made the problem go away:
- Installing the latest qemu-user (static) from sid on the machine.
- After inspecting the kernel patches between 6.1.99 and 6.1.112, the only
  patch that stood out was a kaslr change[1], so I tried to disable it:
  `sysctl kernel.randomize_va_space=0`
  This also worked around the problem. Note that I didn't confirm whether
  reverting the patch also fixes the issue; this is just a guess at this
  point.
  [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d3dc52ff36a333c11b831809fcade780fd292c1
- Rather unfortunately, checking out qemu from git, applying the patches
  from salsa's debian-bookworm branch and building manually as follows
  also made the problem go away:
```
mkdir build && cd build
../configure --without-default-features --enable-linux-user --target-list=aarch64-linux-user --static
ninja qemu-aarch64
mv /usr/bin/qemu-aarch64-static /usr/bin/qemu-aarch64-static.delete
cp qemu-aarch64 /usr/bin/qemu-aarch64-static
systemctl restart binfmt-support.service
```

At this point I'm not quite sure how to debug this further. Reverting the
kernel commit is probably the way forward, after confirming that this
commit is the culprit, but I've run out of time for now, so I'm reporting
what I have.

That rebuilding qemu fixes the problem is a bit odd too. Downgrading
qemu-user-static didn't help, so it's not just a one-off; either it is
something in dpkg's build flags or I don't know.

Anyway, thank you!

-- System Information:
Debian Release: 12.8
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable-debug'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: arm64, armhf, i386

Kernel: Linux 6.1.0-26-amd64 (SMP w/32 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

qemu-user-static depends on no packages.

Versions of packages qemu-user-static recommends:
ii  binfmt-support  2.2.2-2
ii  systemd         252.31-1~deb12u1

qemu-user-static suggests no packages.

-- no debconf information
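
Addendum: to check whether further crashes hit the same spot as the kernel log lines quoted in this report, something like the following may help. It is only a rough sketch: `segv_summary` is a helper name made up here, and it assumes the log lines have the same shape as the ones quoted (timestamp in brackets, then `comm[pid]: segfault at ... ip ... in file[base+size]`). It prints the process name, the faulting ip, and the ip's offset from the start of the mapping.

```shell
#!/bin/sh
# Hypothetical helper (not part of qemu or Debian): summarize kernel
# "segfault at" lines read from stdin, e.g. the output of `dmesg`.
# Assumes lines shaped like the ones quoted in this report.
segv_summary() {
    # Extract: process name, ip, mapped file name, mapping base (all hex
    # values without a 0x prefix, as the kernel prints them).
    sed -n 's/.*\] \([^[]*\)\[[0-9]*\]: segfault at [0-9a-f]* ip \([0-9a-f]*\) .* in \([^[]*\)\[\([0-9a-f]*\)+[0-9a-f]*\].*/\1 \2 \3 \4/p' |
    while read -r comm ip file base; do
        # Offset of the faulting instruction from the start of the mapping.
        printf '%s ip=0x%x %s+0x%x\n' \
            "$comm" "$((0x$ip))" "$file" "$((0x$ip - 0x$base))"
    done
}
```

Usage would be along the lines of `dmesg | segv_summary | sort | uniq -c`; if every crash is the same fault, the ip and offset columns should be identical across lines regardless of which compiler process was running.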