Package: qemu-user-static
Version: 1:7.2+dfsg-7+deb12u7
Severity: important

Dear Maintainer,

After upgrading my system's kernel from Linux 6.1.99 to 6.1.112, I've
started seeing segfaults in aarch64 containers running software builds.

The following reproduces a segfault almost consistently on two of my
machines:
---
$ docker run -ti --rm --platform linux/arm64/v8 docker.io/arm64v8/alpine:3.20 sh
/ # apk add bash-completion-dev dbus-dev elogind-dev gobject-introspection-dev gtk-doc libgudev-dev libmbim-dev libqmi-dev linux-headers meson vala clang abuild alpine-sdk curl
/ # curl -O https://gitlab.freedesktop.org/mobile-broadband/ModemManager/-/archive/1.22.0/ModemManager-1.22.0.tar.gz
/ # tar xf ModemManager-1.22.0.tar.gz
/ # cd ModemManager-1.22.0/
/ # abuild-meson \
        -Db_lto=true \
        -Dsystemdsystemunitdir=no \
        -Ddbus_policy_dir=/usr/share/dbus-1/system.d \
        -Dgtk_doc=true \
        -Dsystemd_journal=false \
        -Dsystemd_suspend_resume=true \
        -Dvapi=true \
        -Dpolkit=no \
        . output
/ # meson compile -C output
---

The command that segfaults isn't always the same. Running a single
command in a loop sometimes reproduces it, but it takes a while (~5+
minutes); building the whole repo usually triggers it faster (possibly
just because of parallelism).
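Since a single command only fails intermittently, it helps to rerun it
unattended until it crashes. A minimal POSIX sh sketch of such a loop
follows; `run_until_failure` is a hypothetical helper written for this
report, not part of meson, ninja or qemu, and it stops at the first
non-zero exit status, which covers a segfault but also any ordinary
build error.

```shell
#!/bin/sh
# Hypothetical helper: rerun a command until it exits non-zero, or give up
# after MAX attempts. Any failure stops the loop (a qemu segfault shows up
# as a non-zero exit status, but so would an ordinary compile error).
run_until_failure() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "failed on iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max iterations"
    return 0
}
```

Inside the container it would be called with the crashing command line
copied verbatim from the build log, e.g.
`run_until_failure 1000 <failing cc command>`, and it reports how many
attempts it took before the first failure.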

The segfaults always occur at the same instruction address (ip) in
qemu-aarch64-static:
[719674.876060] cc1[2697164]: segfault at 4e853c0 ip 00000000006278d0 sp 00007ffdd65f4bf8 error 4 in qemu-aarch64-static[401000+46a000] likely on CPU 2 (core 2, socket 0)
[720879.232743] cc1[2726993]: segfault at 50193c0 ip 00000000006278d0 sp 00007ffc9649e558 error 4 in qemu-aarch64-static[401000+46a000] likely on CPU 5 (core 5, socket 0)
[722165.581114] cc[2747678]: segfault at 30343c0 ip 00000000006278d0 sp 00007ffca971f958 error 4 in qemu-aarch64-static[401000+46a000] likely on CPU 28 (core 12, socket 0)

The backtrace is also always similar:
(gdb) bt
#0  have_mmap_lock () at ../../linux-user/mmap.c:46
#1  0x000000000061c335 in page_set_flags (start=4194304, end=39538688, flags=2248) at ../../accel/tcg/translate-all.c:1383
#2  0x0000000000628307 in target_mmap (start=start@entry=4194304, len=<optimized out>, len@entry=35342816, target_prot=target_prot@entry=0, flags=<optimized out>,
    fd=fd@entry=-1, offset=offset@entry=0) at ../../linux-user/mmap.c:648
#3  0x0000000000623e1d in load_elf_image (image_name=0x7ffca9720bf1 "/usr/bin/cc", image_fd=5, info=info@entry=0x7ffca971fe80,
    pinterp_name=pinterp_name@entry=0x7ffca971fbd0, bprm_buf=bprm_buf@entry=0x7ffca97200b0 "\177ELF\002\001\001") at ../../linux-user/elfload.c:3099
#4  0x00000000006245d9 in load_elf_binary (bprm=bprm@entry=0x7ffca97200b0, info=info@entry=0x7ffca971fe80) at ../../linux-user/elfload.c:3513
#5  0x0000000000626bab in loader_exec (fdexec=fdexec@entry=5, filename=<optimized out>, argv=argv@entry=0x30a8630, envp=envp@entry=0x304ebc0,
    regs=regs@entry=0x7ffca971ffa0, infop=infop@entry=0x7ffca971fe80, bprm=<optimized out>) at ../../linux-user/linuxload.c:155
#6  0x000000000040216f in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../../linux-user/main.c:880
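The kernel log lines can also be decoded mechanically. As a sketch (the
helper names `ip_offset` and `decode_pf_error` are hypothetical), the
following checks that the faulting ip lies inside the reported
qemu-aarch64-static mapping, printed as `name[start+size]`, and decodes
the "error N" field, assuming the standard x86 page-fault error-code
bits since this is an amd64 host.

```shell
#!/bin/sh
# Offset of a faulting ip within a mapping printed as name[START+SIZE].
# Arguments need a 0x prefix (dmesg omits it). Returns 1 if the ip lies
# outside the mapping.
ip_offset() {
    ip=$(($1)); base=$(($2)); size=$(($3))
    if [ "$ip" -lt "$base" ] || [ "$ip" -ge $((base + size)) ]; then
        return 1
    fi
    printf '0x%x\n' $((ip - base))
}

# Decode the x86 page-fault error code printed as "error N":
#   bit 0: protection violation (otherwise the page was not present)
#   bit 1: write access (otherwise a read)
#   bit 2: fault raised in user mode (otherwise kernel mode)
#   bit 4: instruction fetch (reported here instead of read/write)
decode_pf_error() {
    e=$(($1))
    if [ $((e & 1)) -ne 0 ]; then p="protection-violation"; else p="not-present"; fi
    if [ $((e & 16)) -ne 0 ]; then a="instruction-fetch"
    elif [ $((e & 2)) -ne 0 ]; then a="write"; else a="read"; fi
    if [ $((e & 4)) -ne 0 ]; then m="user"; else m="kernel"; fi
    echo "$p $a in $m mode"
}
```

Here `ip_offset 0x6278d0 0x401000 0x46a000` prints `0x2268d0`, and
`decode_pf_error 4` prints `not-present read in user mode`, i.e. qemu
itself faulted on a user-mode read of a non-present page at the
addresses listed (4e853c0, 50193c0, 30343c0).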


What made the problem go away:
- Shoving in the latest qemu-user (static) from sid on the machine
- After inspecting the kernel patches between 6.1.99 and 6.1.112, the
  only patch that stood out was a KASLR change[1], so I tried disabling
  address space randomization:
  `sysctl kernel.randomize_va_space=0`
  This also worked around the problem. Note that I didn't confirm that
  reverting the patch also fixes the issue; this is just a guess at this
  point.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d3dc52ff36a333c11b831809fcade780fd292c1

- Rather unfortunately, checking out qemu from git, applying the
  patches from salsa's debian-bookworm branch and building manually
  as follows also made the problem go away:
```
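# Configure a static, linux-user-only build for aarch64 guests
mkdir build && cd build
../configure --without-default-features --enable-linux-user \
    --target-list=aarch64-linux-user --static
ninja qemu-aarch64
# Swap in the freshly built binary, keeping the packaged one aside
mv /usr/bin/qemu-aarch64-static /usr/bin/qemu-aarch64-static.delete
cp qemu-aarch64 /usr/bin/qemu-aarch64-static
# Re-register the binfmt handlers so the new binary is used
systemctl restart binfmt-support.service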
```
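The `kernel.randomize_va_space=0` workaround can also be limited to a
single test run. The following is a minimal sketch, assuming it runs as
root on the host (containers share the host kernel, and /proc/sys is
normally read-only inside them); `with_aslr_disabled` and the `ASLR_CTL`
override are hypothetical names introduced only for illustration and
testing. Writing to /proc/sys/kernel/randomize_va_space is equivalent to
the sysctl command used above.

```shell
#!/bin/sh
# Sketch: run a command with userspace address space randomization turned
# off, then put the previous kernel.randomize_va_space value back.
# Writing this file is equivalent to `sysctl kernel.randomize_va_space=N`;
# it needs root on the host and affects every process started meanwhile.
# ASLR_CTL is a hypothetical override so the logic can be tested on a file.
ASLR_CTL=${ASLR_CTL:-/proc/sys/kernel/randomize_va_space}

with_aslr_disabled() {
    old=$(cat "$ASLR_CTL") || return 1
    echo 0 > "$ASLR_CTL" || return 1
    "$@"
    status=$?
    # Restore the saved setting even if the command failed.
    echo "$old" > "$ASLR_CTL"
    return "$status"
}
```

For example,
`with_aslr_disabled docker run -ti --rm --platform linux/arm64/v8 docker.io/arm64v8/alpine:3.20 sh`
would start the reproducer with randomization off and restore the
previous setting once the container exits. The setting is system-wide
while it is in effect.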

At this point I'm not quite sure how to debug this further. Reverting
the kernel commit is probably the way forward once it's confirmed to be
the culprit, but I've run out of time for now, so I'm reporting what I
have. That rebuilding qemu fixes the problem is a bit odd too:
downgrading qemu-user-static didn't help, so it's not just a one-off.
It might be something in dpkg's build flags, but I don't know.

Anyway, thank you!

-- System Information:
Debian Release: 12.8
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable-debug'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: arm64, armhf, i386

Kernel: Linux 6.1.0-26-amd64 (SMP w/32 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

qemu-user-static depends on no packages.

Versions of packages qemu-user-static recommends:
ii  binfmt-support  2.2.2-2
ii  systemd         252.31-1~deb12u1

qemu-user-static suggests no packages.

-- no debconf information
