Michael Tokarev wrote on Tue, Nov 19, 2024 at 01:32:33PM +0300:
> Thank you for such an accurate, detailed, smart and thoughtful bug report!
> It is a real pleasure to read such bug reports.

Glad it was appreciated ! :)

> And this is mmap.
> 
> The underlying problem here is that qemu-user doesn't have MMU emulation,
> so it can't arbitrarily remap addresses between host and guest.  It has to
> choose a virtual address space for the guest at startup, roughly speaking,
> a single region of it.  And if this address space happens to contain this
> process's data structures on the host, things go badly.
> 
> qemu has had numerous tweaks in this area, every new release brings a new
> portion of changes, but the very root cause of all this mess needs a
> significant redesign of the whole thing, which isn't going to happen any
> time soon.

Thanks for the explanation, this all makes sense given that disabling
ASLR (kernel.randomize_va_space=0) works around the issue.


>> - after inspecting kernel patches between 6.1.99 and 6.1.112 the only
>>    patch that stood out was a kaslr change[1], so I tried to disable it:
>>    `sysctl kernel.randomize_va_space=0`
>>    This also worked around the problem. Note I didn't confirm reverting
>>    the patch also fixes the issue, this is just a guess at this point.
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d3dc52ff36a333c11b831809fcade780fd292c1

Curiosity got me along with the cat, so I went and bisected it.
That apparently was a bad guess; the culprit is this one:

https://git.kernel.org/linus/b0cde867b80a5e81fcbc0383e138f5845f2005ee

  x86: Increase brk randomness entropy for 64-bit systems

  In commit c1d171a00294 ("x86: randomize brk"), arch_randomize_brk() was
  defined to use a 32MB range (13 bits of entropy), but was never increased
  when moving to 64-bit. The default arch_randomize_brk() uses 32MB for
  32-bit tasks, and 1GB (18 bits of entropy) for 64-bit tasks.

  Update x86_64 to match the entropy used by arm64 and other 64-bit
  architectures.
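
The bit counts in that commit message follow from dividing the
randomization range by the page size. A quick sketch of the arithmetic,
assuming 4 KiB pages (the helper name is mine, not from the kernel):

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86


def brk_entropy_bits(range_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Bits of entropy when picking a page-aligned offset within range_bytes."""
    slots = range_bytes // page_size  # number of possible page-aligned offsets
    return slots.bit_length() - 1     # log2 for a power-of-two slot count


print(brk_entropy_bits(32 * 1024 * 1024))    # 32MB range -> 13
print(brk_entropy_bits(1024 * 1024 * 1024))  # 1GB range  -> 18
```

So the change widens the window the heap start can land in by a factor
of 32.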

I've sent a mail upstream to suggest reverting it for stable branches:
https://lkml.kernel.org/r/zz0_-ijh1war3...@codewreck.org

Either way, it's related to ASLR as we had guessed.

> > - rather unfortunately, checking out qemu from git, applying the
> >    patches from salsa's debian-bookworm branch and building manually
> >    as follows also made the problem go away:
> > ```
> > mkdir build && cd build
> > ../configure --without-default-features --enable-linux-user --target-list=aarch64-linux-user --static
> > ninja qemu-aarch64
> > mv /usr/bin/qemu-aarch64-static /usr/bin/qemu-aarch64-static.delete
> > cp qemu-aarch64 /usr/bin/qemu-aarch64-static
> > systemctl restart binfmt-support.service
> > ```
> 
> What a nice and easy trick you do here.  Yes, this is the way to go,
> it's just rare to see users are able to figure this out :)

[semi-offtopic]
It should get "easier" (as in, not requiring installation) with binfmt
namespaces and unshare --load-interp, which give you a namespace where a
different interpreter is used. Unfortunately that only landed in 6.7, so
it's not available in bookworm yet and I didn't get to test it all the way.
[/offtopic]

> But ok.  I'll take a closer look at this one.  Because debian-bookworm
> branch is the one which is used for bookworm qemu build, and it should
> produce the same binary (give or take some unimportant details) as in
> the debian archive, but apparently it is not producing the same thing.

I've rebuilt the same tree with dpkg-buildpackage and can definitely
reproduce with that package.
Looking further into the configure arguments, I've trimmed down the
difference to --disable-pie: just adding that flag makes my hand build
reproduce this.

AFAIU building with PIE allows the kernel to load qemu at a randomized
address instead of a fixed one, so I'm not sure if this really fixes the
problem or if it just makes it incredibly unlikely to reproduce.
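
To tell which variant a given binary is, the e_type field of the ELF
header is enough: ET_EXEC means a fixed load address (non-PIE), ET_DYN
means PIE (including static-pie) or a shared object. A minimal sketch;
the function names are mine, and `readelf -h` reports the same field:

```python
import struct

ET_EXEC, ET_DYN = 2, 3  # e_type values from the ELF specification


def elf_type(header: bytes) -> str:
    """Classify an ELF file from (at least) the first 18 bytes of its header."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian
    endian = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    if e_type == ET_EXEC:
        return "ET_EXEC"  # fixed address, non-PIE
    if e_type == ET_DYN:
        return "ET_DYN"   # PIE / static-pie / shared object
    return f"other({e_type})"


def elf_type_of_file(path: str) -> str:
    """Read just enough of the file at path to classify it."""
    with open(path, "rb") as f:
        return elf_type(f.read(18))
```

Calling elf_type_of_file("/usr/bin/qemu-aarch64-static") on the packaged
binary and on the hand-built one should show whether they differ here.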

I've started a loop to let my reproducer run overnight to see if I can
reproduce with pie enabled, and will report back if it does reproduce.
(If you don't hear back from me then it's probably a good enough
workaround. I'm unsure why pie was disabled in the first place: the
qemu-system part of the dpkg build does leave pie on, so there must have
been a reason?)
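
For reference, the loop is nothing fancy. Roughly the following sketch,
where the command is a placeholder for the actual reproducer and the
function name is hypothetical:

```python
import subprocess


def run_until_failure(cmd, max_runs):
    """Run cmd up to max_runs times.

    Returns the 1-based number of the first run that exits non-zero
    (or is killed by a signal), or None if every run succeeded.
    """
    for i in range(1, max_runs + 1):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            return i
    return None
```

A None result after many runs only says the failure is rare, not that
it is gone.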

> Yes, that's what I'd love to see more closely.  Please give me a few
> days, now that you have working solutions (many of them actually) -
> I'm on a business trip now, will have a closer look when I return.

No hurry here, I've just been feeding my curiosity.
The two open questions I had left yesterday are now solved so I'll leave
the resolution to you on your own time.


Cheers,
-- 
Dominique
