On Sun, Dec 7, 2025 at 5:19 PM Mark Millard <[email protected]> wrote:
>
> On Dec 6, 2025, at 19:03, Mark Millard <[email protected]> wrote:
>
> > On Dec 6, 2025, at 14:25, Warner Losh <[email protected]> wrote:
> >
> >> On Sat, Dec 6, 2025, 3:06 PM Mark Millard <[email protected]> wrote:
> >>
> >>> On Dec 6, 2025, at 06:14, Mark Millard <[email protected]> wrote:
> >>>
> >>>> Mateusz Guzik <mjguzik_at_gmail.com> wrote on
> >>>> Date: Sat, 06 Dec 2025 10:50:08 UTC :
> >>>>
> >>>>> I got pointed at phoronix:
> >>>>> https://www.phoronix.com/review/freebsd-15-amd-epyc
> >>>>>
> >>>>> While I don't treat their results as gospel, a FreeBSD vs FreeBSD test
> >>>>> showing a slowdown most definitely warrants a closer look.
> >>>>>
> >>>>> They observed slowdowns when using iperf over localhost and when
> >>>>> compiling llvm.
> >>>>>
> >>>>> I can confirm both problems and more.
> >>>>>
> >>>>> I found the profiling tooling for userspace to be broken again so I
> >>>>> did not investigate much and I'm not going to dig into it further.
> >>>>>
> >>>>> Test box is AMD EPYC 9454 48-Core Processor, with the 2 systems
> >>>>> running as 8 core vms under kvm.
> >>>>> . . .
> >>>>
> >>>>
> >>>>
> >>>> Both of the below are from ampere3 (aarch64) instead, its
> >>>> 2 most recent "bulk -a" runs that completed, elapsed times
> >>>> shown for qt6-webengine-6.9.3 builds:
> >>>>
> >>>> 150releng-arm64-quarterly qt6-webengine-6.9.3 53:33:46
> >>>> 135arm64-default          qt6-webengine-6.9.3 38:43:36
> >>>>
> >>>> For reference:
> >>>>
> >>>> Host OSVERSION: 1600000
> >>>> Jail OSVERSION: 1500068
> >>>>
> >>>> vs.
> >>>>
> >>>> Host OSVERSION: 1600000
> >>>> Jail OSVERSION: 1305000
> >>>>
> >>>> The difference for the above is in the Jail's world builds,
> >>>> not in the boot's (kernel+world) builds.
> >>>>
> >>>>
> >>>> For reference:
> >>>>
> >>>>
> >>>> https://pkg-status.freebsd.org/ampere3/build.html?mastername=150releng-arm64-quarterly&build=88084f9163ae
> >>>>
> >>>> build of www/qt6-webengine | qt6-webengine-6.9.3 ended at Sun Nov 30 05:40:02 -00 2025
> >>>> build time: 2D:05:33:52
> >>>>
> >>>>
> >>>> https://pkg-status.freebsd.org/ampere3/build.html?mastername=135arm64-default&build=f5384fe59be6
> >>>>
> >>>> build of www/qt6-webengine | qt6-webengine-6.9.3 ended at Sat Nov 22 15:33:34 -00 2025
> >>>> build time: 1D:14:43:41
> >>>
> >>>
> >>> Expanding the notes to before and after jemalloc 5.3.0
> >>> was merged to main: beefy18 was the main-amd64 builder
> >>> before and somewhat after the jemalloc 5.3.0 merge from
> >>> vendor branch:
> >>>
> >>> Before: p2650762431ca_s51affb7e971 261:29:13 building 36074 port-packages, start 05 Aug 2025 01:10:59 GMT
> >>>         (jemalloc 5.3.0 merge from vendor branch: 15 Aug 2025)
> >>> After : p9652f95ce8e4_sb45a181a74c 428:49:20 building 36318 port-packages, start 19 Aug 2025 01:30:33 GMT
> >>>
> >>> (The log files are long gone for port-packages built.)
> >>>
> >>> main-15 used a debug jail world but 15.0-RELEASE does not.
> >>>
> >>> I'm not aware of such a port-package builder context for a
> >>> non-debug jail world before and after a jemalloc 5.3.0 merge.
> >>>
> >> A few months before I landed the jemalloc patches, I did 4 or 5 from-dirt
> >> buildworlds. The elapsed time was, iirc, within 1 or 2%. Enough to see maybe
> >> a diff with the small sample size, but not enough for ministat to trigger
> >> at 95%. I don't recall keeping the data for this and can't find it now.
> >> And I'm not even sure, in hindsight, I ran a good experiment. It might be
> >> related, or not, but it would be easy enough for someone to set up two
> >> jails: one just before and one just after. Build from scratch the world
> >> (same hash) on both. That would test it since you'd be holding all other
> >> variables constant.
> >>
> >> When we imported the tip of FreeBSD main at work, we didn't get a cpu
> >> change trigger from our tests that I recall...
> >
> >
> > The range of commits looks like:
> >
> > • git: 9a7c512a6149 - main - ucred groups: restore a useful comment  (Eric van Gyzen)
> > • git: bf6039f09a30 - main - jemalloc: Unthin contrib/jemalloc  (Warner Losh)
> > • git: a0dfba697132 - main - jemalloc: Update jemalloc.xml.in per FreeBSD-diffs  (Warner Losh)
> > • git: 718b13ba6c5d - main - jemalloc: Add FreeBSD's updates to jemalloc_preamble.h.in  (Warner Losh)
> > • git: 6371645df7b0 - main - jemalloc: Add JEMALLOC_PRIVATE_NAMESPACE for the libc namespace  (Warner Losh)
> > • git: da260ab23f26 - main - jemalloc: Only replace _pthread_mutex_init_calloc_cb in private namespace  (Warner Losh)
> > • git: c43cad871720 - main - jemalloc: Merge from jemalloc 5.3.0 vendor branch  (Warner Losh)
> > • git: 69af14a57c9e - main - jemalloc: Note update in UPDATING and RELNOTES  (Warner Losh)
> >
> > I've started a build of a non-debug 9a7c512a6149 world
> > to later create a chroot to do a test buildworld in.
> >
> > I'll also do a build of a non-debug 69af14a57c9e world
> > to later create the other chroot to do a test
> > buildworld in.
> >
> > non-debug means my use of:
> >
> > WITH_MALLOC_PRODUCTION=
> > WITHOUT_ASSERT_DEBUG=
> > WITHOUT_PTHREADS_ASSERTIONS=
> > WITHOUT_LLVM_ASSERTIONS=
> >
> > I've used "env WITH_META_MODE=" as it cuts down on the
> > volume and frequency of scrolling output. I'll do the
> > same later.
> >
> > If there is anything you want controlled in a different
> > way, let me know.
> >
> > The Windows Dev Kit 2023 is booted (world and kernel)
> > with:
> >
> > # uname -apKU
> > FreeBSD aarch64-main-pbase 16.0-CURRENT FreeBSD 16.0-CURRENT main-n281922-4872b48b175c GENERIC-NODEBUG arm64 aarch64 1600004 1600004
> >
> > which is from an official pkgbase distribution. So the
> > boot-world is a debug world but the boot-kernel is not.
> >
> > The Windows Dev Kit 2023 will take some time for such
> > -j8 builds and I may end up sleeping in the middle of
> > the sequence someplace. So it may be a while before
> > I've any comparison/contrast data to report.
>
>
> Summary for jemalloc for before vs. at 5.3.0
> for *non-debug* contexts doing the buildworld:
>
> before 5.3.0: 9754 seconds (about 2.7 hrs)
> with 5.3.0: 9384 seconds (about 2.6 hrs)
>
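For reference, a minimal sketch of the before/after chroot comparison described in the quote above (assuming the non-debug src.conf knobs listed there are in effect, that a source tree at the same hash is available as /usr/src inside each chroot, and using made-up paths under /chroots; this is an illustration, not the exact commands that were run):

# Host side: build and install one world per commit -- just before and
# just after the jemalloc merge -- each into its own chroot.
for rev in 9a7c512a6149 69af14a57c9e; do
    git -C /usr/src checkout "$rev"
    make -C /usr/src -j8 buildworld
    make -C /usr/src installworld DESTDIR=/chroots/"$rev"
    make -C /usr/src distribution DESTDIR=/chroots/"$rev"
    mount -t devfs devfs /chroots/"$rev"/dev
done

# Inside each chroot: time a from-scratch buildworld of the same source
# hash; repeat several runs per chroot (times are appended to one file).
chroot /chroots/9a7c512a6149 sh -c \
    'rm -rf /usr/obj && cd /usr/src && /usr/bin/time -a -o /tmp/times make -j8 buildworld'
chroot /chroots/69af14a57c9e sh -c \
    'rm -rf /usr/obj && cd /usr/src && /usr/bin/time -a -o /tmp/times make -j8 buildworld'

# ministat(1) then says whether any difference clears 95% confidence.
ministat /chroots/9a7c512a6149/tmp/times /chroots/69af14a57c9e/tmp/times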
While in principle this can accurately reflect the difference, the benchmark itself is not valid as is. First, you can't just run it once -- the result needs to be shown to be repeatable, and it needs to be profiled. For a build of that duration, with so few resources, for all I know the real factor was randomness from I/O.

That aside, you need a sanitized baseline. From the description it is not clear to me at all whether you are doing the build with the clang perf regression fixed or not.

Even that aside, I outlined 3 more regressions:
- slower binary startup to begin with
- slower syscalls which fail with an error
- slower syscall interface in the first place

Out of these, the first one is the most important here.

If I were to work on this, seeing that the question at hand is whether the jemalloc update is a problem, I would bypass all of the above and instead take 14.3 (not stable/14!) as a baseline + the jemalloc update on top. This eliminates all of the factors other than jemalloc itself.

Building world also seems a little fishy here and it is not clear to me at all what version you built -- was the new-jemalloc world building new jemalloc and the old-jemalloc world building old jemalloc? More importantly, I would be worried that parts of the build pick up whatever jemalloc they find to use during some of the build.

I would benchmark this by building a big port (not timing the dependencies of the port, just the port itself -- maybe even chromium or firefox).

That's of course quite a bit of effort and if there is nobody to do that (or something comparable), imo the pragmatic play is to revert the jemalloc update for the time being. This restores the known working state and, should the update turn out to be a good thing, it can land for 15.1, maybe fixed up.
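A minimal sketch of that suggested port-based comparison, assuming two otherwise identical environments -- stock 14.3 and 14.3 plus only the jemalloc 5.3.0 commits on top -- and using www/firefox purely as an example of a big port; the file names and run count below are made up:

# In each environment, build/install the port's dependencies first so
# that only the port itself ends up being timed.
cd /usr/ports/www/firefox
make depends

# Several clean rebuilds of just the port; NOCLEANDEPENDS avoids
# re-cleaning the already-installed dependencies.
for i in 1 2 3 4 5; do
    make clean NOCLEANDEPENDS=yes
    /usr/bin/time -a -o /tmp/firefox.times make -s build
done

# Collect the times from both environments and check whether the
# difference is repeatable and significant at 95%:
ministat firefox-14.3.times firefox-14.3+jemalloc.times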
