On Dec 9, 2025, at 20:10, Mark Millard <[email protected]> wrote:
> On Dec 9, 2025, at 17:15, Mark Millard <[email protected]> wrote:
>
>> On Dec 9, 2025, at 12:32, Mark Millard <[email protected]> wrote:
>>
>>> On Dec 9, 2025, at 07:22, Rozhuk Ivan <[email protected]> wrote:
>>>
>>>> On Mon, 8 Dec 2025 09:23:52 -0800
>>>> Mark Millard <[email protected]> wrote:
>>>>
>>>>> But, as of yet, I've no good evidence for blaming
>>>>> jemalloc as a major contributor to those timing
>>>>> ratios --or for blaming any other specific part
>>>>> of 15.0 .
>>>>
>>>> If you want to bench jmalloc - there is another ways to do that without
>>>> building something.
>>>> Try to find some sythetic benchmarks.
>>>> Also jmalloc can be build without OS rebuild and linked with bench.
>>>>
>>>> This 2 things can reduce time to tests, but it will eliminate OS
>>>> integation factors.
>>>> Run same bench on different OS may give more info.
>>>>
>>>
>>> [I've eliminated direct Email to most everyone
>>> for this reply. There is not even minor new
>>> technical content.]
>>>
>>> At this point I'm more likely to explore if I
>>> get similar ratios as ampere[13] do for some
>>> port-package builds that have the large ratios on
>>> ampere[13]. There are examples that are not as
>>> overall time consuming for ampere[13] as what I've
>>> already referenced (but are still non-trivial for
>>> the time taken). As stands, I do not have a good
>>> reproduce-the-issue context, much less one with
>>> build time frames I'd be willing to deal with in
>>> my environment.
>>
>> Time-ratios similar to the ampere[13] ones for
>> 15.0 vs. 14.3 (or 13.5) were easily repeatable
>> on the Microsoft Windows Dev Kit 2023 for doing
>> poudriere builds of the examples that I tried.
>>
>> port-package builds tested for below: devel/cmake-core
>> TMPFS_BLACKLIST empty
>> ALLOW_MAKE_JOBS= in use (no explicit MAKE_JOBS_NUMBER like restrictions)
>> UFS context (except for what USE_TMPFS=all does in poudriere)
>> The below did not update /usr/ports/distfiles/ .
>>
>> This does some exploration of USE_TMPFS=no vs.
>> USE_TMPFS=all as well, starting with
>> USE_TMPFS=no .
>>
>> Listed in the sequence executed, first time
>> runs shown first:
>>
>>
>> USE_TMPFS=no . . .
>> (Note: The first times had other port-packages to build first.)
>>
>> 15.0 poudriere jail:
>> [00:37:37] [01] [00:12:30] Finished devel/cmake-core | cmake-core-3.31.9: Success
>>
>> 14.3 poudriere jail:
>> [00:28:26] [01] [00:09:38] Finished devel/cmake-core | cmake-core-3.31.9: Success
>>
>> Approx. 1.30 time ratio (15.0's 12:30 / 14.3's 9:38)
>>
>>
>> USE_TMPFS=all (no tmpfs black list) . . .
>>
>> 14.3 poudriere jail:
>> [00:09:32] [03] [00:09:24] Finished devel/cmake-core | cmake-core-3.31.9: Success
>>
>> 15.0 poudriere jail:
>> [00:12:45] [03] [00:12:34] Finished devel/cmake-core | cmake-core-3.31.9: Success
>>
>> Approx. 1.34 time ratio (15.0's/14.3's)
>>
>>
>> The following also prefixed the poudriere bulk -C command
>> with: time -l
>>
>> 15.0 poudriere jail:
>> [00:12:36] [04] [00:12:25] Finished devel/cmake-core | cmake-core-3.31.9: Success
>> . . .
>> 757.10 real 4613.06 user 251.09 sys
>> 866580 maximum resident set size
>> 131 average shared memory size
>> 27 average unshared data size
>> 234 average unshared stack size
>> 31148816 page reclaims
>> 0 page faults
>> 0 swaps
>> 14 block input operations
>> 36 block output operations
>> 37061 messages sent
>> 33671 messages received
>> 1758 signals received
>> 143987 voluntary context switches
>> 167515 involuntary context switches
>>
>> 14.3 poudriere jail:
>> [00:09:23] [01] [00:09:15] Finished devel/cmake-core | cmake-core-3.31.9: Success
>> . . .
>> 564.48 real 3449.89 user 204.14 sys
>> 822900 maximum resident set size
>> 64692 average shared memory size
>> 791 average unshared data size
>> 235 average unshared stack size
>> 28153497 page reclaims
>> 0 page faults
>> 0 swaps
>> 9 block input operations
>> 12 block output operations
>> 34180 messages sent
>> 31539 messages received
>> 1758 signals received
>> 131899 voluntary context switches
>> 132775 involuntary context switches
>>
>> Approx. 1.34 time ratio (15.0's/14.3's)
>>
>>
>> USE_TMPFS=no . . . (again)
>>
>> 15.0 poudriere jail:
>> [00:13:01] [04] [00:12:27] Finished devel/cmake-core | cmake-core-3.31.9: Success
>> . . .
>> 784.89 real 4596.42 user 257.12 sys
>> 866600 maximum resident set size
>> 128 average shared memory size
>> 25 average unshared data size
>> 234 average unshared stack size
>> 31194466 page reclaims
>> 2371 page faults
>> 0 swaps
>> 3573 block input operations
>> 6687 block output operations
>> 37643 messages sent
>> 33840 messages received
>> 1756 signals received
>> 241548 voluntary context switches
>> 304249 involuntary context switches
>>
>> 14.3 poudriere jail:
>> [00:09:49] [04] [00:09:18] Finished devel/cmake-core | cmake-core-3.31.9: Success
>> . . .
>> 592.83 real 3446.18 user 207.61 sys
>> 823880 maximum resident set size
>> 64712 average shared memory size
>> 787 average unshared data size
>> 236 average unshared stack size
>> 28176650 page reclaims
>> 2374 page faults
>> 0 swaps
>> 3481 block input operations
>> 5148 block output operations
>> 34521 messages sent
>> 31580 messages received
>> 1758 signals received
>> 218881 voluntary context switches
>> 255193 involuntary context switches
>>
>> Approx. 1.34 time ratio (15.0's/14.3's)
>>
>>
>> Only some port-packages have time-ratios
>> near 1.34. For example, building lang/gcc15
>> does not on ampere[13]: closer to 1.1 as
>> I remember.
>> (For the most part, lang/gcc15
>> does most of its own building based on a
>> smaller amount of clang-built code
>> to bootstrap.)
>>
>>
>> For reference:
>>
>> # poudriere jail -l
>> JAILNAME          VERSION         OSVERSION ARCH          METHOD      TIMESTAMP           PATH
>> release14-aarch64 14.3-RELEASE-p6 1403000   arm64.aarch64 ftp-archive 2025-12-09 12:54:06 /usr/local/poudriere/jails/release14-aarch64
>> . . .
>> release-aarch64   15.0-RELEASE    1500068   aarch64       pkgbase     2025-12-06 11:34:39 /usr/local/poudriere/jails/release-aarch64
>> . . .
>>
>> # ~/fbsd-based-on-what-commit.sh -C /usr/ports
>> bb7b77417165 (HEAD -> main, freebsd/main, freebsd/HEAD) www/hurl: update 7.0.0 -> 7.1.0
>> Author: Rodrigo Osorio <[email protected]>
>> Commit: Rodrigo Osorio <[email protected]>
>> CommitDate: 2025-11-28 23:11:52 +0000
>> branch: main
>> merge-base: bb7b774171651eea0dc56376c225fe976231daa5
>> merge-base: CommitDate: 2025-11-28 23:11:52 +0000
>> n726888 (--first-parent --count for merge-base)
>>
>> # uname -apKU
>> FreeBSD aarch64-main-pbase 16.0-CURRENT FreeBSD 16.0-CURRENT main-n281922-4872b48b175c GENERIC-NODEBUG arm64 aarch64 1600004 1600004
>>
>> (That last was an official pkgbase distribution.)
>
> 14.3-STABLE does not have jemalloc 5.3.0 or libsys
> but performs like 15.0-RELEASE, not 14.3-RELEASE
> for the aarch64 devel/cmake-core build tests.
> But 14.3-STABLE does have:
>
> # ldd /usr/local/poudriere/jails/official14-aarch64/usr/bin/cc
> /usr/local/poudriere/jails/official14-aarch64/usr/bin/cc:
>     libprivateclang.so.19 => /usr/lib/libprivateclang.so.19 (0x732e0d600000)
>     libprivatellvm.so.19 => /usr/lib/libprivatellvm.so.19 (0x732e12600000)
> . . .
>
> while 14.3-RELEASE does not.
>
> (Another data point is that lang/gcc15 does not have
> nearly as large of a time-ratio vs. 14.3-RELEASE
> in the data from ampere[13] .)
>
>
>
> Details from the Microsoft Dev Kit 2023 experiments
> . . .
>
> I've collected a sequence for a new poudriere jail
> to compare/contrast with:
>
> # poudriere jail -l
> JAILNAME           VERSION     OSVERSION ARCH          METHOD    TIMESTAMP           PATH
> . . .
> official14-aarch64 14.3-STABLE 1403506   arm64.aarch64 freebsdci 2025-12-09 18:24:20 /usr/local/poudriere/jails/official14-aarch64
> . . .
>
> (ampere[13] do not have examples of recent 14.3-STABLE builds at this point.)
>
>
> USE_TMPFS=no . . .
> (Note: The first times had other port-packages to build first.
> But the system still has the cached the file system data.)
>
> stable/14 poudriere jail:
> [00:36:29] [01] [00:12:31] Finished devel/cmake-core | cmake-core-3.31.9: Success
>
> So: 12:31 is far more like 15.0-RELEASE
>
>
> USE_TMPFS=all . . .
>
> stable/14 poudriere jail:
> [00:12:21] [07] [00:12:10] Finished devel/cmake-core | cmake-core-3.31.9: Success
> . . .
> 742.70 real 4586.53 user 248.37 sys
> 864996 maximum resident set size
> 133 average shared memory size
> 24 average unshared data size
> 235 average unshared stack size
> 30958626 page reclaims
> 0 page faults
> 0 swaps
> 456 block input operations
> 80 block output operations
> 35920 messages sent
> 33223 messages received
> 1760 signals received
> 140580 voluntary context switches
> 164112 involuntary context switches
>
> So: 12:10 is far more like 15.0-RELEASE
>
>
> stable/14 poudriere jail (again):
> [00:12:30] [08] [00:12:19] Finished devel/cmake-core | cmake-core-3.31.9: Success
> . . .
> 751.98 real 4604.85 user 251.40 sys
> 866056 maximum resident set size
> 125 average shared memory size
> 21 average unshared data size
> 235 average unshared stack size
> 30976603 page reclaims
> 0 page faults
> 0 swaps
> 20 block input operations
> 11 block output operations
> 36297 messages sent
> 33327 messages received
> 1761 signals received
> 144213 voluntary context switches
> 166975 involuntary context switches
>
> So: 12:19 is far more like 15.0-RELEASE
>
>
> USE_TMPFS=no . . .
> (Note: The first times had other port-packages to build first.)
>
> stable/14 poudriere jail:
> [00:13:16] [05] [00:12:49] Finished devel/cmake-core | cmake-core-3.31.9: Success
> . . .
> 799.95 real 4626.06 user 261.49 sys
> 865940 maximum resident set size
> 134 average shared memory size
> 24 average unshared data size
> 235 average unshared stack size
> 31110419 page reclaims
> 2380 page faults
> 0 swaps
> 3577 block input operations
> 6262 block output operations
> 37253 messages sent
> 33801 messages received
> 1758 signals received
> 236161 voluntary context switches
> 312615 involuntary context switches
>
>
> So: 12:49 is far more like 15.0-RELEASE
>
>
> (Nice to have a known repeatable context to try
> variations with.)
>
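
For anyone wanting to recheck the quoted time ratios: each is just the
quotient of two poudriere elapsed times. A minimal sketch, assuming
Python is at hand. The helper names to_seconds and time_ratio are
hypothetical illustrations, not part of poudriere, and the input times
are copied from the quoted cmake-core results above:

```python
def to_seconds(stamp):
    """Convert an 'mm:ss' or 'hh:mm:ss' elapsed time to seconds."""
    total = 0
    for part in stamp.split(":"):
        total = total * 60 + int(part)
    return total


def time_ratio(slower, faster):
    """Quotient of two elapsed times given as colon-separated strings."""
    return to_seconds(slower) / to_seconds(faster)


# (label, 15.0 jail time, 14.3 jail time), taken from the quoted runs.
runs = [
    ("USE_TMPFS=no, first run", "12:30", "9:38"),
    ("USE_TMPFS=all, first run", "12:34", "9:24"),
    ("USE_TMPFS=no, again", "12:27", "9:18"),
]

for label, t150, t143 in runs:
    print(f"{label}: {time_ratio(t150, t143):.2f}")
```

The hh:mm:ss form that poudriere prints in brackets (for example
00:12:30) is accepted as well, so the bracketed values can be pasted in
directly.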

On the 7950X3D system that I have access to, I created:

# poudriere jail -l
JAILNAME         VERSION         OSVERSION ARCH  METHOD      TIMESTAMP           PATH
release14-amd64  14.3-RELEASE-p6 1403000   amd64 ftp-archive 2025-12-10 12:55:10 /usr/local/poudriere/jails/release14-amd64
official14-amd64 14.3-STABLE     1403506   amd64 freebsdci   2025-12-10 12:55:17 /usr/local/poudriere/jails/official14-amd64
. . .

I then did builds of qt6-webengine-6.9.3 for each
(all 32 FreeBSD CPUs allowed, with 32 builders
allowed when building prerequisites):

amd64 7950X3D 14.3-RELEASE poudriere jail:
[01:35:37] [01] [00:32:14] Finished www/qt6-webengine | qt6-webengine-6.9.3: Success

amd64 7950X3D 14.3-STABLE poudriere jail:
[01:56:15] [01] [00:40:46] Finished www/qt6-webengine | qt6-webengine-6.9.3: Success

40:46 / 32:14 is approx. 1.26. That is a suggestive
figure from one actual example in one specific test
context. I have no way to test across the variety of
official FreeBSD builder systems, and I do not see a
point in exploring the variability just for my
specific type of amd64 context.

1.26 is smaller than the ratios seen on the ampere*'s
or on the Microsoft Windows Dev Kit 2023 (both aarch64)
but is still notable for its size. It suggests that
the issue is not aarch64 specific overall.

I'll note that I've only explored one type of
performance regression: port-package build time
ratios that are notable. There could be other
regressions that are unrelated, or that are minor
for the examples that I've looked at but important
for other contexts.

It looks to me like the tradeoff between builder time
and memory use by clang/clang++/related needs an
explicit choice about the handling going forward.
If only the ampere*'s port-package building were being
considered, "take less time" would seem the likely
judgment: there are lots of time problems already for
the 3 aarch64 builder machines.

===
Mark Millard
marklmi at yahoo.com
