On Dec 9, 2025, at 20:10, Mark Millard <[email protected]> wrote:

> On Dec 9, 2025, at 17:15, Mark Millard <[email protected]> wrote:
> 
>> On Dec 9, 2025, at 12:32, Mark Millard <[email protected]> wrote:
>> 
>>> On Dec 9, 2025, at 07:22, Rozhuk Ivan <[email protected]> wrote:
>>> 
>>>> On Mon, 8 Dec 2025 09:23:52 -0800
>>>> Mark Millard <[email protected]> wrote:
>>>> 
>>>>> But, as of yet, I've no good evidence for blaming
>>>>> jemalloc as a major contributor to those timing
>>>>> ratios --or for blaming any other specific part
>>>>> of 15.0 .
>>>> 
>>>> If you want to bench jemalloc, there are other ways to do that without
>>>> building something.
>>>> Try to find some synthetic benchmarks.
>>>> Also, jemalloc can be built without an OS rebuild and linked with the bench.
>>>> 
>>>> These 2 things can reduce the time to test, but they will eliminate OS
>>>> integration factors.
>>>> Running the same bench on different OSes may give more info.
>>>> 
>>> 
>>> [I've eliminated direct Email to most everyone
>>> for this reply. There is not even minor new
>>> technical content.]
>>> 
>>> At this point I'm more likely to explore whether I
>>> get ratios similar to ampere[13]'s for some
>>> port-package builds that have large ratios on
>>> ampere[13]. There are examples that are less
>>> time consuming overall for ampere[13] than what I've
>>> already referenced (but are still non-trivial in
>>> the time taken). As things stand, I do not have a good
>>> reproduce-the-issue context, much less one with
>>> build time frames I'd be willing to deal with in
>>> my environment.
>> 
>> Time-ratios similar to the ampere[13] ones for
>> 15.0 vs. 14.3 (or 13.5) were easily repeatable
>> on the Microsoft Windows Dev Kit 2023 for doing
>> poudriere builds of the examples that I tried.
>> 
>> port-package builds tested for below: devel/cmake-core
>> TMPFS_BLACKLIST empty
>> ALLOW_MAKE_JOBS= in use (no explicit MAKE_JOBS_NUMBER like restrictions)
>> UFS context (except for what USE_TMPFS=all does in poudriere)
>> The below did not update /usr/ports/distfiles/ .
>> 
>> This does some exploration of USE_TMPFS=no vs.
>> USE_TMPFS=all as well, starting with
>> USE_TMPFS=no .
>> 
>> Listed in the sequence executed, first time
>> runs shown first:
>> 
>> 
>> USE_TMPFS=no . . .
>> (Note: The first times had other port-packages to build first.)
>> 
>> 15.0 poudriere jail:
>> [00:37:37] [01] [00:12:30] Finished   devel/cmake-core | cmake-core-3.31.9: Success
>> 
>> 14.3 poudriere jail:
>> [00:28:26] [01] [00:09:38] Finished   devel/cmake-core | cmake-core-3.31.9: Success
>> 
>> Approx. 1.30 time ratio (15.0's 12:30 / 14.3's 9:38)
>> 
>> 
>> USE_TMPFS=all (no tmpfs black list) . . .
>> 
>> 14.3 poudriere jail:
>> [00:09:32] [03] [00:09:24] Finished   devel/cmake-core | cmake-core-3.31.9: Success
>> 
>> 15.0 poudriere jail:
>> [00:12:45] [03] [00:12:34] Finished   devel/cmake-core | cmake-core-3.31.9: Success
>> 
>> Approx. 1.34 time ratio (15.0's/14.3's)
>> 
>> 
>> The following also prefixed the poudriere bulk -C command
>> with: time -l
>> 
>> 15.0 poudriere jail:
>> [00:12:36] [04] [00:12:25] Finished   devel/cmake-core | cmake-core-3.31.9: Success
>> . . .
>>     757.10 real      4613.06 user       251.09 sys
>>   866580  maximum resident set size
>>      131  average shared memory size
>>       27  average unshared data size
>>      234  average unshared stack size
>> 31148816  page reclaims
>>        0  page faults
>>        0  swaps
>>       14  block input operations
>>       36  block output operations
>>    37061  messages sent
>>    33671  messages received
>>     1758  signals received
>>   143987  voluntary context switches
>>   167515  involuntary context switches
>> 
>> 14.3 poudriere jail:
>> [00:09:23] [01] [00:09:15] Finished   devel/cmake-core | cmake-core-3.31.9: Success
>> . . .
>>     564.48 real      3449.89 user       204.14 sys
>>   822900  maximum resident set size
>>    64692  average shared memory size
>>      791  average unshared data size
>>      235  average unshared stack size
>> 28153497  page reclaims
>>        0  page faults
>>        0  swaps
>>        9  block input operations
>>       12  block output operations
>>    34180  messages sent
>>    31539  messages received
>>     1758  signals received
>>   131899  voluntary context switches
>>   132775  involuntary context switches
>> 
>> Approx. 1.34 time ratio (15.0's/14.3's)
>> 
>> 
>> USE_TMPFS=no . . . (again)
>> 
>> 15.0 poudriere jail:
>> [00:13:01] [04] [00:12:27] Finished   devel/cmake-core | cmake-core-3.31.9: Success
>> . . .
>>     784.89 real      4596.42 user       257.12 sys
>>   866600  maximum resident set size
>>      128  average shared memory size
>>       25  average unshared data size
>>      234  average unshared stack size
>> 31194466  page reclaims
>>     2371  page faults
>>        0  swaps
>>     3573  block input operations
>>     6687  block output operations
>>    37643  messages sent
>>    33840  messages received
>>     1756  signals received
>>   241548  voluntary context switches
>>   304249  involuntary context switches
>> 
>> 14.3 poudriere jail:
>> [00:09:49] [04] [00:09:18] Finished   devel/cmake-core | cmake-core-3.31.9: Success
>> . . .
>>     592.83 real      3446.18 user       207.61 sys
>>   823880  maximum resident set size
>>    64712  average shared memory size
>>      787  average unshared data size
>>      236  average unshared stack size
>> 28176650  page reclaims
>>     2374  page faults
>>        0  swaps
>>     3481  block input operations
>>     5148  block output operations
>>    34521  messages sent
>>    31580  messages received
>>     1758  signals received
>>   218881  voluntary context switches
>>   255193  involuntary context switches
>> 
>> Approx. 1.34 time ratio (15.0's/14.3's)
>> 
>> 
>> Only some port-packages have time-ratios
>> near 1.34. For example, building lang/gcc15
>> does not on ampere[13]: closer to 1.1 as
>> I remember. (For the most part, lang/gcc15
>> does most of its own building based on a
>> smaller amount of clang-built code
>> to bootstrap.)
>> 
>> 
>> For reference:
>> 
>> # poudriere jail -l
>> JAILNAME          VERSION         OSVERSION ARCH          METHOD      TIMESTAMP           PATH
>> release14-aarch64 14.3-RELEASE-p6 1403000   arm64.aarch64 ftp-archive 2025-12-09 12:54:06 /usr/local/poudriere/jails/release14-aarch64
>> . . .
>> release-aarch64   15.0-RELEASE    1500068   aarch64       pkgbase     2025-12-06 11:34:39 /usr/local/poudriere/jails/release-aarch64
>> . . .
>> 
>> # ~/fbsd-based-on-what-commit.sh -C /usr/ports
>> bb7b77417165 (HEAD -> main, freebsd/main, freebsd/HEAD) www/hurl: update 7.0.0 -> 7.1.0
>> Author:     Rodrigo Osorio <[email protected]>
>> Commit:     Rodrigo Osorio <[email protected]>
>> CommitDate: 2025-11-28 23:11:52 +0000
>> branch: main
>> merge-base: bb7b774171651eea0dc56376c225fe976231daa5
>> merge-base: CommitDate: 2025-11-28 23:11:52 +0000
>> n726888 (--first-parent --count for merge-base)
>> 
>> # uname -apKU
>> FreeBSD aarch64-main-pbase 16.0-CURRENT FreeBSD 16.0-CURRENT main-n281922-4872b48b175c GENERIC-NODEBUG arm64 aarch64 1600004 1600004
>> 
>> (That last was an official pkgbase distribution.)
> 
> 14.3-STABLE does not have jemalloc 5.3.0 or libsys,
> but it performs like 15.0-RELEASE, not 14.3-RELEASE,
> for the aarch64 devel/cmake-core build tests.
> But 14.3-STABLE does have:
> 
> # ldd /usr/local/poudriere/jails/official14-aarch64/usr/bin/cc
> /usr/local/poudriere/jails/official14-aarch64/usr/bin/cc:
> libprivateclang.so.19 => /usr/lib/libprivateclang.so.19 (0x732e0d600000)
> libprivatellvm.so.19 => /usr/lib/libprivatellvm.so.19 (0x732e12600000)
> . . .
> 
> while 14.3-RELEASE does not.
> 
> (Another data point is that lang/gcc15 does not have
> nearly as large a time-ratio vs. 14.3-RELEASE
> in the data from ampere[13].)
> 
> 
> 
> Details from the Microsoft Dev Kit 2023 experiments
> . . .
> 
> I've collected a sequence for a new poudriere jail
> to compare/contrast with:
> 
> # poudriere jail -l
> JAILNAME           VERSION         OSVERSION ARCH          METHOD      TIMESTAMP           PATH
> . . .
> official14-aarch64 14.3-STABLE     1403506   arm64.aarch64 freebsdci   2025-12-09 18:24:20 /usr/local/poudriere/jails/official14-aarch64
> . . .
> 
> (ampere[13] do not have examples of recent 14.3-STABLE builds at this point.)
> 
> 
> USE_TMPFS=no . . .
> (Note: The first times had other port-packages to build first.
> But the system still had the file system data cached.)
> 
> stable/14 poudriere jail:
> [00:36:29] [01] [00:12:31] Finished   devel/cmake-core | cmake-core-3.31.9: Success
> 
> So: 12:31 is far more like 15.0-RELEASE
> 
> 
> USE_TMPFS=all . . .
> 
> stable/14 poudriere jail:
> [00:12:21] [07] [00:12:10] Finished   devel/cmake-core | cmake-core-3.31.9: Success
> . . .
>      742.70 real      4586.53 user       248.37 sys
>    864996  maximum resident set size
>       133  average shared memory size
>        24  average unshared data size
>       235  average unshared stack size
>  30958626  page reclaims
>         0  page faults
>         0  swaps
>       456  block input operations
>        80  block output operations
>     35920  messages sent
>     33223  messages received
>      1760  signals received
>    140580  voluntary context switches
>    164112  involuntary context switches
> 
> So: 12:10 is far more like 15.0-RELEASE
> 
> 
> stable/14 poudriere jail (again):
> [00:12:30] [08] [00:12:19] Finished   devel/cmake-core | cmake-core-3.31.9: Success
> . . .
>      751.98 real      4604.85 user       251.40 sys
>    866056  maximum resident set size
>       125  average shared memory size
>        21  average unshared data size
>       235  average unshared stack size
>  30976603  page reclaims
>         0  page faults
>         0  swaps
>        20  block input operations
>        11  block output operations
>     36297  messages sent
>     33327  messages received
>      1761  signals received
>    144213  voluntary context switches
>    166975  involuntary context switches
> 
> So: 12:19 is far more like 15.0-RELEASE
> 
> 
> USE_TMPFS=no . . .
> (Note: The first times had other port-packages to build first.)
> 
> stable/14 poudriere jail:
> [00:13:16] [05] [00:12:49] Finished   devel/cmake-core | cmake-core-3.31.9: Success
> . . .
>      799.95 real      4626.06 user       261.49 sys
>    865940  maximum resident set size
>       134  average shared memory size
>        24  average unshared data size
>       235  average unshared stack size
>  31110419  page reclaims
>      2380  page faults
>         0  swaps
>      3577  block input operations
>      6262  block output operations
>     37253  messages sent
>     33801  messages received
>      1758  signals received
>    236161  voluntary context switches
>    312615  involuntary context switches
> 
> 
> So: 12:49 is far more like 15.0-RELEASE
> 
> 
> (Nice to have a known repeatable context to try
> variations with.)
> 


On the 7950X3D system that I have access to, I created:

# poudriere jail -l
JAILNAME                   VERSION         OSVERSION ARCH  METHOD      TIMESTAMP           PATH
release14-amd64            14.3-RELEASE-p6 1403000   amd64 ftp-archive 2025-12-10 12:55:10 /usr/local/poudriere/jails/release14-amd64
official14-amd64           14.3-STABLE     1403506   amd64 freebsdci   2025-12-10 12:55:17 /usr/local/poudriere/jails/official14-amd64
. . .

I then did builds of qt6-webengine-6.9.3 for each
(all 32 FreeBSD CPUs allowed, with 32 builders
allowed when building prerequisites):

amd64 7950X3D 14.3-RELEASE poudriere jail:
[01:35:37] [01] [00:32:14] Finished   www/qt6-webengine | qt6-webengine-6.9.3: Success

amd64 7950X3D 14.3-STABLE poudriere jail:
[01:56:15] [01] [00:40:46] Finished   www/qt6-webengine | qt6-webengine-6.9.3: Success

40:46 / 32:14 is approximately 1.26: a suggestive figure for
this specific test context, from an actual example.
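
The ratio arithmetic used throughout this thread (the per-port build time under the newer jail divided by the time under the older jail) can be reproduced mechanically from poudriere's "Finished" log lines. The following is a minimal Python sketch, not something used for the measurements above. It assumes that the third bracketed field of a "Finished" line is the per-port build duration, which is the field compared in this thread. The names FINISHED_RE, parse_hms, port_build_time, and time_ratio are hypothetical.

```python
from __future__ import annotations

import re
from datetime import timedelta

# Hypothetical pattern for a poudriere "Finished" line such as:
#   [01:56:15] [01] [00:40:46] Finished   www/qt6-webengine | qt6-webengine-6.9.3: Success
# Assumption: field 1 is the overall elapsed time, field 2 is the builder
# number, and field 3 is the per-port build duration compared here.
FINISHED_RE = re.compile(
    r"^\[\d+:\d{2}:\d{2}\]\s+\[\d+\]\s+\[(\d+:\d{2}:\d{2})\]\s+Finished\s+(\S+)"
)


def parse_hms(text: str) -> timedelta:
    """Convert an "HH:MM:SS" string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def port_build_time(line: str) -> tuple[str, timedelta]:
    """Return (port origin, per-port build duration) from a Finished line."""
    match = FINISHED_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a poudriere Finished line: {line!r}")
    return match.group(2), parse_hms(match.group(1))


def time_ratio(newer_line: str, older_line: str) -> float:
    """Newer jail's build time divided by the older jail's build time."""
    newer_origin, newer = port_build_time(newer_line)
    older_origin, older = port_build_time(older_line)
    if newer_origin != older_origin:
        raise ValueError("lines are for different port origins")
    return newer / older  # timedelta / timedelta yields a float


if __name__ == "__main__":
    stable = "[01:56:15] [01] [00:40:46] Finished   www/qt6-webengine | qt6-webengine-6.9.3: Success"
    release = "[01:35:37] [01] [00:32:14] Finished   www/qt6-webengine | qt6-webengine-6.9.3: Success"
    print(f"{time_ratio(stable, release):.2f}")  # prints 1.26
```

When applied to the quoted devel/cmake-core lines (12:30 for 15.0 vs. 9:38 for 14.3), the same function gives about 1.30, which matches the hand-computed figure.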

But I've no way to test across the variety of FreeBSD
official builder systems and do not see a point in
exploring the variability just for my specific
type of amd64 context.

1.26 is smaller than what the ampere*'s or the Microsoft
Dev Kit 2023 (aarch64) contexts got, but it is still
notable for its size. It suggests that the issue is
not aarch64-specific overall.


I'll note that I've only explored the one type of
example performance regression: port-package build
time ratios that are notable. There could be other
regressions that are unrelated or are minor for the
examples that I've looked at but are important for
other contexts.

It looks to me like the tradeoff between builder time
and memory use by clang/clang++/related tools needs an
explicit decision about how to handle it going forward.
If only the ampere*'s port-package building were
being considered, "take less time" would seem the
likely judgment: the 3 aarch64 builder machines
already have lots of time problems.


===
Mark Millard
marklmi at yahoo.com

