On Dec 8, 2025, at 04:46, Mateusz Guzik <[email protected]> wrote:

> On Sun, Dec 7, 2025 at 5:19 PM Mark Millard <[email protected]> wrote:
>> 
>> On Dec 6, 2025, at 19:03, Mark Millard <[email protected]> wrote:
>> 
>>> On Dec 6, 2025, at 14:25, Warner Losh <[email protected]> wrote:
>>> 
>>>> On Sat, Dec 6, 2025, 3:06 PM Mark Millard <[email protected]> wrote:
>>>> 
>>>>> On Dec 6, 2025, at 06:14, Mark Millard <[email protected]> wrote:
>>>>> 
>>>>>> Mateusz Guzik <mjguzik_at_gmail.com> wrote on
>>>>>> Date: Sat, 06 Dec 2025 10:50:08 UTC :
>>>>>> 
>>>>>>> I got pointed at phoronix: 
>>>>>>> https://www.phoronix.com/review/freebsd-15-amd-epyc
>>>>>>> 
>>>>>>> While I don't treat their results as gospel, a FreeBSD vs FreeBSD test
>>>>>>> showing a slowdown most definitely warrants a closer look.
>>>>>>> 
>>>>>>> They observed slowdowns when using iperf over localhost and when 
>>>>>>> compiling llvm.
>>>>>>> 
>>>>>>> I can confirm both problems and more.
>>>>>>> 
>>>>>>> I found the profiling tooling for userspace to be broken again so I
>>>>>>> did not investigate much and I'm not going to dig into it further.
>>>>>>> 
>>>>>>> Test box is AMD EPYC 9454 48-Core Processor, with the 2 systems
>>>>>>> running as 8 core vms under kvm.
>>>>>>> . . .
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Both of the below are from ampere3 (aarch64) instead, its
>>>>>> 2 most recent "bulk -a" runs that completed, elapsed times
>>>>>> shown for qt6-webengine-6.9.3 builds:
>>>>>> 
>>>>>> 150releng-arm64-quarterly qt6-webengine-6.9.3 53:33:46
>>>>>> 135arm64-default          qt6-webengine-6.9.3 38:43:36

A somewhat better comparison is now available from the
active builds, here quarterly 14.3 to match with the
quarterly 15.0 . . .

https://pkg-status.freebsd.org/ampere1/data/143arm64-quarterly/1081574d367d/logs/qt6-webengine-6.9.3.log

shows 14.3 quarterly getting the qt6-webengine-6.9.3
build timing: 38:25:51

on ampere1 with:

Host OSVERSION: 1600004
Jail OSVERSION: 1403000

15.0 is definitely the large one.

As far as I know, ampere1 and ampere3 match in their hardware configurations.
(Such information is not public, so I do not have great evidence.)

Given the similarity to 135arm64-default, I will generally
not switch to referencing 14.3's timing below, leaving that
implicit.

>>>>>> For reference:
>>>>>> 
>>>>>> Host OSVERSION: 1600000
>>>>>> Jail OSVERSION: 1500068
>>>>>> 
>>>>>> vs.
>>>>>> 
>>>>>> Host OSVERSION: 1600000
>>>>>> Jail OSVERSION: 1305000
>>>>>> 
>>>>>> The difference for the above is in the Jail's world builds,
>>>>>> not in the boot's (kernel+world) builds.
>>>>>> 
>>>>>> 
>>>>>> For reference:
>>>>>> 
>>>>>> 
>>>>>> https://pkg-status.freebsd.org/ampere3/build.html?mastername=150releng-arm64-quarterly&build=88084f9163ae
>>>>>> 
>>>>>> build of www/qt6-webengine | qt6-webengine-6.9.3 ended at Sun Nov 30 
>>>>>> 05:40:02 -00 2025
>>>>>> build time: 2D:05:33:52
>>>>>> 
>>>>>> 
>>>>>> https://pkg-status.freebsd.org/ampere3/build.html?mastername=135arm64-default&build=f5384fe59be6
>>>>>> 
>>>>>> build of www/qt6-webengine | qt6-webengine-6.9.3 ended at Sat Nov 22 
>>>>>> 15:33:34 -00 2025
>>>>>> build time: 1D:14:43:41
>>>>> 
>>>>> 
>>>>> Expanding the notes to before and after jemalloc 5.3.0
>>>>> was merged to main: beefy18 was the main-amd64 builder
>>>>> before and somewhat after the jemalloc 5.3.0 merge from
>>>>> vendor branch:
>>>>> 
>>>>> Before: p2650762431ca_s51affb7e971 261:29:13 building 36074 port-packages,
>>>>>         start 05 Aug 2025 01:10:59 GMT
>>>>> (jemalloc 5.3.0 merge from vendor branch: 15 Aug 2025)
>>>>> After : p9652f95ce8e4_sb45a181a74c 428:49:20 building 36318 port-packages,
>>>>>         start 19 Aug 2025 01:30:33 GMT
>>>>> 
>>>>> (The log files are long gone for port-packages built.)
>>>>> 
>>>>> main-15 used a debug jail world but 15.0-RELEASE does not.
>>>>> 
>>>>> I'm not aware of such a port-package builder context for a
>>>>> non-debug jail world before and after a jemalloc 5.3.0 merge.
>>>>> 
>>>> A few months before I landed the jemalloc patches, I did 4 or 5 from dirt 
>>>> buildworlds. The elapsed time was, iirc, within 1 or 2%. Enough to see maybe 
>>>> a diff with the small sample size, but not enough for ministat to trigger 
>>>> at 95%. I didn't recall keeping the data for this and can't find it now. 
>>>> And I'm not even sure, in hindsight, I ran a good experiment. It might be 
>>>> related, or not, but it would be easy enough for someone to setup a two 
>>>> jails: one just before and one just after. Build from scratch the world 
>>>> (same hash) on both. That would test it since you'd be holding all other 
>>>> variables constant.
>>>> 
>>>> When we imported the tip of FreeBSD main at work, we didn't get a cpu 
>>>> change trigger from our tests that I recall...
>>> 
>>> 
>>> The range of commits looks like:
>>> 
>>>   • git: 9a7c512a6149 - main - ucred groups: restore a useful comment Eric 
>>> van Gyzen
>>>   • git: bf6039f09a30 - main - jemalloc: Unthin contrib/jemalloc Warner Losh
>>>   • git: a0dfba697132 - main - jemalloc: Update jemalloc.xml.in per 
>>> FreeBSD-diffs Warner Losh
>>>   • git: 718b13ba6c5d - main - jemalloc: Add FreeBSD's updates to 
>>> jemalloc_preamble.h.in Warner Losh
>>>   • git: 6371645df7b0 - main - jemalloc: Add JEMALLOC_PRIVATE_NAMESPACE for 
>>> the libc namespace Warner Losh
>>>   • git: da260ab23f26 - main - jemalloc: Only replace 
>>> _pthread_mutex_init_calloc_cb in private namespace Warner Losh
>>>   • git: c43cad871720 - main - jemalloc: Merge from jemalloc 5.3.0 vendor 
>>> branch Warner Losh
>>>   • git: 69af14a57c9e - main - jemalloc: Note update in UPDATING and 
>>> RELNOTES Warner Losh
>>> 
>>> I've started a build of a non-debug 9a7c512a6149 world
>>> to later create a chroot to do a test buildworld in.
>>> 
>>> I'll also do a build of a non-debug 69af14a57c9e world
>>> to later create the other chroot to do a test
>>> buildworld in.
>>> 
>>> non-debug means my use of:
>>> 
>>> WITH_MALLOC_PRODUCTION=
>>> WITHOUT_ASSERT_DEBUG=
>>> WITHOUT_PTHREADS_ASSERTIONS=
>>> WITHOUT_LLVM_ASSERTIONS=
>>> 
>>> I've used "env WITH_META_MODE=" as it cuts down on the
>>> volume and frequency of scrolling output. I'll do the
>>> same later.
>>> 
>>> If there is anything you want controlled in a different
>>> way, let me know.
>>> 
>>> The Windows Dev Kit 2023 is booted (world and kernel)
>>> with:
>>> 
>>> # uname -apKU
>>> FreeBSD aarch64-main-pbase 16.0-CURRENT FreeBSD 16.0-CURRENT 
>>> main-n281922-4872b48b175c GENERIC-NODEBUG arm64 aarch64 1600004 1600004
>>> 
>>> which is from an official pkgbase distribution. So the
>>> boot-world is a debug world but the boot-kernel is not.
>>> 
>>> The Windows Dev Kit 2023 will take some time for such
>>> -j8 builds and I may end up sleeping in the middle of
>>> the sequence someplace. So it may be a while before
>>> I've any comparison/contrast data to report.
>>> 
>> 
>> 
>> Summary for jemalloc for before vs. at 5.3.0
>> for *non-debug* contexts doing the buildworld :
>> 
>> before 5.3.0: 9754 seconds (about 2.7 hrs)
>> with   5.3.0: 9384 seconds (about 2.6 hrs)
>> 
> 
> While in principle this can accurately reflect the difference, the
> benchmark itself is not valid as is.

As a reminder of what started this for my specific
messages:

On ampere3 :
150releng-arm64-quarterly qt6-webengine-6.9.3 53:33:46
135arm64-default          qt6-webengine-6.9.3 38:43:36

That is a fairly large multiplication factor. The test
was a cross check on that; at least, that is how I
interpreted Warner's request, and it was my purpose in
agreeing to do the test.

I tried to do what Warner asked. It adds a little data
to what he reported.

I do not view the result as indicating much more than
that the two builds are approximately equal in the time
taken. I would have no reason to care if the timings were
swapped, for example: the conclusion would be the same
for the comparison I was making.

It would be highly unlikely for repeated tests to have
variability reaching anywhere near the qt6-webengine-6.9.3
scale factor difference.
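
For reference, the two scale factors being contrasted can be worked
out with a small sketch. It is only arithmetic on the figures already
quoted in this thread; the helper name to_seconds and the variable
names are illustrative choices, not part of any tooling used for the
measurements.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' elapsed time (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# qt6-webengine-6.9.3 on ampere3, as reported above.
qt6_150 = to_seconds("53:33:46")  # 150releng-arm64-quarterly
qt6_135 = to_seconds("38:43:36")  # 135arm64-default
qt6_ratio = qt6_150 / qt6_135

# Single-run buildworld timings (seconds) from the two chroots.
buildworld_before = 9754  # before jemalloc 5.3.0
buildworld_with = 9384    # with jemalloc 5.3.0
buildworld_ratio = buildworld_before / buildworld_with

print(f"qt6-webengine, 15.0 vs 13.5 jail: {qt6_ratio:.3f}x")
print(f"buildworld, before vs with 5.3.0: {buildworld_ratio:.3f}x")
```

The package build differs by roughly 1.38x, while the two buildworld
runs differ by roughly 1.04x (and in the opposite direction).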

> First, you can't just run it once -- the result needs to be proven
> repeatable and profiled. For a build of that duration, for this few
> resources, 

For comparison to:

150releng-arm64-quarterly qt6-webengine-6.9.3 53:33:46
135arm64-default          qt6-webengine-6.9.3 38:43:36

and that size of scale factor, I'd say, yes I can,
given the near equality that I got. It is evidence
that the type of test has missed being relevant,
other than showing no such systematic scale factor
for the type of test.


FYI: 32 GiBytes of RAM. 8 cores that are compatible
with Cortex-A76 targeting, 4 are X1C and 4 are A78C.
USB3 in use, with a U.2 1.4 TB Optane as media, via
an adapter. UFS file system.

> for all I know the real factor was randomness from I/O.

Not to the extent of changing the scale to
be similar to 53:33:46 vs. 38:43:36 for
building qt6-webengine-6.9.3, as far as
I can see.

> That aside you need a sanitized baseline. From the description it not
> clear to me at all if you are doing the build with the clang perf
> regression fixed or not.

My result indicates, in part, that this kind of
buildworld test is not a good way to investigate
the 53:33:46 vs. 38:43:36 difference for building
qt6-webengine-6.9.3 . I doubt I need a better
baseline for that judgment now. I'd need a
different type of test activity.

> Even that aside, I outlined 3 more regressions:
> - slower binary startup to begin with
> - slower syscalls which fail with an error
> - slower syscall interface in the first place
> 
> Out of these the first one is most important here.

Do you expect any combination of those to be a
significant part of the scale factor difference
for 53:33:46 vs. 38:43:36 for building
qt6-webengine-6.9.3 ?

> If I was to work on this,

I would not claim that we are targeting the same
issue, even considering Warner's request, which
added what he was targeting.

> seeing that the question at hand is whether
> the jemalloc update is a problem,

I think the specifics of the qt6-webengine-6.9.3
building would need to be the investigative
context for what was "at hand" for me. In part
that judgement is based on the test I did finding
near equality for jemalloc .

> I would bypass all of the above and
> instead take 14.3 (not stable/14!) as a baseline + jemalloc update on
> top. This eliminates all of the factors other than jemalloc itself.

I'll note that ampere1 with a 14.3 jail took 38:25:51
for its build of qt6-webengine-6.9.3 . That scale of
timing is not specific to 13.5 jail worlds.

> building world also seems a little fishy here and it is not clear to
> me at all what version have you built

The 9xxx sec timings were both building:

69af14a57c9e - main - jemalloc: Note update in UPDATING and RELNOTES Warner Losh

(the end of the jemalloc commit sequence).

One build was 69af14a57c9e in a chroot rebuilding itself.

The other built 69af14a57c9e via:

9a7c512a6149 - main - ucred groups: restore a useful comment Eric van Gyzen
(the just before jemalloc 5.3.0 related commits started)

The 2 chroots differ just by which jemalloc version
was in use.

> -- was the new jemalloc thing
> building new jemalloc and old jemalloc building old jemalloc? More
> importantly I would be worried some of the build picks up whatever
> jemalloc it finds to use during some of the build.
> 
> I would benchmark this by building a big port (not timing dependencies
> of the port, just the port itself -- maybe even chromium or firefox).

Using qt6-webengine-6.9.3 would mean using a context
that is known to have an issue, at least for aarch64.

But I can not take weeks of time for such an activity.

amd64 is messier to compare official builds for
because of the lack of uniformity across the builder
machines and because each type of build is done on
its own builder machine: there are no examples of
the same machine doing both builds.

> That's of course quite a bit of effort and if there is nobody to do
> that (or compatible), imo the pragmatic play is to revert the jemalloc
> update for the time being. This restores the known working state and
> should the update be a good thing it can land for 15.1, maybe fixed
> up.

150releng-arm64-quarterly on ampere3:
llvm21-21.1.2 : 21:26:14

143arm64-quarterly on ampere1:
llvm21-21.1.2 : 15:24:24

Again a notable time ratio. (default/latest would
not be a llvm version match.)
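
As a rough illustration, the 15.0-quarterly vs. 14.3-quarterly ratios
for the two ports mentioned in this message can be computed side by
side. This sketch only uses timings quoted above; the names
to_seconds, timings and ratios are illustrative, and it assumes (as
noted earlier, without strong evidence) that ampere1 and ampere3 are
comparable hardware.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' elapsed time (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# (15.0 quarterly on ampere3, 14.3 quarterly on ampere1)
timings = {
    "qt6-webengine-6.9.3": ("53:33:46", "38:25:51"),
    "llvm21-21.1.2": ("21:26:14", "15:24:24"),
}

# Ratio of the 15.0 jail build time to the 14.3 jail build time per port.
ratios = {
    port: to_seconds(t150) / to_seconds(t143)
    for port, (t150, t143) in timings.items()
}

for port, ratio in ratios.items():
    print(f"{port}: {ratio:.2f}x")
```

Both ports come out at roughly 1.39x, which fits the observation that
qt6-webengine-6.9.3 does not look unique here.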

Some basic looking around does not suggest to me
that qt6-webengine-6.9.3 is somehow unique for
having notable timing ratios for quarterly on an
ampere* .

But, as of yet, I've no good evidence for blaming
jemalloc as a major contributor to those timing
ratios --or for blaming any other specific part
of 15.0 .


===
Mark Millard
marklmi at yahoo.com

