WenleiHe wrote:
Thanks for investigating, Matthias. So it looks like the regression is indeed
just clang non-PGO build being unlucky after the BFI improvement.. Sounds good
to leave it as is.
https://github.com/llvm/llvm-project/pull/66285
___
cfe-c
david-xl wrote:
> And digging even deeper:
>
> * FWIW I noticed that I only used `clang -c` as benchmark previously, should
> have used `clang -c -O3` resulting in this:
>
> ```
> Old-BFI: insn: 37,821,687,947 (baseline)
> New-BFI: insn: 38,133,312,923 +0.82%
> Old-BFI, no-
dexonsmith wrote:
Interesting. Probably `Value::getMetadata()` could/should call
`DenseMap::find()` instead of `operator[]()` and assert that it's found before
dereferencing, because `Value::hasMetadata()` (which, IIRC, consults a bit
stored in `Value`) has already promised something will be t
MatzeB wrote:
Some very ad-hoc benchmarking. Of clang compilation speed (measured in
instructions as reported by valgrind/callgrind which I think somewhat matches
the setup of nikic) compiling `sqlite3` of CTMark:
Old-BFI (this PR reverted), New-BFI (this PR applied), no-cold
(cold-callsite-r
vitalybuka wrote:
FYI @ldionne
This patch causing test_vector2.pass.cpp trigger memory leak
https://lab.llvm.org/buildbot/#/builders/168/builds/16422
Not sure if this was leaking but lsan didn't see, it's possible. Or some
regression in lsan.
https://github.com/llvm/llvm-project/pull/66285
_
david-xl wrote:
Contracting my past self (one of the reviewers of the patch), I think coldness
check should be based on a global threshold, which exists when synthetic entry
count propagation is enabled (without PGO).
https://github.com/llvm/llvm-project/pull/66285
dexonsmith wrote:
> Seems this got introduced in https://reviews.llvm.org/D34312 with the rough
> idea that we shouldn't inline into parts of the code that
> `_builtin_expect(...)` deems unlikely. Which makes sense when you express it
> like this, but I guess numeric thresholds can go wrong...
MatzeB wrote:
Seems this got introduced in https://reviews.llvm.org/D34312 with the rough
idea that we shouldn't inline into parts of the code that
`_builtin_expect(...)` deems unlikely. Which makes sense when you say it
express it like this, but I guess numeric thresholds can go wrong...
htt
david-xl wrote:
Agree that deciding coldness based on local entry frequency (2%) is a bad idea
though.
https://github.com/llvm/llvm-project/pull/66285
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listi
dexonsmith wrote:
Not entirely accidental. When BPI/BFI first landed it was heavily profiled to
be sure it didn't pessimize non-PGO code. I don't see why we'd suddenly be okay
with pessimizing it.
Under 2% isn't hard to hit for hot path code. Lots of functions will have
strings of early exit
david-xl wrote:
Yes -- the revert can wait until more data is available. I agree that it should
help performance in theory.
https://github.com/llvm/llvm-project/pull/66285
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/
WenleiHe wrote:
> This is concerning. Can this be reverted for now and we can help with some
> internal performance testing. @xur-llvm
Internally, with this change, on a few large workloads, we saw ~0.2-0.5%
performance improvements, all with 2-3% .text size reduction at the same time.
Our te
MatzeB wrote:
Oh so this isn't even a PGO enabled build, interesting...
https://github.com/llvm/llvm-project/pull/66285
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
nikic wrote:
This is a standard release build. The exact cmake config is
https://gist.github.com/nikic/1bf97b5b6aade298b87f41cd13bf37d5 where the stage1
build has the same configuration but is built with gcc. I'm testing CTMark, but
the impact is quite consistent across the board, so anything
nikic wrote:
It looks like this has made clang 0.7% slower (in the sense that the clang
stage2 binary is optimized worse -- there is no negative impact on stage1). The
binary is also 0.7% larger. It looks like this change has significant impact on
inlining heuristics.
Not really sure what to
15 matches
Mail list logo