[clang] Bfi precision (PR #66285)

2023-10-30 Thread via cfe-commits
WenleiHe wrote: Thanks for investigating, Matthias. So it looks like the regression is indeed just clang non-PGO build being unlucky after the BFI improvement.. Sounds good to leave it as is. https://github.com/llvm/llvm-project/pull/66285 ___ cfe-c

[clang] Bfi precision (PR #66285)

2023-10-28 Thread David Li via cfe-commits
david-xl wrote: > And digging even deeper: > > * FWIW I noticed that I only used `clang -c` as benchmark previously, should > have used `clang -c -O3` resulting in this: > > ``` > Old-BFI: insn: 37,821,687,947 (baseline) > New-BFI: insn: 38,133,312,923 +0.82% > Old-BFI, no-

[clang] Bfi precision (PR #66285)

2023-10-28 Thread Duncan P . N . Exon Smith via cfe-commits
dexonsmith wrote: Interesting. Probably `Value::getMetadata()` could/should call `DenseMap::find()` instead of `operator[]()` and assert that it's found before dereferencing, because `Value::hasMetadata()` (which, IIRC, consults a bit stored in `Value`) has already promised something will be t

[clang] Bfi precision (PR #66285)

2023-10-27 Thread Matthias Braun via cfe-commits
MatzeB wrote: Some very ad-hoc benchmarking. Of clang compilation speed (measured in instructions as reported by valgrind/callgrind which I think somewhat matches the setup of nikic) compiling `sqlite3` of CTMark: Old-BFI (this PR reverted), New-BFI (this PR applied), no-cold (cold-callsite-r

[clang] Bfi precision (PR #66285)

2023-10-26 Thread Vitaly Buka via cfe-commits
vitalybuka wrote: FYI @ldionne This patch causing test_vector2.pass.cpp trigger memory leak https://lab.llvm.org/buildbot/#/builders/168/builds/16422 Not sure if this was leaking but lsan didn't see, it's possible. Or some regression in lsan. https://github.com/llvm/llvm-project/pull/66285 _

[clang] Bfi precision (PR #66285)

2023-10-26 Thread David Li via cfe-commits
david-xl wrote: Contracting my past self (one of the reviewers of the patch), I think coldness check should be based on a global threshold, which exists when synthetic entry count propagation is enabled (without PGO). https://github.com/llvm/llvm-project/pull/66285

[clang] Bfi precision (PR #66285)

2023-10-26 Thread Duncan P . N . Exon Smith via cfe-commits
dexonsmith wrote: > Seems this got introduced in https://reviews.llvm.org/D34312 with the rough > idea that we shouldn't inline into parts of the code that > `_builtin_expect(...)` deems unlikely. Which makes sense when you express it > like this, but I guess numeric thresholds can go wrong...

[clang] Bfi precision (PR #66285)

2023-10-26 Thread Matthias Braun via cfe-commits
MatzeB wrote: Seems this got introduced in https://reviews.llvm.org/D34312 with the rough idea that we shouldn't inline into parts of the code that `_builtin_expect(...)` deems unlikely. Which makes sense when you say it express it like this, but I guess numeric thresholds can go wrong... htt

[clang] Bfi precision (PR #66285)

2023-10-26 Thread David Li via cfe-commits
david-xl wrote: Agree that deciding coldness based on local entry frequency (2%) is a bad idea though. https://github.com/llvm/llvm-project/pull/66285 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listi

[clang] Bfi precision (PR #66285)

2023-10-26 Thread Duncan P . N . Exon Smith via cfe-commits
dexonsmith wrote: Not entirely accidental. When BPI/BFI first landed it was heavily profiled to be sure it didn't pessimize non-PGO code. I don't see why we'd suddenly be okay with pessimizing it. Under 2% isn't hard to hit for hot path code. Lots of functions will have strings of early exit

[clang] Bfi precision (PR #66285)

2023-10-25 Thread David Li via cfe-commits
david-xl wrote: Yes -- the revert can wait until more data is available. I agree that it should help performance in theory. https://github.com/llvm/llvm-project/pull/66285 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/

[clang] Bfi precision (PR #66285)

2023-10-25 Thread via cfe-commits
WenleiHe wrote: > This is concerning. Can this be reverted for now and we can help with some > internal performance testing. @xur-llvm Internally, with this change, on a few large workloads, we saw ~0.2-0.5% performance improvements, all with 2-3% .text size reduction at the same time. Our te

[clang] Bfi precision (PR #66285)

2023-10-25 Thread Matthias Braun via cfe-commits
MatzeB wrote: Oh so this isn't even a PGO enabled build, interesting... https://github.com/llvm/llvm-project/pull/66285 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] Bfi precision (PR #66285)

2023-10-25 Thread Nikita Popov via cfe-commits
nikic wrote: This is a standard release build. The exact cmake config is https://gist.github.com/nikic/1bf97b5b6aade298b87f41cd13bf37d5 where the stage1 build has the same configuration but is built with gcc. I'm testing CTMark, but the impact is quite consistent across the board, so anything

[clang] Bfi precision (PR #66285)

2023-10-25 Thread Nikita Popov via cfe-commits
nikic wrote: It looks like this has made clang 0.7% slower (in the sense that the clang stage2 binary is optimized worse -- there is no negative impact on stage1). The binary is also 0.7% larger. It looks like this change has significant impact on inlining heuristics. Not really sure what to