[Bug target/119919] 7% exchange2 regression between g:6390fc86995fbd5239497cb9e1797a3af51d3936 and g:f72a2d221539cede358f2487b94bc370c6fc44b5

2025-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919 --- Comment #1 from Jan Hubicka --- There is also 4% tonto regression in Intel in the same range it seems https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=799.230.0

[Bug target/119919] New: 7% exchange2 regression between g:6390fc86995fbd5239497cb9e1797a3af51d3936 and g:f72a2d221539cede358f2487b94bc370c6fc44b5

2025-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919 Bug ID: 119919 Summary: 7% exchange2 regression between g:6390fc86995fbd5239497cb9e1797a3af51d3936 and g:f72a2d221539cede358f2487b94bc370c6fc44b5 Product: gcc Ve

[Bug tree-optimization/119902] New: open-coded scatter/gather should not account vec_to_scalar cost

2025-04-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119902 Bug ID: 119902 Summary: open-coded scatter/gather should not account vec_to_scalar cost Product: gcc Version: 16.0 Status: UNCONFIRMED Severity: normal

[Bug target/119900] New: regression if imagick with -Ofast -march=native -fprofile-use between g:b986ed16c2546674 and g:e1098c7b08d9e601

2025-04-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900 Bug ID: 119900 Summary: regression if imagick with -Ofast -march=native -fprofile-use between g:b986ed16c2546674 and g:e1098c7b08d9e601 Product: gcc Version: 16.

[Bug target/119879] [16 Regression] FAIL: gcc.target/i386/avx512fp16-trunc-extendvnhf.c since r16-39

2025-04-21 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119879 --- Comment #2 from Jan Hubicka --- Created attachment 61166 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61166&action=edit Fix I am testing The fix I am testing. When VEC_PACK_TRUNC_EXPR is used, add_hook is called with vec_promote_dem

[Bug target/119879] [r16-39 Regression] FAIL: gcc.target/i386/avx512fp16-trunc-extendvnhf.c

2025-04-21 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119879 --- Comment #1 from Jan Hubicka --- The problem is in: /* VEC_PACK_TRUNC_EXPR: If inner size is greater than outer size we will end up doing two conversions and packing them. */ if (!scalar_p && inner_size > outer_size) { i

[Bug target/119876] New: suboptimal code for avx512 conditinal move

2025-04-20 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119876 Bug ID: 119876 Summary: suboptimal code for avx512 conditinal move Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: ta

[Bug tree-optimization/119875] New: loop with floating point conditional move not vectorized without -ffast-math

2025-04-20 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119875 Bug ID: 119875 Summary: loop with floating point conditional move not vectorized without -ffast-math Product: gcc Version: unknown Status: UNCONFIRMED Severity

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 Jan Hubicka changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #47 from Jan Hubicka --- Created attachment 61134 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61134&action=edit patch w/o forgotten debug output

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #46 from Jan Hubicka --- Created attachment 61133 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61133&action=edit updated patch The problem in previous patch was that ipa-prop streams 0 to the end of block of summary section

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #44 from Jan Hubicka --- Summaries are duplicated when clone is created. Let me debug why it gets lost here.

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #37 from Jan Hubicka --- Created attachment 61128 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61128&action=edit updated patch (regtests and bootstraps) Updated patch. Streaming summaries seems to work and fixes the testcase

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #36 from Jan Hubicka --- Created attachment 61127 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61127&action=edit patch (untested)

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #34 from Jan Hubicka --- If the only problem is that ipa_return_value_sum does not survive from compile time to WPA, then we only need to add streaming code for it. This should be straightforward and there is no need to add

[Bug target/105275] [12/13/14/15 regression] 525.x264_r and 538.imagick_r regressed on x86_64 at -O2 with PGO after r12-7319-g90d693bdc9d718

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275 --- Comment #6 from Jan Hubicka --- As discussed in PR111551, the SPEC train run does not include the hottest loop of imagick (in ref loop), so we optimize it for size (in particular disable vectorization) and get poor performance

[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646 --- Comment #7 from Jan Hubicka --- Details are in PR111551

[Bug gcov-profile/118551] Autofdo regressed 538.imagick_r by ~10% with -march=x86-64-v3 -O2

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118551 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #5

[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646 --- Comment #6 from Jan Hubicka --- The problem is that the internal loop in hottest function changes between train and ref run (train run uses different variant of the loop). This disables vectorization of the loop believed to be cold causing -

[Bug target/119298] [15 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298 --- Comment #15 from Jan Hubicka --- I made a silly stand-alone test: long test[4]; __attribute__ ((noipa)) void foo (unsigned long a, unsigned long b, unsigned long c, unsigned long d) { test[0]=a; test[1]=b; test[2]=c;

[Bug target/119298] [15 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298 --- Comment #14 from Jan Hubicka --- > > I am OK with using addss cost of 3 for trunk&release branches and make this > > more precise next stage1. > > That's what we use now? But I still don't understand why exactly > 538.imagick_r regresses

[Bug target/119298] [15 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298 --- Comment #12 from Jan Hubicka --- > Btw, it was your r8-4018-gf6fd8f2bd4e9a9 which added the FP vs. non-FP > difference. Yep, I know. With that patch I mostly wanted to limit redundancy of the tables. The int/Fp difference was mostly based

[Bug target/119298] [15 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

2025-04-08 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298 --- Comment #7 from Jan Hubicka --- Hmm, the sequence does not use + at all, but I think I know what is going on. While the field is called addss it is used as a kitchen sink for all other simple operations. /* pmuludq under sse2, pmuld

[Bug ipa/119147] 525.x264_r is approx. 10% slower with LTO+PGO than without (at -Ofast -march-native)

2025-04-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119147 --- Comment #4 from Jan Hubicka --- Re-benchmarked current trunk -flto -Ofast -march=native (base) and -flto -Ofast -march=native + PGO (peak) on znver3 Estimated Estimated Base

[Bug ipa/119147] 525.x264_r is approx. 10% slower with LTO+PGO than without (at -Ofast -march-native)

2025-04-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119147 --- Comment #3 from Jan Hubicka --- With speculation_useful_p we now are able to constant propagate stride into mc_chroma with PGO, but it does not help runtime. https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680055.html solves the costi

[Bug libstdc++/119606] [15 regression] Commit 'Optimize string constructor' causes regression in Snappy workload for -mcpu=neoverse-v2 with LTO

2025-04-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119606 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #5

[Bug target/119565] New: 13-17% regression of botan CAS128 and DES on zen4

2025-04-01 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119565 Bug ID: 119565 Summary: 13-17% regression of botan CAS128 and DES on zen4 Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component

[Bug target/119368] immintrin code running slower with gcc than clang

2025-03-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119368 --- Comment #5 from Jan Hubicka --- Thinking of it more, I think enabling memory alternatives in (define_insn "sse4_1_v4hiv4si2" [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v") (any_extend:V4SI (vec_select:V4HI (m

[Bug target/119368] immintrin code running slower with gcc than clang

2025-03-23 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119368 --- Comment #2 from Jan Hubicka --- On this combiner fails to match: Failed to match this instruction: (set (subreg:V4SI (reg:V2DI 101 [ ]) 0) (sign_extend:V4SI (vec_select:V4HI (mem:V8HI (reg:DI 106) [0 *x_3(D)+0 S16 A128]) (p

[Bug target/119368] New: immintrin code running slower with gcc than clang

2025-03-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119368 Bug ID: 119368 Summary: immintrin code running slower with gcc than clang Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component

[Bug ipa/119312] Constant array not allocated in read-only segment

2025-03-18 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119312 --- Comment #13 from Jan Hubicka --- And forgot to write. In case of strcmp I think we can use fnspec info we already have at the time of constructing the callgraph to represent it as a read rather than taking address. This would make things go bit sm

[Bug ipa/119312] Constant array not allocated in read-only segment

2025-03-18 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119312 --- Comment #12 from Jan Hubicka --- Indeed at IPA level we track if address of a symbol is taken, but we do not keep any extra info about how it may be used. It would be useful to track 1) if address is used only to read (to figure out readon

[Bug ipa/119147] 525.x264_r is approx. 10% slower with LTO+PGO than without (at -Ofast -march-native)

2025-03-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119147 Jan Hubicka changed: What|Removed |Added Last reconfirmed||2025-03-13 Ever confirmed|0

[Bug c++/118924] [12/13/14/15 regression] Wrong code at -O2 and above leading to uninitialized accesses on aarch64-linux-gnu since r10-917-g3b47da42de621c

2025-03-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118924 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #1

[Bug middle-end/119147] New: 525.x264_r is approx. slower with LTO+PGO than without (at -Ofast -march-native)

2025-03-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119147 Bug ID: 119147 Summary: 525.x264_r is approx. slower with LTO+PGO than without (at -Ofast -march-native) Product: gcc Version: unknown Status: UNCONFIRMED Seve

[Bug middle-end/111551] Fix for PR106081 is not working with profile feedback on imagemagick

2025-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111551 --- Comment #4 from Jan Hubicka --- From gcov dump, the normal train run exercises loop: 742632: 2953: switch ( method ) { 742632: 2954:case ConvolveMorphology: -: 2955:/* Weighted Average of pixels using r

[Bug middle-end/111551] Fix for PR106081 is not working with profile feedback on imagemagick

2025-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111551 --- Comment #3 from Jan Hubicka --- With LTO the situation seems pretty much the same 21.23% imagick_r_peak. imagick_r_peak.trunk-pgolto-Ofast-native-m64 [.] MorphologyApply.cold 14.30% imagick_r_peak. imagick_r_peak.trunk-nop

[Bug middle-end/111551] Fix for PR106081 is not working with profile feedback on imagemagick

2025-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111551 Jan Hubicka changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed|

[Bug middle-end/119033] [13/14/15 regression] Unsafe FRE of pointer assignment since r13-469-g9a53101caadae1

2025-02-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119033 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #6

[Bug ipa/119006] [12/13/14/15 Regression] ICF merging pointer to array types which don't have the same bounds since r11-5181-g0862d007b564ec

2025-02-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119006 Jan Hubicka changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill

[Bug middle-end/119033] New: Unsafe FRE of pointer assignment

2025-02-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119033 Bug ID: 119033 Summary: Unsafe FRE of pointer assignment Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end

[Bug target/119010] [15 Regression] 444.namd shows a huge compile-time regression with -mtune=znver5

2025-02-25 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119010 Jan Hubicka changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at

[Bug ipa/118318] [15 regression] ICE when building firefox-134.0 with PGO

2025-02-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118318 --- Comment #13 from Jan Hubicka --- Thanks for running this through debugger Breakpoint 2.2, profile_count::operator+= (this=0x76e7e888, other=...) at /usr/src/debug/sys-devel/gcc-15.0./gcc-15.0./gcc/profile-count.h:932 932

[Bug tree-optimization/118527] When a loop is unlooped due to sccvn, its profile is not updated

2025-01-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118527 --- Comment #3 from Jan Hubicka --- The reason why I did not implement profile fixups to cfgcleanup is that you can not really fix the profile without knowing why it became inconsistent. Consider situation where we have function foo (int a) {

[Bug ipa/118318] ICE when building firefox-134.0 with PGO and LTO

2025-01-07 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118318 --- Comment #6 from Jan Hubicka --- Some profile inconsistencies are expected unless you use atomic counters since Firefox uses threads. Do you know why compatible_p returns false? It looks like mixing IPA and function local profiles together..

[Bug tree-optimization/90345] too pessimistic check whether pointer may alias a local variable

2024-12-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90345 Jan Hubicka changed: What|Removed |Added Last reconfirmed|2024-04-10 00:00:00 |2024-12-28 CC|

[Bug tree-optimization/80641] missed optimization with with std::vector resize in loop

2024-12-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80641 --- Comment #18 from Jan Hubicka --- With -O3 we now get: int main () { [local count: 114863531]: return 0; } -O2 offlines destructors which prevents us from optimizing away new() int main () { void * D.27676; int * c$_M_finish; int

[Bug target/80813] [12/13/14/15 Regression] x86: std::vector::operator[] could be somewhat faster using BT instead of SHL

2024-12-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80813 --- Comment #6 from Jan Hubicka --- Patch to optimize operator[] to be again branchless posted https://gcc.gnu.org/pipermail/gcc-patches/2024-December/672286.html Main problem with auto-generating bt is that it needs change of conditional from C

[Bug tree-optimization/26388] Variable sized storage allocation should be promoted to stack allocation

2024-12-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26388 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #20

[Bug tree-optimization/117638] No loop splitting and bounds check not optimized out with -D_GLIBCXX_ASSERTIONS

2024-12-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117638 --- Comment #4 from Jan Hubicka --- Both with assertions or without we offline _M_default_append which would be better inlined. It is because main is known to be called once. One difference is that non-assertion clobbers the vectors prior const

[Bug c++/86276] Poor codegen when returning a std::vector

2024-12-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86276 --- Comment #2 from Jan Hubicka --- With -O3 we now do quite well. _Z4goodv: .LFB1248: .cfi_startproc ret .cfi_endproc .LFE1248: .size _Z4goodv, .-_Z4goodv .p2align 4 .globl _Z3badv .typ

[Bug tree-optimization/117639] Modified loop-split-1.C doesn't recognise non-escaping std::vector

2024-12-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117639 --- Comment #3 from Jan Hubicka --- With -O3 -std=c++20 https://godbolt.org/z/3WKnn8rax we inline but still get stuck on loop calling log and modifying errno. Without -std=c++20 we reach --param max-inline-insns-auto. We need --param max-inlin

[Bug target/80813] [12/13/14/15 Regression] x86: std::vector::operator[] could be somewhat faster using BT instead of SHL

2024-12-20 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80813 --- Comment #5 from Jan Hubicka --- Combine constructs: (set (reg:CCZ 17 flags) (compare:CCZ (zero_extract:DI (mem:DI (plus:DI (mult:DI (reg:DI 111 [ _8 ]) (const_int 8 [0x8])) (reg/f:DI 112 [ v_2(

[Bug target/80813] [12/13/14/15 Regression] x86: std::vector::operator[] could be somewhat faster using BT instead of SHL

2024-12-20 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80813 --- Comment #4 from Jan Hubicka --- Bit_reference constructor takes mask and not bit position. _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR reference operator[](size_type __n) { __glibcxx_requires_subscript(__n);

[Bug target/80813] [12/13/14/15 Regression] x86: std::vector::operator[] could be somewhat faster using BT instead of SHL

2024-12-20 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80813 --- Comment #3 from Jan Hubicka --- OK, so the horrid codegen is because bvector's [] operator is implemented using iterator: return begin()[__n]; iterator's [] operator is implemented using: _GLIBCXX20_CONSTEXPR void _M_incr(ptrdif

[Bug target/80813] [12/13/14/15 Regression] x86: std::vector::operator[] could be somewhat faster using BT instead of SHL

2024-12-20 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80813 Jan Hubicka changed: What|Removed |Added Status|UNCONFIRMED |NEW Summary|x86:

[Bug tree-optimization/109440] Missed optimization of vector::at when a function is called inside the loop

2024-12-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109440 --- Comment #3 from Jan Hubicka --- I believe that since v is constructed and passed by invisible reference in the caller, we would need to know constructors of std::vector and prove that they do not make &v escape to global memory, so foo ca

[Bug libstdc++/90436] Redundant size checking in vector

2024-12-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90436 Jan Hubicka changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug libstdc++/114821] _M_realloc_append should use memcpy instead of loop to copy data when possible

2024-12-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821 --- Comment #14 from Jan Hubicka --- Jonathan, is there some problem with your patch?

[Bug ipa/110378] IPA-SRA for destructors

2024-12-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110378 --- Comment #10 from Jan Hubicka --- Martin, I think this is fixed?

[Bug middle-end/109849] suboptimal code for vector walking loop

2024-12-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849 Bug 109849 depends on bug 110287, which changed state. Bug 110287 Summary: _M_check_len is expensive https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287 What|Removed |Added -

[Bug libstdc++/110287] _M_check_len is expensive

2024-12-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287 Jan Hubicka changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug c++/118130] New: std::vector code quality issues

2024-12-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118130 Bug ID: 118130 Summary: std::vector code quality issues Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ As

[Bug c++/97094] Compiling big std::unordered_map became slower

2024-12-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97094 Jan Hubicka changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed|

[Bug tree-optimization/86701] Optimize strlen called on std::string c_str()

2024-12-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86701 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #5

[Bug libstdc++/60621] std::vector::emplace_back generates massively more code than push_back

2024-12-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60621 Jan Hubicka changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED

[Bug tree-optimization/117924] unused std::vector are not optimized out fully at gimple level

2024-12-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117924 Jan Hubicka changed: What|Removed |Added Last reconfirmed||2024-12-14 Ever confirmed|0

[Bug libstdc++/87502] Poor code generation for std::string("c-style string")

2024-12-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87502 --- Comment #16 from Jan Hubicka --- https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671599.html optimizes the string constructors. Having strlen pass catching more cases would be nice, too.

[Bug libstdc++/80331] unused const std::string not optimized away

2024-12-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80331 Jan Hubicka changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill

[Bug libstdc++/80331] unused const std::string not optimized away

2024-12-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80331 --- Comment #15 from Jan Hubicka --- Original testcase is solved by https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671599.html We still won't optimize longer strings because _M_create is not inline.

[Bug c++/103827] function which takes an argument via (hidden) reference should assume the argument does not escape or is only read from

2024-12-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103827 Jan Hubicka changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0

[Bug c++/94960] extern template prevents inlining of standard library objects

2024-12-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94960 --- Comment #10 from Jan Hubicka --- Note that passing function body to middle-end does not only enable inlining, but other optimizations too. Often ipa-modref is able to summarize side effects of the function and enables more optimization, since

[Bug libstdc++/109442] Dead local copy of std::vector not removed from function

2024-12-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109442 --- Comment #35 from Jan Hubicka --- On #include <vector> bool test1(const std::vector<int>& in) { return in == std::vector{42}; } we produce: bool test1 (const struct vector & in) { bool _12; int * _13; int * _14; long int _24; unsigned in

[Bug ipa/93921] -Os generates much bigger code than -O{1,2,3,fast} for std::string::size

2024-12-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93921 Jan Hubicka changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill

[Bug libstdc++/80331] unused const std::string not optimized away

2024-12-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80331 --- Comment #14 from Jan Hubicka --- Declaring _S_create and _M_create inline indeed helps a little: diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h index 17b973c8b45..d73a61abe5b 100644 --- a/lib

[Bug tree-optimization/117875] [15 Regression] 28% regression for 456.hmmer on Zen4 with -Ofast -march=native

2024-12-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117875 --- Comment #12 from Jan Hubicka --- I tried final_value_replacement_loop on simplified testcase where second loop has known number of iterations: void foo(int *a, int *b, int n) { if (n > 3 && n < 10) for (int i = 0; i

[Bug ipa/86590] Codegen is poor when passing std::string by value with _GLIBCXX_EXTERN_TEMPLATE undefined

2024-12-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86590 --- Comment #40 from Jan Hubicka --- As discussed with Jason, the problem with _M_create not being seen by the middle-end is actually due to the C++ standard. Explicit instantiations prevent implicit ones for non-inline functions, see discussion in PR39242. With

[Bug libstdc++/80331] unused const std::string not optimized away

2024-12-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80331 --- Comment #13 from Jan Hubicka --- As discussed with Jason, the problem with _M_create not being seen by the middle-end is actually due to the C++ standard. Explicit instantiations prevent implicit ones for non-inline functions, see discussion in PR39242. With

[Bug ipa/117984] New: missed IPA constant propagation

2024-12-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117984 Bug ID: 117984 Summary: missed IPA constant propagation Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: ipa As

[Bug c++/103827] function which takes an argument via (hidden) reference should assume the argument does not escape or is only read from

2024-12-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103827 --- Comment #11 from Jan Hubicka --- I see, I misread Jonathan's answer. If const is relevant only on definition, what about this one: #include struct foo { int a; void bar() const; ~foo() { if (a != 42) printf ("optimize me

[Bug c++/103827] function which takes an argument via (hidden) reference should assume the argument does not escape or is only read from

2024-12-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103827 --- Comment #9 from Jan Hubicka --- Jason, did you intend to close this as invalid? I think we agreed on the original testcase being valid - we can assume that calls to extern void foo (const std::string ); can assume the string argument being

[Bug c++/103827] function which takes an argument via (hidden) reference should assume the argument does not escape or is only read from

2024-12-08 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103827 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org,

[Bug libstdc++/87502] Poor code generation for std::string("c-style string")

2024-12-08 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87502 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org Ever confir

[Bug tree-optimization/117957] New: vectorization pesimises std::vector push/pop test

2024-12-08 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117957 Bug ID: 117957 Summary: vectorization pesimises std::vector push/pop test Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Compon

[Bug tree-optimization/117924] unused std::vector are not optimized out fully at gimple level

2024-12-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117924 --- Comment #1 from Jan Hubicka --- looking at dse3 dump we get: [local count: 1073741824]: MEM[(struct _Bvector_impl_data *)&data] ={v} {CLOBBER(bob)}; MEM[(struct __as_base &)&data] ={v} {CLOBBER(bob)}; _13 = MEM[(const struct vecto

[Bug tree-optimization/117924] New: unused std::vector are not optimized out fully at gimple level

2024-12-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117924 Bug ID: 117924 Summary: unused std::vector are not optimized out fully at gimple level Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal

[Bug tree-optimization/117875] [15 Regression] 28% regression for 456.hmmer on Zen4 with -Ofast -march=native

2024-12-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117875 Jan Hubicka changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED

[Bug ipa/117892] [15 Regression] ICE on valid code at -O1 and above on x86_64-linux-gnu: in single_succ_edge, at basic-block.h:332 since r15-5336-gcee7d080d5c2a5

2024-12-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117892 --- Comment #2 from Jan Hubicka --- This is mine. The loop first checks that the basic block is empty (consists only of debug statements, predicts, clobbers and nops) and then it asserts that there is only one edge out, which ought to be the case. I

[Bug ipa/86590] Codegen is poor when passing std::string by value with _GLIBCXX_EXTERN_TEMPLATE undefined

2024-11-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86590 Jan Hubicka changed: What|Removed |Added CC||mjambor at suse dot cz --- Comment #36 fro

[Bug tree-optimization/79349] unused std::string is not optimized away in presense of a call

2024-11-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79349 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #6

[Bug libstdc++/80331] unused const std::string not optimized away

2024-11-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80331 --- Comment #12 from Jan Hubicka --- I think with my patch to basic_string we should have at least arrived at something comparable to clang. With -O2 it is optimized away, with -O2 -D_GLIBCXX_USE_CXX11_ABI=0 I get: int sain () { struct alloc

[Bug tree-optimization/117793] New: missed copy propagation across memcpy

2024-11-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117793 Bug ID: 117793 Summary: missed copy propagation across memcpy Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-op

[Bug ipa/113197] [12/13/14 Regression] ICE in in handle_call_arg, at tree-ssa-structalias.cc:4119 since r12-5177-g494bdadf28d0fb

2024-11-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113197 Jan Hubicka changed: What|Removed |Added Assignee|rguenth at gcc dot gnu.org |hubicka at gcc dot gnu.org --- Co

[Bug tree-optimization/117489] [12/13/14/15 Regression] ICE in handle_call_arg, at tree-ssa-structalias.cc:4226 at -O1 and above with "-fno-ipa-pure-const -fsanitize=undefined" and pure and no sanitiz

2024-11-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117489 --- Comment #7 from Jan Hubicka --- The problem here (and with the assert Richi nuked in PR 113197) is that the flags really are: no_direct_clobber no_indirect_clobber no_direct_escape no_indirect_escape not_returned_indirectly no_direct_read n

[Bug tree-optimization/117764] [15 Regression] cddce should handle __builtin_unreachable guards

2024-11-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117764 Jan Hubicka changed: What|Removed |Added Summary|cddce should handle |[15 Regression] cddce |

[Bug tree-optimization/117764] New: cddce should handle __builtin_unreachable guards

2024-11-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117764 Bug ID: 117764 Summary: cddce should handle __builtin_unreachable guards Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Compone

[Bug tree-optimization/117710] New: repeated calls to std::function are not inlined

2024-11-20 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117710 Bug ID: 117710 Summary: repeated calls to std::function are not inlined Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Componen

[Bug ipa/117672] Remove unused virtual methods

2024-11-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117672 --- Comment #4 from Jan Hubicka --- There is a constructor of the static variable (_GLOBAL__sub_I_main) which we do not optimize out, since we think it makes useful memory writes: at that stage we do not know that the static var is effectively wri

[Bug tree-optimization/117639] Modified loop-split-1.C doesn't recognise non-escaping std::vector

2024-11-18 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117639 Jan Hubicka changed: What|Removed |Added CC||jwakely at redhat dot com,

[Bug tree-optimization/58483] missing optimization opportunity for const std::vector compared to std::array

2024-11-18 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58483 --- Comment #23 from Jan Hubicka --- one in comment #0 optimizes with me provided that destructors are inline jh@shroud:/tmp> cat tt.C #include #include #include //static int calc(const std::array p_ints, const int& p_init) static int calc(c
