from:"Jan Hubicka via Gcc\-bugs"

Re: [Bug rtl-optimization/98782] [11/12 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies

2022-01-11 Thread Jan Hubicka via Gcc-bugs

On zen2 and 3 with -flto the speedup seems to be about 12% for both -O2 and -Ofast -march=native, which is very nice! Zen1 for some reason sees less improvement, about 6%. With PGO it is 3.8%. Overall it seems a win, but there are a few noteworthy issues. I also see a 6.69% regression on x64 with

Re: [Bug tree-optimization/103989] [12 regression] std::optional and bogus -Wmaybe-unitialized at -Og since r12-1992-g6feb628a706e86eb

2022-01-13 Thread Jan Hubicka via Gcc-bugs

> --- Comment #6 from Richard Biener --- > Honza, -Og was supposed to not do so much work, I intended to disable IPA > inlining but there's no knob for that. I wonder where to best put such > guard? I set flag_inline_small_functions to zero for -Og but we still > run inline_small_functions ().

Re: [Bug tree-optimization/103989] [12 regression] std::optional and bogus -Wmaybe-unitialized at -Og since r12-1992-g6feb628a706e86eb

2022-01-13 Thread Jan Hubicka via Gcc-bugs

> You can not disable an IPA pass because then we will mishandle > optimize attributes. I think you simply want to set > > flag_inline_small_functions = 0 > flag_inline_functions_called_once = 0 Actually I forgot, we have flag_no_inline which makes tree_inlinable_function_p return false for

Re: [Bug tree-optimization/103989] [12 regression] std::optional and bogus -Wmaybe-unitialized at -Og since r12-1992-g6feb628a706e86eb

2022-01-13 Thread Jan Hubicka via Gcc-bugs

> > Sure - I just remember (falsely?) that we finally decided to do it :) I do not recall this, but I may have forgotten :)) > If we don't run IPA inline we don't figure we failed to inline the > always_inline either ;) And IPA inline can expose more indirect > always-inlines we only discover a

Re: [Bug tree-optimization/103195] [12 Regression] tfft2 text grows by 70% with -Ofast since r12-5113-gd70ef65692fced7a

2022-01-18 Thread Jan Hubicka via Gcc-bugs

> So nothing to see? I guess our unit growth limit doesn't trigger because it's > a small (benchmark) unit? Yep, unit growth limits do not apply to very small units. The ipa-cp heuristics IMO still need work and should be based on relative speedups rather than absolute ones for the cutoffs.

Re: [Bug ipa/104203] [12 Regressions] huge IPA compile-time regression since r12-6606-g9d6a0f388eb048f8

2022-01-24 Thread Jan Hubicka via Gcc-bugs

So I assume that this is due to the new pass_waccess which was added into early optimizations. I think this is not really an ipa component issue but a tree-optimization one.

Re: [Bug tree-optimization/104203] [12 Regressions] huge compile-time regression since r12-6606-g9d6a0f388eb048f8

2022-01-24 Thread Jan Hubicka via Gcc-bugs

> > bool > Since the pass issues a bunch other warnings (e.g., -Wstringop-overflow, > -Wuse-after-free, etc.) the gate doesn't seem right. But since #pragma GCC > diagnostic can re-enable warnings disabled by -w (or turn them into errors) > any > gate that considers the global option setting will

Re: [Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread Jan Hubicka via Gcc-bugs

> > According to znver2_cost > > > > Cost of sse_to_integer is a little bit less than fp_store, maybe increase > > sse_to_integer cost(more than fp_store) can helps RA to choose memory > > instead of GPR. > > That sounds reasonable - GPR<->xmm is cheaper than GPR -> stack -> xmm > but GPR<->xmm s

Re: [Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread Jan Hubicka via Gcc-bugs

> I would say so. It saves code size and also uop space unless the two > can magically fuse to a immediate to %xmm move (I doubt that). I made simple benchmark double a=10; int main() { long int i; double sum,val1,val2,val3,val4; for (i=0;i<10;i++) { #if

Re: [Bug d/103040] [12 Regression] gdc.dg/torture/pr101273.d FAILs

2021-11-02 Thread Jan Hubicka via Gcc-bugs

> See above comments from Iain, even if that pre-initialization is removed it is > still miscompiled. And, the testcase fails not because of the padding bits > not > being zero, but because the address of self stored into one of the fields > isn't > there or modref thinks it can't be changed or

Re: [Bug d/103040] [12 Regression] gdc.dg/torture/pr101273.d FAILs

2021-11-02 Thread Jan Hubicka via Gcc-bugs

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103040 > > --- Comment #15 from Iain Buclaw --- > Got it. The difference between D and C++ is a matter of early inlining. > > The C++ example Jakub posted fails in the same way that D does if you compile > with: -O1 -fno-inline Great, I will take a

Re: [Bug tree-optimization/102943] [12 Regression] Jump threader compile-time hog with 521.wrf_r

2021-11-04 Thread Jan Hubicka via Gcc-bugs

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102943 > > Aldy Hernandez changed: > >What|Removed |Added > > Depends on||103058 > > --- Comment #

Re: [Bug tree-optimization/102943] [12 Regression] Jump threader compile-time hog with 521.wrf_r

2021-11-07 Thread Jan Hubicka via Gcc-bugs

> > This PR is still open, at least for slowdown in the threader with LTO. The > issue is ranger wide, so it may also cause slowdowns on non-LTO builds for > WRF, though I haven't checked. I just wanted to record the fact somewhere since I was looking up the revision range mostly to figure out i

Re: [Bug middle-end/102997] [12 Regression] 45% 454.calculix regression with LTO+PGO -march=native -Ofast on Zen since r12-4526-gd8edfadfc7a9795b65177a50ce44fd348858e844

2021-11-08 Thread Jan Hubicka via Gcc-bugs

Note that it still seems to me that the crossed_loop_header handling is overly conservative. We have: @ -2771,6 +2771,7 @@ jt_path_registry::cancel_invalid_paths (vec &path) bool seen_latch = false; int loops_crossed = 0; bool crossed_latch = false; + bool crossed_loop_header = false;

Re: [Bug tree-optimization/103175] [12 Regression] internal compiler error: in handle_call_arg, at tree-ssa-structalias.c:4139

2021-11-11 Thread Jan Hubicka via Gcc-bugs

The sanity check verifies that functions accessing a parameter indirectly also read the parameter (otherwise the indirect reference can not happen). This patch moves the check earlier and removes some overactive flag cleaning on function call boundary which introduces the nonsensical situation. I g

Re: [Bug ipa/103211] [12 Regression] 416.gamess crashes after r12-5177-g494bdadf28d0fb35

2021-11-12 Thread Jan Hubicka via Gcc-bugs

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103211 > > --- Comment #2 from Martin Liška --- > Optimized dump differs for couple of functions in the same way: > > diff -u good bad > --- good2021-11-12 17:42:36.995947103 +0100 > +++ bad 2021-11-12 17:41:56.728194961 +0100 > @@ -38,7 +38

Re: [Bug ipa/103230] New: ipa-modref-tree.h:550:33: runtime error: load of value 255, which is not a valid value for type 'bool'

2021-11-14 Thread Jan Hubicka via Gcc-bugs

> Happens with UBSAN compiler for: > > $ gcc gcc/testsuite/gcc.c-torture/execute/pr71494.c -O1 -flto > ... > /home/marxin/Programming/gcc/gcc/ipa-modref-tree.h:550:33: runtime error: load > of value 255, which is not a valid value for type 'bool' > #0 0x18acc38 in modref_tree::merge(modref_tr

Re: [Bug ipa/103230] ipa-modref-tree.h:550:33: runtime error: load of value 255, which is not a valid value for type 'bool'

2021-11-14 Thread Jan Hubicka via Gcc-bugs

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103230 > > --- Comment #2 from Martin Liška --- > > How do you build ubsan compiler? > > F="-O0 -g -fsanitize=undefined" ; make -j16 all-host -k CFLAGS="$F" > CXXFLAGS="$F" LDFLAGS="$F" > > is the fastest approach. Thanks, it is similar to what I

Re: [Bug tree-optimization/103231] New: ICE (nondeterministic) on valid code at -O1 on x86_64-linux-gnu: Segmentation fault

2021-11-14 Thread Jan Hubicka via Gcc-bugs

> [659] % > [659] % gcctk -O0 -w small.c > [660] % > [660] % gcctk -O1 -w small.c > [661] % gcctk -O1 -w small.c > [662] % gcctk -O1 -w small.c > gcctk: internal compiler error: Segmentation fault signal terminated program > cc1 > Please submit a full bug report, > with preprocessed source if app

Re: [Bug ipa/103267] Wrong code with ipa-sra

2021-11-16 Thread Jan Hubicka via Gcc-bugs

Works for me even with the 3 warnings. hubicka@lomikamen:/aux/hubicka/trunk/build-lto2/gcc$ cat >tt.c __attribute__ ((noinline,const)) infinite (int p) { if (p) while (1); return p; } __attribute__ ((noinline)) static void test(int p, int *a) { int v = infinite (p); if (*a && v) __

Re: [Bug ipa/103267] Wrong code with ipa-sra

2021-11-16 Thread Jan Hubicka via Gcc-bugs

Aha, but here is a better example (reproduces the same way). In the former one I forgot const attribute which makes it invalid. The testcase tests that ipa-sra is missing an ECF_LOOPING_CONST_OR_PURE check static int __attribute__ ((noinline)) infinite (int p) { if (p) while (1); return p; } __attr
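
To make the concern concrete, here is a minimal sketch in C. It is not the testcase from the bug report (which is cut off above); the names `may_loop_forever` and `use_after_call` are hypothetical and chosen only for illustration. It shows why a pass that turns a by-reference parameter into a by-value one has to care about callees that may not terminate.

```c
#include <stddef.h>

/* May never return when p is nonzero.  The constant controlling
   expression keeps the endless loop well defined in C.  */
static int __attribute__ ((noinline))
may_loop_forever (int p)
{
  if (p)
    while (1)
      ;
  return p;
}

/* *a is read only after the call returns.  If the call loops forever,
   the pointer is never dereferenced, so a caller may legitimately pass
   NULL together with a nonzero p.  */
static int __attribute__ ((noinline))
use_after_call (int p, int *a)
{
  int v = may_loop_forever (p);
  return *a + v;
}
```

Passing `*a` by value would mean loading it in the caller, before `may_loop_forever` runs, and that load would fault for a call such as `use_after_call (1, NULL)`. Presumably this is the kind of situation the ECF_LOOPING_CONST_OR_PURE check is meant to guard against.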

Re: [Bug ipa/103267] Wrong code with ipa-sra

2021-11-16 Thread Jan Hubicka via Gcc-bugs

> @@ -1,4 +1,3 @@ > -static int > __attribute__ ((noinline,const)) > infinite (int p) > { Just for the record, it crashes with or without static int here for me :) I ran across it because the code tracking must access in ipa-sra is IMO conceptually wrong. I noticed that because ipa-modref solves

Re: [Bug tree-optimization/103300] New: wrong code at -O3 on x86_64-linux-gnu

2021-11-17 Thread Jan Hubicka via Gcc-bugs

Needs -O2 -floop-unroll-and-jam --param early-inlining-insns=14 to fail, so I guess it may be an issue with unroll-and-jam.

Re: [Bug driver/100937] configure: Add --enable-default-semantic-interposition

2021-11-22 Thread Jan Hubicka via Gcc-bugs

> (The -fno-semantic-interposition thing is probably the biggest performance gap > between gcc -fpic and clang -fpic.) Yep, it is often confusing to users (who do not understand what ELF interposition is) that clang and gcc disagree on default flags here. Recently -Ofast was extended to imply -fno-
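
As an illustration of what ELF interposition means for code generation, here is a small sketch in C. The function names `helper` and `api` are hypothetical, and the two functions are assumed to be exported from a shared library built with `-fpic`.

```c
/* Both functions are exported from a hypothetical shared library.  */
int
helper (void)
{
  return 42;
}

int
api (void)
{
  /* With -fpic and semantic interposition in effect, a different
     definition of helper () may replace this one at dynamic link time,
     so the compiler has to keep a real, preemptible call here.  With
     -fno-semantic-interposition it is allowed to assume this
     definition is the one that runs and may inline it.  */
  return helper () + 1;
}
```

Comparing the assembly from `gcc -O2 -fpic -S` with and without `-fno-semantic-interposition` should show the difference in how the call to `helper` is treated, though the exact output depends on the GCC version and target.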

Re: [Bug tree-optimization/103168] Value numbering for PRE of pure functions can be improved

2021-11-22 Thread Jan Hubicka via Gcc-bugs

This is the slightly modified patch I am testing. I added pre-computation of the number of accesses, enabled the path for const functions (in case they have a memory operand), initialized alias sets and clarified the logic around every_* and global_memory_accesses PR tree-optimization/103168

Re: [Bug tree-optimization/103168] Value numbering for PRE of pure functions can be improved

2021-11-22 Thread Jan Hubicka via Gcc-bugs

The patch passed testing on x86_64-linux.

Re: [Bug gcov-profile/103652] Producing profile with -O2 -flto and trying to consume it with -O3 -flto leads to ICEs on indirect call profiling

2021-12-13 Thread Jan Hubicka via Gcc-bugs

> > Well, I'm specifically speaking about: > error: the control flow of function ‘BZ2_compressBlock’ does not match its > profile data (counter ‘arcs’) > > this type of errors should not happen even in a multi-threaded programs. There are some cases where I see even those on clang build - I am

Re: [Bug ipa/102982] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)

2021-10-28 Thread Jan Hubicka via Gcc-bugs

> > fixup_cfg already removes write-only stores so that seems fit for that > purpose. > > Btw, > > static int x = 1; > > int main() > { > x = 1; > } > > should ideally be handled as well as maybe the more common(?) > > static int x[128]; > > int main() > { > memset (x, 0, 128*4); > } >
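
The two quoted cases can be put side by side with a store that has to stay. This is a sketch under assumptions: the variable and function names are hypothetical, and the stores live in an ordinary function rather than in `main` so that the contrast with a variable that is actually read is visible.

```c
#include <string.h>

static int write_only = 1;   /* stored to, never read in this unit */
static int scratch[128];     /* only ever cleared, never read */
static int observed;         /* read by last_value (), so stores must stay */

void
record (int v)
{
  write_only = v;                        /* candidate for removal */
  memset (scratch, 0, sizeof scratch);   /* likewise, if the pass recognizes it */
  observed = v;                          /* must be kept */
}

int
last_value (void)
{
  return observed;
}
```

Since neither `write_only` nor `scratch` has its address taken or its value read, removing the first two statements of `record` would not change observable behavior, while the store to `observed` is needed by `last_value`.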

Re: [Bug middle-end/102997] [12 Regression] 45% 454.calculix regression with LTO+PGO -march=native -Ofast between ce4d1f632ff3f680550d3b186b60176022f41190 and 6fca1761a16c68740f875fc487b98b6bde8e9be7

2021-10-29 Thread Jan Hubicka via Gcc-bugs

> Not seen on Haswell (but w/o PGO). Is this PGO specific? There's another > large jump visible end of 2019. It is between 2019-11-15 and 18 but the revisions do not exist in git - perhaps they refer to the old git mirror. Martin will know better. In that range there are many of Richard's vec

Re: [Bug target/87832] AMD pipeline models are very costly size-wise

2022-11-16 Thread Jan Hubicka via Gcc-bugs

> > Do you mean we should fix modeling of divisions there as well? I don't have > latency/throughput measurements for those CPUs, nor access so I can run > experiments myself, unfortunately. > > I guess you mean just making a patch to model division units separately, > leaving latency/throughput

Re: [Bug c/105728] New: dead store to static var not optimized out

2022-05-25 Thread Jan Hubicka via Gcc-bugs

> To me, all of these do the same thing and should generate the same code. > As nobody else can see removeme, and we aren't leaking its address, shouldn't > the compiler be able to deduce that all accesses to removeme are > inconsequential and can be removed? > > My gcc 11.3 generates a condidion

Re: [Bug lto/105727] __builtin_constant_p expansion in LTO

2022-05-25 Thread Jan Hubicka via Gcc-bugs

> > My guess is that the > > BUILD_BUG(); > > line is the sole thing that is wrong, it should be just break; > > as the memory_is_poisoned_n(addr, size); will handle all the sizes, > > regardless if they are constants or not. > > Sure, I'm going to suggest such a change. To me it looked like a pro
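
The suggested shape (a plain `break` in the default case, with a generic routine handling every size) can be sketched as follows. This is not the code under discussion: `poisoned_1`, `poisoned_n` and `is_poisoned` are hypothetical stand-ins, and "poisoned" here simply means "contains a zero byte".

```c
#include <stddef.h>

/* Hypothetical specialized checker for a single byte.  */
static int
poisoned_1 (const unsigned char *p)
{
  return p[0] == 0;
}

/* Hypothetical generic checker; correct for every size.  */
static int
poisoned_n (const unsigned char *p, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (p[i] == 0)
      return 1;
  return 0;
}

static inline int
is_poisoned (const unsigned char *p, size_t n)
{
  if (__builtin_constant_p (n))
    switch (n)
      {
      case 1:
        return poisoned_1 (p);
      default:
        break;   /* no build-time failure, use the generic path */
      }
  return poisoned_n (p, n);
}
```

Because the result of `__builtin_constant_p` depends on what the optimizer manages to prove, a default case that forces a build error can fire for sizes the author never anticipated. Falling back to the generic routine gives the same answer whether or not the size was folded to a constant.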

Re: [Bug middle-end/106078] Invalid loop invariant motion with non-call-exceptions

2022-06-25 Thread Jan Hubicka via Gcc-bugs

> > For this one it's PRE hoisting *b across the endless loop (PRE handles > > calls as possibly not returning but not loops as possibly not > > terminating...) > > So it's a different bug. > > Btw, C++ requiring forward progress makes the testcase undefined. In my understanding access to volatil

Re: [Bug tree-optimization/113787] [12/13/14 Regression] Wrong code at -O with ipa-modref on aarch64

2024-02-14 Thread Jan Hubicka via Gcc-bugs

> > I guess PTA gets around by tracking points-to set also for non-pointer > > types and consequently it also gives up on any such addition. > > It does. But note it does _not_ for POINTER_PLUS where it treats > the offset operand as non-pointer. > > > I think it is ipa-prop.c::unadjusted_ptr_an

Re: [Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread Jan Hubicka via Gcc-bugs

Looking at the prototype patch, why is it necessary to change the splitters as well? My original goal was to use splitters to expand to faster code sequences while having patterns necessary for both variants. This makes it possible to use optimize_insn_for_size/speed and make decisions using BB profile, since

Re: [Bug ipa/114262] Over-inlining when optimizing for size with gnu_inline function

2024-03-07 Thread Jan Hubicka via Gcc-bugs

> Note GCC has not retuned its -Os heuristics for a long time because it has been > decent enough for most folks and corner cases like this almost never come > up. There were quite a few changes to -Os heuristics :) One of the bigger challenges is that we do see more and more C++ code built with -Os wh

Re: [Bug target/110758] [14 Regression] 8% hmmer regression on zen1/3 with -Ofast -march=native -flto between g:8377cf1bf41a0a9d (2023-07-05 01:46) and g:3a61ca1b9256535e (2023-07-06 16:56); g:d76d19c

2023-07-21 Thread Jan Hubicka via Gcc-bugs

> I suspect this is most likely the profile updates changes ... Quite possibly. The goal of this exercise is to figure out if there are some bugs in profile estimate or whether passes somehow prefer broken profile or if it is just bad luck. Looking at sphinx and fatigue it seems that LRA really

Re: [Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022

2023-07-28 Thread Jan Hubicka via Gcc-bugs

> This heuristic wants to catch > > > if (foo) abort (); > > > and avoid sinking "too far" across a path with "similar enough" > execution count (I think the original motivation was to fix some > spilling / register pressure issue). The loop depth test > should be !(bb_loop_depth (best_b

Re: [Predicated Ins vs Branches] O3 and PGO result in 2x performance drop relative to O2

2023-08-01 Thread Jan Hubicka via Gcc-bugs

> > If I comment it out as above patch, then O3/PGO can get 16% and 12% > > performance > > improvement compared to O2 on x86. > > > > O2 O3 PGO > > cycles 2,497,674,824 2,104,993,224 2,199,753,593 > > instructions1

Re: [Bug middle-end/111088] useless 'xor eax,eax' inserted when a value is not returned and icf

2023-08-21 Thread Jan Hubicka via Gcc-bugs

> But adds a return with a value. And then the inliner inlines foo into foo2 but > we still have the return with a value around ... I guess ICF can special case unused return value, but why is this not taken care of by ipa-sra?

Re: [Bug c++/106943] GCC building clang/llvm with LTO flags causes ICE in clang

2023-05-12 Thread Jan Hubicka via Gcc-bugs

> > Indeed it is a long-standing problem with clang not building with lifetime > > DSE and strict aliasing. I wonder why this is not fixed on the clang side? > > Because the problems were not communicated? I knew that Firefox needed > -flifetime-dse=1, but it's the first time I hear that any such pro

Re: [Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled on x86 since r14-5109-ga291237b628f41

2024-04-09 Thread Jan Hubicka via Gcc-bugs

There is still a problem with loop bounds. I am testing a patch for that and then we should (finally) be safe.

Re: [Bug libstdc++/109442] Dead local copy of std::vector not removed from function

2024-05-14 Thread Jan Hubicka via Gcc-bugs

This patch attempts to add __builtin_operator_new/delete. So far they are not optimized, which will need to be done by extra flag of BUILT_IN_ code. also the decl.cc code can be refactored to be less of cut&paste and I guess has_builtin hack to return proper value needs to be moved to C++ FE. How

Re: [Bug c++/110137] implement clang -fassume-sane-operator-new

2024-06-04 Thread Jan Hubicka via Gcc-bugs

> Is the option supposed to be only about the standard global scope operator > new/delete (_Znam etc.) or also user operator new/delete class methods? If > the > former, then I agree it is a global property (or at least a per shared > library/binary property, one can arrange stuff with symbol vis

Re: [Bug libstdc++/110287] _M_check_len is expensive

2023-06-19 Thread Jan Hubicka via Gcc-bugs

> > There is no guarantee that std::vector::max_size() is PTRDIFF_MAX. It > depends on the Allocator type, A. A user-defined allocator could have > max_size() == 100. If the inliner sees a path to the throw functions, it will not consider _M_check_len early inlinable. Perhaps we can __builtin_con

Re: [Bug ipa/110334] [13/14 Regresssion] unused functions not eliminated before LTO streaming

2023-06-23 Thread Jan Hubicka via Gcc-bugs

Just so it is somewhere, here is a testcase that we can't inline leaf functions to always_inlines unless we do some tracking of what calls were formerly indirect calls. We really overloaded always_inline from the original semantics "drop inlining heuristics" into "be sure that result is inlined" w

Re: [Bug ipa/110334] [13/14 Regresssion] unused functions not eliminated before LTO streaming

2023-06-28 Thread Jan Hubicka via Gcc-bugs

> > why disallow caller->indirect_calls? See testcase in comment #9 > > > + return false; > > + for (cgraph_edge *e2 = callee->callees; e2; e2 = e2->next_callee) > > I don't think this flies - it looks quadratic. Can we compute this > in the inline summary once instead? I guess I can

Re: [Bug target/113233] LoongArch: target options from LTO objects not respected during linking

2024-01-04 Thread Jan Hubicka via Gcc-bugs

> Confirm. But option save/restore has been always implemented: > > .section.gnu.lto_.opts,"",@progbits > .ascii "'-fno-openmp' '-fno-openacc' '-fno-pie' '-fcf-protection" > .ascii "=none' '-mabi=lp64d' '-march=loongarch64' '-mfpu=64' '-m" > .ascii "simd=lasx' '-mcmodel=nor

Re: [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option

2024-06-25 Thread Jan Hubicka via Gcc-bugs

> different issue from the one that is raised in the PR. (Unless we think that > -O2 and -O3 should always have the same inlining heuristics henceforward, but > that seems unlikely.) Yes, I think the point of -O3 is to let the compiler be more aggressive than what seems desirable for your average dist

Re: [Bug libstdc++/87502] Poor code generation for std::string("c-style string")

2024-12-09 Thread Jan Hubicka via Gcc-bugs

> > So I think all we can hope for is merging memcpy with the extra write of 0. > > That's not actually clear. > > It would be reasonable to assume that foo isn't likely to change the string > and have the inlined destructor for a string that was initialized as a short > string like here do somet

Re: [Bug middle-end/120614] 525.x264_r is ~30% slower with AutoFDO

2025-06-11 Thread Jan Hubicka via Gcc-bugs

> > as mentioned by Andrew, it is important to clone and also resolve indirect > > calls. Those auto-FDO 0 may prevent it from happening. > > It is easy to see in perf profile if the functions are cloned. > > > > My overall plan is to combine autofdo with guessed profile, when autofdo > > samples

51 matches
