[Bug target/87832] AMD pipeline models are very costly size-wise

2022-11-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832 --- Comment #8 from Alexander Monakov --- (In reply to Jan Hubicka from comment #7) > > 53730 r btver2_fp_min_issue_delay > > 53760 r znver1_fp_transitions > > 93960 r bdver3_fp_transitions > > 106102 r lujiazui_core_check > > 106102 r lujiazui_c

[Bug tree-optimization/107715] TSVC s161 for double runs at zen4 30 times slower when vectorization is enabled

2022-11-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107715 --- Comment #3 from Alexander Monakov --- There's a forward dependency over 'c' (read of c[i] vs. write of c[i+1] with 'i' iterating forward), and the vectorized variant takes the hit on each iteration. How is a slowdown even surprising. For th
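The kind of dependency described in this comment can be sketched as follows. This is a minimal illustration assuming a simplified loop shape; the function name `forward_dep` and the loop body are hypothetical and are not the actual TSVC s161 kernel, which the snippet does not show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified loop (not the real s161 kernel):
   iteration i may write c[i + 1], and iteration i + 1 may then read it. */
void forward_dep(double *a, const double *b, double *c, const double *d, size_t n)
{
    assert(n >= 1);
    for (size_t i = 0; i + 1 < n; i++) {
        if (b[i] < 0.0)
            c[i + 1] = a[i] + d[i] * d[i];  /* write that the next iteration may consume */
        else
            a[i] = c[i] + d[i];             /* read of c[i], possibly written one step earlier */
    }
}
```

Because the value stored to c[i + 1] can be needed by the very next iteration, a vectorized version that loads several c elements at once has to wait for, or replay around, the preceding store.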

[Bug target/87832] AMD pipeline models are very costly size-wise

2022-11-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832 --- Comment #10 from Alexander Monakov --- (In reply to Jan Hubicka from comment #9) > Actually for older cores I think the manufacturers do not care much. I > still have a working Bulldozer machine and I can do some testing. > I think in Buldoz

[Bug middle-end/107719] 14% regression on TSVC s3113 on znve4 compared to GCC 7.5

2022-11-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107719 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug tree-optimization/107647] [12/13 Regression] GCC 12.2.0 may produce FMAs even with -ffp-contract=off

2022-11-17 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107647 --- Comment #15 from Alexander Monakov --- I'm confused about the first hunk in the attached patch: --- a/gcc/tree-vect-slp-patterns.cc +++ b/gcc/tree-vect-slp-patterns.cc @@ -1035,8 +1035,10 @@ complex_mul_pattern::matches (complex_operation_t

[Bug tree-optimization/97832] AoSoA complex caxpy-like loops: AVX2+FMA -Ofast 7 times slower than -O3

2022-11-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug middle-end/107879] [13 Regression] ffmpeg-4 test suite fails on FPU arithmetics

2022-11-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107879 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug tree-optimization/97832] AoSoA complex caxpy-like loops: AVX2+FMA -Ofast 7 times slower than -O3

2022-11-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832 --- Comment #21 from Alexander Monakov --- (In reply to Michael_S from comment #19) > > Also note that 'vfnmadd231pd 32(%rdx,%rax), %ymm3, %ymm0' would be > > 'unlaminated' (turned to 2 uops before renaming), so selecting independent > > IVs for

[Bug rtl-optimization/107772] function prologue generated even though it's only needed in an unlikely path

2022-11-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107772 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #24 from Alexander Monakov --- (In reply to Peter Cordes from comment #23) > But at least on Linux, I don't think there's a way for user-space to even > ask for a page of WT or WP memory (or UC or WC). Only WB memory is easily > ava

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #26 from Alexander Monakov --- Sure, the right course of action seems to be to simply document that atomic types and built-ins are meant to be used on "common" (writeback) memory, and no guarantees can be given otherwise, because it

[Bug middle-end/107905] 2x slowdown versus CLANG and ICL

2022-11-29 Thread amonakov at gcc dot gnu.org via Gcc-bugs
||amonakov at gcc dot gnu.org --- Comment #3 from Alexander Monakov --- LLVM does a better job at code layout, and massively wins on the amount of executed branches (in particular unconditional jumps). With -fdisable-rtl-bbro gcc achieves a similar performance.

[Bug driver/107787] -Werror=array-bounds=X does not work as expected

2022-11-30 Thread amonakov at gcc dot gnu.org via Gcc-bugs
||amonakov at gcc dot gnu.org Resolution|--- |FIXED --- Comment #3 from Alexander Monakov --- Fixed for gcc-13.

[Bug middle-end/107905] 2x slowdown versus CLANG and ICL

2022-11-30 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107905 --- Comment #5 from Alexander Monakov --- Not sure what you don't like about the inputs, they appear quite reasonable. Perhaps GCC's estimation of bb frequencies is off (with profile feedback we achieve good performance). Georgi: you'll likely

[Bug middle-end/107905] 2x slowdown versus CLANG and ICL

2022-11-30 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107905 --- Comment #6 from Alexander Monakov --- Let me add that Clang supports GCC's -fprofile-{generate,use} flags for compatibility as well.

[Bug tree-optimization/107879] [13 Regression] ffmpeg-4 test suite fails on FPU arithmetics

2022-12-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107879 --- Comment #10 from Alexander Monakov --- If anyone is confused like I was, the commit actually includes a testcase, but the addition is not mentioned in the Changelog. I was sure the server-side receive hook was supposed to reject such incompl

[Bug c/107971] linking an assembler object creates an executable stack

2022-12-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107971 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug c++/108008] Compiler mis-optimization with posix_memalign

2022-12-07 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108008 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/87832] AMD pipeline models are very costly size-wise

2022-12-07 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832 --- Comment #11 from Alexander Monakov --- Factoring out Lujiazui divider shrinks its tables by almost 20x: 3 r lujiazui_decoder_min_issue_delay 20 r lujiazui_decoder_transitions 32 r lujiazui_agu_min_issue_delay 126 r lujiazui_agu_transitions 3

[Bug tree-optimization/108008] [12 Regression] wrong code with -O3 and posix_memalign

2022-12-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108008 --- Comment #9 from Alexander Monakov --- I think this is tree-ldist placing memset(sameZ, 0, zPlaneCount) after the loop, overwriting conditional 'sameZ[i] = true' assignments that happen in the loop. For the smaller testcase from comment #6,

[Bug tree-optimization/108008] [12 Regression] wrong code with -O3 and posix_memalign

2022-12-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108008 --- Comment #10 from Alexander Monakov --- Looks similar to PR 107323, but needs explicit -ftree-loop-distribution to trigger.

[Bug tree-optimization/108076] [10/11/12/13 Regression] GCC with -O3 produces code which fails to link

2022-12-12 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108076 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug rtl-optimization/108117] Wrong instruction scheduling on value coming from abnormal SSA

2022-12-14 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108117 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug rtl-optimization/108117] Wrong instruction scheduling on value coming from abnormal SSA

2022-12-15 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108117 Alexander Monakov changed: What|Removed |Added Status|RESOLVED|UNCONFIRMED Resolution|INVA

[Bug rtl-optimization/108117] Wrong instruction scheduling on value coming from abnormal SSA

2022-12-15 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108117 --- Comment #9 from Alexander Monakov --- (In reply to Feng Xue from comment #8) > In another angle, because gcc already model control flow and SSA web for > setjmp/longjmp, explicit volatile specification is not really needed. That covers GIM

[Bug tree-optimization/108129] New: nop_atomic_bit_test_and_p is too bloated

2022-12-15 Thread amonakov at gcc dot gnu.org via Gcc-bugs
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- match.pd has multi-pattern matcher 'nop_atomic_bit_test_and_p'. It expands to ~38 KLOC in gimple-match.cc and ~350 KB in the compiled binary. There h

[Bug rtl-optimization/108117] Wrong instruction scheduling on value coming from abnormal SSA

2022-12-15 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108117 --- Comment #12 from Alexander Monakov --- Shouldn't there be another bug for the sched1 issue specifically? In absence of abnormal control flow, extending lifetimes of pseudos across calls is still likely to be a pessimization.

[Bug rtl-optimization/108117] Wrong instruction scheduling on value coming from abnormal SSA

2022-12-15 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108117 Alexander Monakov changed: What|Removed |Added Resolution|DUPLICATE |FIXED --- Comment #14 from Alexande

[Bug rtl-optimization/108117] Wrong instruction scheduling on value coming from abnormal SSA

2022-12-15 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108117 Alexander Monakov changed: What|Removed |Added Resolution|FIXED |DUPLICATE --- Comment #15 from Alex

[Bug rtl-optimization/57067] Missing control flow edges for setjmp/longjmp

2022-12-15 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57067 --- Comment #9 from Alexander Monakov --- *** Bug 108117 has been marked as a duplicate of this bug. ***

[Bug middle-end/108140] ICE expanding __rbit

2022-12-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108140 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug rtl-optimization/108117] Wrong instruction scheduling on value coming from abnormal SSA

2022-12-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108117 --- Comment #16 from Alexander Monakov --- Draft patch for the sched1 issue: https://inbox.sourceware.org/gcc-patches/cf62c3ec-0a9e-275e-5efa-2689ff1f0...@ispras.ru/T/#m95238afa0f92daa0ba7f8651741089e7cfc03481

[Bug middle-end/108209] New: goof in genmatch.cc:commutative_op

2022-12-23 Thread amonakov at gcc dot gnu.org via Gcc-bugs
Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- It pretends that define_operator_list is commutative when its first member is NOT commutative: if (user_id *uid = dyn_cast<user_id *> (id)) { int res = commutative_op (uid

[Bug middle-end/108209] goof in genmatch.cc:commutative_op

2022-12-23 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108209 --- Comment #1 from Alexander Monakov --- Keeping notes as I go... Duplicated checks for 'op0' in lower_for are duplicated.

[Bug target/108229] New: [13 Regression] unprofitable STV transform

2022-12-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
: target Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- Target: x86_64-*-* In the following example, STV is making a very unprofitable transformation on trunk, but not on gcc-12: #include #include struct b

[Bug target/108229] [13 Regression] unprofitable STV transform since r13-4873-g0b2c1369d035e928

2022-12-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108229 --- Comment #3 from Alexander Monakov --- Thank you! I considered this unprofitable for these reasons: 1. As you said, the code grows in size, but the speed benefit is not clear. 2. The transform converts load+add operations in a loop, and the

[Bug middle-end/108256] New: Missing integer overflow instrumentation when assignment LHS is narrow

2022-12-31 Thread amonakov at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- For unsigned short f(unsigned short x, unsigned short y) { return x * y; } unsigned short g(unsigned short x
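Why a function like `f` above can overflow at all may not be obvious, so here is a minimal sketch, assuming a 16-bit `unsigned short` and a 32-bit `int`. Under those assumptions both operands of `x * y` are promoted to signed `int`, and 65535 * 65535 does not fit. The helper names below are hypothetical and do not come from the bug report.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if x * y, evaluated after promotion to int, would exceed INT_MAX.
   The product is formed in long long so this helper itself cannot overflow. */
int product_overflows_int(unsigned short x, unsigned short y)
{
    assert(sizeof(long long) > sizeof(int));  /* assumption stated in the lead-in */
    long long wide = (long long)x * (long long)y;
    return wide > INT_MAX;
}

/* Well-defined variant: multiply as unsigned int, then truncate to the narrow type. */
unsigned short mul_wrapping(unsigned short x, unsigned short y)
{
    return (unsigned short)((unsigned int)x * (unsigned int)y);
}
```

With these assumptions, `product_overflows_int(65535, 65535)` reports an overflow, while `mul_wrapping` gives the truncated result without relying on signed overflow.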

[Bug target/108315] New: -mcpu=power10 changes ABI

2023-01-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- Target: powerpc64le-*-* Created attachment 54202 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54202&action=edit testcase At le

[Bug rtl-optimization/108318] Floating point calculation moved out of loop despite fesetround

2023-01-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108318 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem

2022-05-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- Created attachment 52933 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52933&action=edit testcase Hit

[Bug rtl-optimization/105513] New: [9/10/11/12/13 Regression] Unnecessary SSE spill

2022-05-07 Thread amonakov at gcc dot gnu.org via Gcc-bugs
: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- Target: x86_64-*-* i?86-*-* Minimized from PR 105504. Compile with -O2 -mtune=haswell -mavx (other

[Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem

2022-05-07 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504 --- Comment #5 from Alexander Monakov --- The strange xmm0 spill issue may affect more code, so I reported an isolated testcase: PR 105513 (regression vs. gcc-8, the complete testcase in this PR also does not spill with gcc-8).

[Bug target/105513] [9/10/11/12/13 Regression] Unnecessary SSE spill since r9-5748-g1d4b4f4979171ef0

2022-05-20 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105513 --- Comment #7 from Alexander Monakov --- The second sequence is 3 uops vs 1/2 (issued/executed) uops in first, and on Haswell and Skylake it ties up port 5 for two cycles. Unclear if you're microbenchmarking latency or throughput, but in any c

[Bug target/61810] init-regs.c papers over issues elsewhere

2022-05-20 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/105700] GCC miscompiles? wine when using -march=pentium-m

2022-05-23 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105700 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/105700] GCC miscompiles? wine when using -march=pentium-m

2022-05-23 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105700 --- Comment #5 from Alexander Monakov --- (In reply to Artem S. Tashkinov from comment #4) > > There should be a note in dmesg when a process segfaults outside of a > > debugger. If you run wine without gdb, and winedevice.exe crashes, is there

[Bug bootstrap/105688] Cannot build GCC 11.3 on Fedora 36

2022-05-23 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105688 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug c/105863] RFE: __attribute__((incbin("file"))) or __builtin_incbin("file")

2022-06-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug tree-optimization/106019] New: Surprising SLP failure on trivial code

2022-06-17 Thread amonakov at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- In the following code, 'f' is not SLP-vectorized, but 'g' is. From a brief look at slp2 dump, looks like

[Bug target/106277] missed-optimization: redundant movzx

2022-07-13 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106277 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug rtl-optimization/101347] [11/12/13 Regression] ICE in cfg_layout_initialize with __builtin_setjmp and -fprofile-generate -fprofile-use

2022-07-14 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101347 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug lto/91299] LTO inlines a weak definition in presence of a non-weak definition from an ELF file

2022-07-20 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91299 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug middle-end/80053] Label with address taken should prevent duplication of containing basic block

2021-07-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80053 Alexander Monakov changed: What|Removed |Added Last reconfirmed||2021-07-24 Resolution|INVALI

[Bug middle-end/80053] Label with address taken should prevent duplication of containing basic block

2021-07-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80053 Alexander Monakov changed: What|Removed |Added Resolution|INVALID |--- Status|RESOLVED

[Bug middle-end/80053] Label with address taken should prevent duplication of containing basic block

2021-07-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80053 --- Comment #13 from Alexander Monakov --- Yes, I'm talking only about labels which are potential branch targets, of course after the jumps have been DCE'd it is not really observable where the label points to. Unfortunately after four years I do

[Bug middle-end/80053] Label with address taken should prevent duplication of containing basic block

2021-07-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80053 --- Comment #15 from Alexander Monakov --- (In reply to Richard Biener from comment #14) > I think the original asm goto case clearly remains and this is a difficult > to handle case since the label address only appears as regular input and the >

[Bug ipa/113890] -fdump-tree-modref ICE with _BitInt

2024-02-12 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113890 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug rtl-optimization/113903] sched1 should schedule across EBBS

2024-02-13 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113903 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug c++/66487] sanitizer/warnings for lifetime DSE

2024-02-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66487 --- Comment #28 from Alexander Monakov --- The bug is about the issue of lacking diagnostics, it should be fine to make note of various approaches to remedy the problem in one bug report. (in any case, all discussion of the Valgrind-based approa

[Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%)

2024-03-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261 --- Comment #3 from Alexander Monakov --- The first attachment is empty (perhaps you made a non-recursive archive when you meant to recursively zip a directory).

[Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%)

2024-03-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261 Alexander Monakov changed: What|Removed |Added CC||mkuvyrkov at gcc dot gnu.org --- Co

[Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%) since r13-5154-g733a1b777f1

2024-03-13 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261 --- Comment #8 from Alexander Monakov --- If we want to get rid of the compilation time regression sooner rather than later, I can suggest limiting my change only to functions that call setjmp: diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc

[Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%) since r13-5154-g733a1b777f1

2024-03-13 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261 --- Comment #10 from Alexander Monakov --- Indeed, but OTOH according to bug 84402 comment 58 it caused a noticeable hit on gimple-match.cc compilation: 733a1b777f16cd397b43a242d9c31761f66d3da8 13th January 2023 sched-deps: do not schedule pseu

[Bug target/108866] Allow to pass Windows resource file (.rc) as input to gcc

2024-03-14 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108866 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug lto/114337] LTO symbol table doesn't include builtin functions

2024-03-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114337 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/110762] inappropriate use of SSE (or AVX) insns for v2sf mode operations

2023-07-21 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/110762] inappropriate use of SSE (or AVX) insns for v2sf mode operations

2023-07-21 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762 --- Comment #14 from Alexander Monakov --- That seems undesirable in light of comment #4, you'd risk creating a situation when -fno-trapping-math is unpredictably slower when denormals appear in dirty upper halves.

[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads

2023-07-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads

2023-07-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799 --- Comment #9 from Alexander Monakov --- (In reply to Tom de Vries from comment #7) > Can you elaborate on what you consider a correct approach? I think this optimization is incorrect and should be active only under -Ofast. I can offer two ar

[Bug rtl-optimization/110823] [missed optimization] >50% speedup for x86-64 ASCII processing a la GNU diffutils

2023-07-30 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110823 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads

2023-07-31 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799 --- Comment #16 from Alexander Monakov --- In C11 and C++11 the issue of compiler-introduced racing loads is discussed as follows (5.1.2.4 Multi-threaded executions and data races in C11): 28 NOTE 14 Transformations that introduce a speculative

[Bug target/110202] _mm512_ternarylogic_epi64 generates unnecessary operations

2023-08-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202 Alexander Monakov changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/110926] [14 regression] Bootstrap failure (matmul_i1.c:1781:1: internal compiler error: RTL check: expected elt 0 type 'i' or 'n', have 'w' (rtx const_int) in vpternlog_redundant_operand_m

2023-08-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110926 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug other/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946 --- Comment #8 from Alexander Monakov --- Why? There's no bswap here, in particular mbedtls_put_unaligned_uint64 is a straightforward wrapper for memcpy: inline void mbedtls_put_unaligned_uint64(void *p, uint64_t x) { memcpy(p, &x, sizeof(x

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946 --- Comment #9 from Alexander Monakov --- (In reply to Alexander Monakov from comment #2) > Note that inline functions in mbedtls/library/alignment.h all miss the > 'static' qualifier, which affects inlining decisions, and looks like a > mistake

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946 --- Comment #10 from Alexander Monakov --- Ah, the non-static inlines are intentional, the corresponding extern declarations appear in library/platform_util.c. Sorry, I missed that file the first time around.

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946 --- Comment #11 from Alexander Monakov --- (In reply to Alexander Monakov from comment #8) > inline void mbedtls_put_unaligned_uint64(void *p, uint64_t x) > { > memcpy(p, &x, sizeof(x)); > } > > > We deciding to not inline this, while inli

[Bug target/110979] Miss-optimization for O2 fully masked loop on floating point reduction.

2023-08-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110979 --- Comment #2 from Alexander Monakov --- Yes, it is wrong-code to full extent. To demonstrate, you can initialize 'sum' and the array to negative zeroes: #define FLT double #define N 20 __attribute__((noipa)) FLT foo3 (FLT *a) { FLT sum =
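Why negative zeroes make a good probe here can be sketched in scalar C. This is an illustration only, assuming the default round-to-nearest mode and no -ffast-math; the extra `+ 0.0` stands in for a hypothetical inactive lane padded with positive zero, which is an assumption and not something the truncated snippet states.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Returns 1 if x is exactly negative zero. */
int is_negative_zero(double x)
{
    return x == 0.0 && signbit(x) != 0;
}

/* In-order sum of n elements, starting from 'init'. */
double sum_from(const double *a, size_t n, double init)
{
    assert(a != NULL || n == 0);
    double sum = init;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Same sum with one extra +0.0 added (hypothetical padding of an inactive lane). */
double sum_with_positive_zero_padding(const double *a, size_t n, double init)
{
    return sum_from(a, n, init) + 0.0;
}
```

Since (-0.0) + (-0.0) is -0.0 but (-0.0) + (+0.0) is +0.0 under round-to-nearest, an all-negative-zero input keeps its sign in the plain sum and loses it as soon as a single +0.0 enters the reduction.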

[Bug middle-end/111009] [12/13/14 regression] -fno-strict-overflow erroneously elides null pointer checks and causes SIGSEGV on perf from linux-6.4.10

2023-08-14 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111009 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug rtl-optimization/111101] -finline-small-functions may invert FP arguments breaking FP bit accuracy in case of NaNs

2023-08-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
|--- |INVALID CC||amonakov at gcc dot gnu.org --- Comment #1 from Alexander Monakov --- 0x7fe5ed65 is a quiet NaN, not signaling (it differs from the input 0x7fa5ed65 sNaN by the leading mantissa bit 0x00400000). IEEE-754 does not pin
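The two bit patterns in this comment can be checked with a small sketch. It assumes the binary32 layout and the convention, used on x86 among others, that the most significant fraction bit set means "quiet"; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define F32_EXP_MASK  UINT32_C(0x7f800000)  /* all-ones exponent marks Inf/NaN */
#define F32_FRAC_MASK UINT32_C(0x007fffff)  /* 23-bit fraction field */
#define F32_QUIET_BIT UINT32_C(0x00400000)  /* leading fraction bit */

int f32_is_nan(uint32_t bits)
{
    return (bits & F32_EXP_MASK) == F32_EXP_MASK && (bits & F32_FRAC_MASK) != 0;
}

int f32_is_quiet_nan(uint32_t bits)
{
    return f32_is_nan(bits) && (bits & F32_QUIET_BIT) != 0;
}

int f32_is_signaling_nan(uint32_t bits)
{
    return f32_is_nan(bits) && (bits & F32_QUIET_BIT) == 0;
}

/* Quiets a NaN by setting the leading fraction bit; the payload is preserved. */
uint32_t f32_quiet(uint32_t bits)
{
    assert(f32_is_nan(bits));
    return bits | F32_QUIET_BIT;
}
```

Applied to the values from the comment, `f32_quiet(0x7fa5ed65)` yields 0x7fe5ed65, i.e. the same payload with only the quiet bit added.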

[Bug rtl-optimization/111143] [missed optimization] unlikely code slows down diffutils x86-64 ASCII processing

2023-08-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111143 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug rtl-optimization/111143] [missed optimization] unlikely code slows down diffutils x86-64 ASCII processing

2023-08-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111143 --- Comment #6 from Alexander Monakov --- Thanks. i5-1335U has two "performance cores" (with HT, four logical CPUs) and eight "efficiency cores". They have different micro-architecture. Are you binding the benchmark to some core in particular?

[Bug c/111210] Wrong code at -Os on x86_64-linux-gnu since r12-4849-gf19791565d7

2023-08-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
|UNCONFIRMED |RESOLVED CC||amonakov at gcc dot gnu.org --- Comment #1 from Alexander Monakov --- 'c' is called with 'd' pointing to 'long e[2]', so return *(int *)(d + 1); is an aliasing violation (dereferencin

[Bug c/111210] Wrong code at -Os on x86_64-linux-gnu since r12-4849-gf19791565d7

2023-08-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111210 --- Comment #4 from Alexander Monakov --- The testcase is small enough to notice the issue by inspection. Note that you get the "expected" answer with -fno-strict-aliasing, and as explained in https://gcc.gnu.org/bugs/ it is one of the things y

[Bug target/111655] [11/12/13/14 Regression] wrong code generated for __builtin_signbit and 0./0. on x86-64 -O2

2023-10-02 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111655 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug middle-end/51446] -fno-trapping-math generates NaN constant with different sign

2023-10-02 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51446 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug middle-end/111655] [11/12/13/14 Regression] wrong code generated for __builtin_signbit and 0./0. on x86-64 -O2

2023-10-04 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111655 --- Comment #11 from Alexander Monakov --- (In reply to Richard Biener from comment #10) > And this conservatively has to apply to all FP divisions where we might infer > "nonnegative" unless we can also infer !zerop? Yes, I think the logic in

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3

2023-10-04 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug tree-optimization/111694] [13/14 Regression] Wrong behavior for signbit of negative zero when optimizing

2023-10-04 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111694 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug middle-end/111701] New: [11/12/13/14 Regression] wrong code for __builtin_signbit(x*x)

2023-10-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org CC: amonakov at gcc dot gnu.org, eggert at cs dot ucla.edu, rguenth at gcc dot gnu.org

[Bug ipa/111643] __attribute__((flatten)) with -O1 runs out of memory (killed cc1)

2023-10-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111643 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug sanitizer/111736] Address sanitizer is not compatible with named address spaces

2023-10-09 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111736 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug sanitizer/111736] Address sanitizer is not compatible with named address spaces

2023-10-09 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111736 --- Comment #3 from Alexander Monakov --- Sorry, the second half of my comment is confusing. To clarify, ASan works fine for TLS data (the compiler knows that TLS base is at fs:0; libsanitizer uses some hacks to initialize shadow for TLS anyway,

[Bug tree-optimization/111694] [13/14 Regression] Wrong behavior for signbit of negative zero when optimizing

2023-10-09 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111694 --- Comment #7 from Alexander Monakov --- No backport for gcc-13 planned?

[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly

2023-10-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768 --- Comment #5 from Alexander Monakov --- I think it's similar to attempting -march=native under distcc, which is already warned about on Gentoo wiki: https://wiki.gentoo.org/wiki/Distcc The difference here is that Intel so far decided to make

[Bug c/116458] [15 regression] New valgrind error in search_line_ssse3

2024-08-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116458 --- Comment #3 from Alexander Monakov --- David, thanks for Cc'ing me and for running Valgrind builds! Richi, I'll check in more detail later today, I think we should unbreak Valgrind builds ASAP by initializing padding under #ifdef ENABLE_VALG

[Bug preprocessor/116458] [15 regression] New valgrind error in search_line_ssse3

2024-08-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
|1 Last reconfirmed||2024-08-22 Assignee|unassigned at gcc dot gnu.org |amonakov at gcc dot gnu.org --- Comment #5 from Alexander Monakov --- Turns out we already initialize padding, just in a different file, and I completely

[Bug preprocessor/116458] [15 regression] New valgrind error in search_line_ssse3

2024-08-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116458 --- Comment #6 from Alexander Monakov --- As for Valgrind false positive, it handles this SSSE3 code really well and misses the key point by a very narrow margin. We have found = m1 + (m2 << 16); where both m1 and m2 hold 16-bit masks from p
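The expression quoted in this comment has a property worth spelling out: when both masks fit in 16 bits, the two addends share no set bits, so the addition produces no carries and equals a bitwise OR. The sketch below shows only that arithmetic fact; the function names are hypothetical, and whether this is exactly the point Valgrind misses is not stated in the truncated snippet.

```c
#include <assert.h>
#include <stdint.h>

/* Combines two 16-bit masks by addition, as in the quoted expression. */
uint32_t combine_add(uint32_t m1, uint32_t m2)
{
    assert(m1 <= 0xffffu && m2 <= 0xffffu);  /* both inputs are 16-bit masks */
    return m1 + (m2 << 16);
}

/* Equivalent combination by bitwise OR; no carries are possible either way. */
uint32_t combine_or(uint32_t m1, uint32_t m2)
{
    assert(m1 <= 0xffffu && m2 <= 0xffffu);
    return m1 | (m2 << 16);
}
```

For any pair of 16-bit inputs the two functions return the same value, so each result bit depends on exactly one input bit.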

[Bug preprocessor/116458] [15 regression] New valgrind error in search_line_ssse3

2024-08-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116458 --- Comment #8 from Alexander Monakov --- Thanks for the reference, but it doesn't help. Something more subtle is going on, because placing the shift-add combo in a separate function makes Valgrind properly compute known bits even without the ma
