[Bug tree-optimization/119860] New: needless vector unrolling causes less profitable vectorization

2025-04-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
-optimization Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: tnfchris at gcc dot gnu.org Blocks: 53947, 115130 Target Milestone: --- consider the following loop: #define N 512

[Bug testsuite/119286] [15 Regression] GCN vs. "middle-end: delay checking for alignment to load [PR118464]"

2025-04-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119286 --- Comment #9 from Tamar Christina --- (In reply to Thomas Schwinge from comment #8) > Tamar, thanks! I confirm all fixed -- but one: > > (In reply to myself from comment #1) > > ..., and similarly -- but not identical! -- for '-march=gfx1100

[Bug tree-optimization/119858] [15/16 Regression] GCN vs. "middle-end: Fix incorrect codegen with PFA and VLS [PR119351]"

2025-04-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119858 Tamar Christina changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org

[Bug tree-optimization/119351] [14 Regression] Incorrect forall masking for AND reduction in early break

2025-04-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119351 Tamar Christina changed: What|Removed |Added Priority|P1 |P2 --- Comment #23 from Tamar Christi

[Bug tree-optimization/119351] [14 Regression] Incorrect forall masking for AND reduction in early break

2025-04-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119351 Tamar Christina changed: What|Removed |Added Target Milestone|15.0|14.3 Summary|[15 Regressio

[Bug testsuite/119286] [15 Regression] GCN vs. "middle-end: delay checking for alignment to load [PR118464]"

2025-04-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119286 Tamar Christina changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug tree-optimization/119351] [15 Regression] Wrong code in GROMACS for AArch64 generic SVE VLS target since r15-6807-g68326d5d1a593d

2025-04-13 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119351 Tamar Christina changed: What|Removed |Added Keywords|needs-reduction,| |needs-source

[Bug tree-optimization/119351] [15 Regression] Wrong code in GROMACS for AArch64 generic SVE VLS target since r15-6807-g68326d5d1a593d

2025-04-10 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119351 --- Comment #18 from Tamar Christina --- (In reply to Richard Biener from comment #17) > I wonder if we can use > > BIT_FIELD_REF > > as the "reduction" step. Yeah that's the same comment Richard S suggested when we were talking to avoid th

[Bug tree-optimization/119351] [15 Regression] Wrong code in GROMACS for AArch64 generic SVE VLS target since r15-6807-g68326d5d1a593d

2025-04-10 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119351 --- Comment #16 from Tamar Christina --- Ok, found the bug and c-vise is running for a testcase. The issue is as follows: For early break we need to know which value to start the scalar loop with if we take an early exit. Historically this me

[Bug tree-optimization/119351] [15 Regression] Wrong code in GROMACS for AArch64 generic SVE VLS target since r15-6807-g68326d5d1a593d

2025-04-09 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119351 --- Comment #15 from Tamar Christina --- The following example reproduces the CFG but not the bad codegen: https://godbolt.org/z/Thzo7hz8P This generates the actual code I expected: _55 = {_2, _2, _2, _2}; _56 = {_11, _11, _11, _11}; _57

[Bug tree-optimization/119351] [15 Regression] Wrong code in GROMACS for AArch64 generic SVE VLS target since r15-6807-g68326d5d1a593d

2025-04-09 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119351 --- Comment #14 from Tamar Christina --- There seems to be an off by one error in the pre-header when calculating the initial vector IV. The starting values are calculated as: sub z27.s, z23.s, z31.s

[Bug tree-optimization/119187] vectorizer should be able to SLP already vectorized code

2025-04-08 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119187 --- Comment #8 from Tamar Christina --- (In reply to ktkachov from comment #7) > Could this be extended to scale Neon intrinsics code to SVE by > re-vectorising and treating the 128-bit Neon lane as a Q-word element of a > wider SVE vector? I t

[Bug target/113257] -march=native or -mcpu=native are ineffective, but -march=native -mcpu=native works on arm64 M2 Ultra

2025-04-08 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113257 Tamar Christina changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug middle-end/119577] RISC-V: Redundant vector IV roundtrip.

2025-04-07 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119577 --- Comment #2 from Tamar Christina --- (In reply to Richard Biener from comment #1) > IIRC it depends on the "kind" of early break whether we need the > first IV (scalar IV possible) or the last, but I don't remember exactly. First is always

[Bug tree-optimization/119351] [15 Regression] Wrong code in GROMACS for AArch64 generic SVE VLS target since r15-6807-g68326d5d1a593d

2025-04-07 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119351 --- Comment #13 from Tamar Christina --- Sorry had a week off, looking into this again today.

[Bug target/118892] [14 Regression] ICE (segfault) in rebuild_jump_labels on aarch64-linux-gnu since r14-5289

2025-04-07 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118892 --- Comment #18 from Tamar Christina --- (In reply to Pavol Rusnak from comment #17) > Is the fix going to be backported from master to 14.x release? Possibly > targeting 14.3.0 release? Yep

[Bug tree-optimization/119351] [15 Regression] Wrong code in GROMACS for AArch64 generic SVE VLS target

2025-03-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119351 --- Comment #9 from Tamar Christina --- --- static bool next_ci(int dimYY, int numCells, int nth, int ci_block, int* ci_x, int* ci_y, int* ci_b, int* ci) { while (*ci >= *ci_x * dimYY + *ci_y + 1) { *ci_y += 1; if (*ci_y

[Bug tree-optimization/115450] [15 Regression] cpu2017 502.gcc runtime miscompute on aarch64 with SVE since r15-1006-gd93353e6423eca

2025-03-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115450 --- Comment #11 from Tamar Christina --- (In reply to Richard Biener from comment #10) > Can anybody still reproduce this? I can't. I can reproduce the failure with the original commit but cannot with today's trunk.

[Bug tree-optimization/119351] [15 Regression] Wrong code in GROMACS for AArch64 generic SVE VLS target

2025-03-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119351 --- Comment #8 from Tamar Christina --- Looking at it some more, I think the loop is valid to vectorize. But we don't seem to vectorize the reduction jumping back to the outerloop: ;; basic block 384, loop depth 3, count 8598980 (estimated lo

[Bug tree-optimization/119402] [14/15 Regression] `((-bool) & _6) & (~_6)` is not optimized to 0 on some targets since r14-5673

2025-03-25 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119402 --- Comment #3 from Tamar Christina --- (In reply to Jakub Jelinek from comment #2) > Started with r14-5673-g33c2b70dbabc02788caabcbc66b7baeafeb95bcf > With -O2 -mtune=generic it is fine even on the current trunk. Seems like it's due to missing

[Bug target/119108] [15 Regression] AArch64 Commit 'vect: Force alignment peeling ...' (r15-6807-g68326d5d1a593d) causes regression in Snappy workload for -mcpu=neoverse-v2.

2025-03-25 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108 --- Comment #12 from Tamar Christina --- Sorry for the slow response, had a few days off. The regression here can be reproduced through this example loop: https://godbolt.org/z/jnGe5x4P7 for the current loop in snappy what you want is -UALIGNE

[Bug tree-optimization/119351] [15 Regression] Wrong code in GROMACS for AArch64 generic SVE VLS target

2025-03-25 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119351 --- Comment #7 from Tamar Christina --- Sorry for the delay, had a few days off. So looking at this again, it's happening When next_ci gets inlined into nbnxn_make_pairlist_part, the while loop while (next_ci(iGrid, nth, ci_block, &ci_x, &ci_y

[Bug tree-optimization/119393] [15 Regression] Worse vectorization of imagick_r hot loop on aarch64 since r15-5024-g2a2e6784074e1f

2025-03-20 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119393 --- Comment #3 from Tamar Christina --- Confirmed.

[Bug tree-optimization/119351] [15 Regression] Wrong code in GROMACS for AArch64 generic SVE VLS target

2025-03-20 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
|1 Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org --- Comment #4 from Tamar Christina --- While looking at the codegen it looks like GROMACS has a lot of loops that get vectorized

[Bug tree-optimization/119351] [15 Regression] Wrong code in GROMACS for AArch64 generic SVE VLS target

2025-03-20 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119351 --- Comment #6 from Tamar Christina --- (In reply to ktkachov from comment #5) > (In reply to Tamar Christina from comment #4) > > While looking at the codegen it looks like GROMACS has a lot of loops that > > get vectorized now and it's showing

[Bug testsuite/119286] [15 Regression] GCN vs. "middle-end: delay checking for alignment to load [PR118464]"

2025-03-19 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119286 --- Comment #5 from Tamar Christina --- Still have one to fix.

[Bug target/115842] [15 Regression] 6.5% slowdown of 548.exchange2_r on Intel Ice Lake

2025-03-19 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115842 --- Comment #9 from Tamar Christina --- (In reply to Hongtao Liu from comment #8) > (In reply to Tamar Christina from comment #7) > > (In reply to Hongtao Liu from comment #6) > > > I noticed some double-counting of cost in group-candidate (reg

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2025-03-19 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #24 from Tamar Christina --- Hi, Yeah vectorization was one of the reasons for the slowdown. Do note however it's not entirely safe to backport that patch, as it exposes another bug which has a large fix. At least the top two comm

[Bug tree-optimization/119351] [15 Regression] Wrong code in GROMACS for AArch64 generic SVE VLS target

2025-03-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119351 --- Comment #3 from Tamar Christina --- Confirmed, able to reproduce it now. Taking a look. -march=armv8-a+sve is enough FWIW.

[Bug tree-optimization/119187] vectorizer should be able to SLP already vectorized code

2025-03-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119187 --- Comment #5 from Tamar Christina --- (In reply to Richard Biener from comment #4) > > for (...) >a[32*i] = ..; >a[32*i+1] = ..; > ... >a[32*i + 31] = ...; > > to match the number of lanes in a HW vector. It shares some of the

[Bug target/115842] [15 Regression] 6.5% slowdown of 548.exchange2_r on Intel Ice Lake

2025-03-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115842 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org

[Bug testsuite/119286] [15 Regression] GCN vs. "middle-end: delay checking for alignment to load [PR118464]"

2025-03-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119286 Tamar Christina changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed|

[Bug target/118974] Use SVE cbranch sequence for Neon modes when TARGET_SVE

2025-03-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118974 --- Comment #3 from Tamar Christina --- and using the SVE CC regs: .L6: ldr q30, [x2, x0] cmple p15.s, p7/z, z30.s, #0 b.none .L2

[Bug target/118974] Use SVE cbranch sequence for Neon modes when TARGET_SVE

2025-03-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118974 Tamar Christina changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org

[Bug target/119108] [15 Regression] AArch64 Commit 'vect: Force alignment peeling ...' (r15-6807-g68326d5d1a593d) causes regression in Snappy workload for -mcpu=neoverse-v2.

2025-03-11 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108 --- Comment #11 from Tamar Christina --- Actually I just realized that loop uses two pointers, and we can only peel for one unknown misalignment atm. This loop will instead be versioned, and because of the manual misalignment in the caller I don

[Bug target/119108] [15 Regression] AArch64 Commit 'vect: Force alignment peeling ...' (r15-6807-g68326d5d1a593d) causes regression in Snappy workload for -mcpu=neoverse-v2.

2025-03-11 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108 --- Comment #10 from Tamar Christina --- (In reply to Matthew Malcomson from comment #9) > (In reply to Tamar Christina from comment #8) > > Ok, so having looked at this I'm not sure the compiler is at fault here. > > > > Similar to the SVN cas

[Bug tree-optimization/119187] vectorizer should be able to SLP already vectorized code

2025-03-10 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119187 --- Comment #3 from Tamar Christina --- (In reply to Andrew Pinski from comment #2) > (In reply to Andrew Pinski from comment #1) > > There is another bug report for a similar thing but with SSE and AVX2. > > yes PR 95960. Ah yeah, I guess I w

[Bug target/119108] [15 Regression] AArch64 Commit 'vect: Force alignment peeling ...' (r15-6807-g68326d5d1a593d) causes regression in Snappy workload for -mcpu=neoverse-v2.

2025-03-10 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108 --- Comment #8 from Tamar Christina --- Ok, so having looked at this I'm not sure the compiler is at fault here. Similar to the SVN case the snappy code is misaligning the loads intentionally and loading 64-bits at a time from the 8-bit pointe

[Bug tree-optimization/119187] New: vectorizer should be able to SLP already vectorized code

2025-03-10 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
-optimization Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: tnfchris at gcc dot gnu.org Target Milestone: --- Today there's a lot of code written as intrinsics for older microarchitectures

[Bug tree-optimization/118464] [15 Regression] gcc-15.0.0_pre20250112 ICE with opencv-4.10.0 using -O2/-ftree-loop-vectorize: memory_descriptor_ref.cpp:94:19: internal compiler error: in exact_div, at

2025-03-07 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118464 Tamar Christina changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug tree-optimization/116855] [14 Regression] Unsafe early-break vectorization

2025-03-07 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116855 Tamar Christina changed: What|Removed |Added Summary|[14/15 Regression] Unsafe |[14 Regression] Unsafe

[Bug middle-end/119145] [15 Regression] ICE in expanding IFN_MASK_CALL from vector math

2025-03-06 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119145 --- Comment #1 from Tamar Christina --- The vectorizer seems confused. Vectorization fails, but seems to fail during SLP transform so the ifc loop is kept, but the statements are not transformed. It then produces broken SSA: note: * Analysis

[Bug middle-end/119145] New: [15 Regression] ICE in expanding IFN_MASK_CALL from vector math

2025-03-06 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
-code Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: tnfchris at gcc dot gnu.org Target Milestone: --- Target: aarch64* The following testcase: typedef short Quantum; Quantum

[Bug target/119108] [15 Regression] AArch64 Commit 'vect: Force alignment peeling ...' (r15-6807-g68326d5d1a593d) causes regression in Snappy workload for -mcpu=neoverse-v2.

2025-03-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108 --- Comment #6 from Tamar Christina --- Ok, now really confirmed :) Interestingly the behavior on other uarches suggests this may be cost modelling. On Neoverse-V1 we get (without LTO): BM_UFlat/0/1 -4.60251 BM_UFlat/0/2 -2.34742 BM_UFlat/3/1

[Bug target/119108] [15 Regression] AArch64 Commit 'vect: Force alignment peeling ...' (r15-6807-g68326d5d1a593d) causes regression in Snappy workload for -mcpu=neoverse-v2.

2025-03-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108 --- Comment #5 from Tamar Christina --- Ah... It looks like somehow the build for /data/gcc/gcc-with-68326d5d1a5-install/ failed and it was silently picking up the distro compiler instead. Hence the difference in memmove only! I'll clean every

[Bug target/119108] [15 Regression] AArch64 Commit 'vect: Force alignment peeling ...' (r15-6807-g68326d5d1a593d) causes regression in Snappy workload for -mcpu=neoverse-v2.

2025-03-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108 --- Comment #4 from Tamar Christina --- (In reply to Matthew Malcomson from comment #3) > I only looked into VecSource/5/2, and unfortunately I looked into it on an > internal setup that compiles slightly differently. > > In that slightly diffe

[Bug target/119108] [15 Regression] AArch64 Commit 'vect: Force alignment peeling ...' (r15-6807-g68326d5d1a593d) causes regression in Snappy workload for -mcpu=neoverse-v2.

2025-03-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
||2025-03-05 Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #2 from Tamar Christina --- Confirmed. The only early break vectorization is in the reporting harness in benchmark

[Bug target/118892] [14/15 Regression] ICE (segfault) in rebuild_jump_labels on aarch64-linux-gnu since r14-5289

2025-02-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118892 --- Comment #13 from Tamar Christina --- (In reply to Jakub Jelinek from comment #12) > E.g. the i386 backend usually uses force_reg in this case. If the operand > is a REG, it does nothing, if it is a SUBREG, it is forced into a temporary > an

[Bug rtl-optimization/119046] [15 Regression] Performance drop from not forming lane-wise FMLAs with Eigen library

2025-02-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
||tnfchris at gcc dot gnu.org Ever confirmed|0 |1 Last reconfirmed||2025-02-27 Status|UNCONFIRMED |NEW --- Comment #1 from Tamar Christina --- The late-combine pass was supposed to handle

[Bug tree-optimization/119016] [15 regression] svn miscompiled with -O2 -mavx -fno-vect-cost-model since r15-6807-g68326d5d1a593d

2025-02-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119016 --- Comment #10 from Tamar Christina --- (In reply to rguent...@suse.de from comment #9) > On Wed, 26 Feb 2025, tnfchris at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119016 > > > > -

[Bug tree-optimization/119016] [15 regression] svn miscompiled with -O2 -mavx -fno-vect-cost-model since r15-6807-g68326d5d1a593d

2025-02-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119016 --- Comment #8 from Tamar Christina --- (In reply to rguent...@suse.de from comment #7) > On Wed, 26 Feb 2025, tnfchris at gcc dot gnu.org wrote: > > > Because of the scalar code doing DI mode loads, and the misalignment being >

[Bug tree-optimization/119016] [15 regression] svn miscompiled with -O2 -mavx -fno-vect-cost-model since r15-6807-g68326d5d1a593d

2025-02-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119016 --- Comment #6 from Tamar Christina --- At the start of the second iteration len = 2, so start becomes misaligned at 0x7fffe2f2 but the peeling iteration code checks (0x7fffe2f2 / 8) & 1 which is 0, so it doesn't peel to align it. Inde

[Bug tree-optimization/119016] [15 regression] svn miscompiled with -O2 -mavx -fno-vect-cost-model since r15-6807-g68326d5d1a593d

2025-02-25 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119016 Tamar Christina changed: What|Removed |Added Priority|P3 |P1 Last reconfirmed|

[Bug tree-optimization/118976] [12/13/14/15 regression] Correctness Issue: SVE vectorization results in data corruption when cpu has 128bit vectors but compiled with -mcpu=neoverse-v1 (which is only f

2025-02-24 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
|tree-optimization Last reconfirmed||2025-02-24 CC||tnfchris at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #11 from Tamar Christina --- Confirmed. As Kyrill mentioned

[Bug target/118974] Use SVE cbranch sequence for Neon modes when TARGET_SVE

2025-02-21 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118974 Tamar Christina changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED

[Bug target/118942] New: [14/15 Regression] vld1q_s{8, 16}_x{3, 4} use incorrect pointer type

2025-02-19 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
-valid Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: tnfchris at gcc dot gnu.org Target Milestone: --- Target: arm* The following intrinsics incorrectly take a pointer to unsigned rather

[Bug target/118892] [14/15 Regression] ICE (segfault) in rebuild_jump_labels on aarch64-linux-gnu since r14-5289

2025-02-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118892 --- Comment #11 from Tamar Christina --- (In reply to Richard Sandiford from comment #10) > (In reply to Tamar Christina from comment #9) > > I swear that was something that was fixed. But in any case, the simplest > > fix is to force it into a

[Bug target/118892] [14/15 Regression] ICE (segfault) in rebuild_jump_labels on aarch64-linux-gnu since r14-5289

2025-02-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118892 --- Comment #9 from Tamar Christina --- (In reply to Andrew Pinski from comment #8) > (In reply to Tamar Christina from comment #7) > > > > But operand1 is marked as `register_operand` which means whatever did the > > expansion didn't honor the

[Bug c++/118921] New: C++ frontend does not honor GCC pragma optimize

2025-02-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
: c++ Assignee: unassigned at gcc dot gnu.org Reporter: tnfchris at gcc dot gnu.org Target Milestone: --- The following example: # pragma STDC FP_CONTRACT ON # if __GNUC__ >= 4 # pragma GCC optimize ("no-fast-math,fp-contract=on") # endif # ifdef __FAST_MATH__ #

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2025-02-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163 Bug 26163 depends on bug 117270, which changed state. Bug 117270 Summary: [15 Regression] 9% exec time slowdown of 538.imagick_r on aarch64 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117270 What|Removed |Adde

[Bug target/117270] [15 Regression] 9% exec time slowdown of 538.imagick_r on aarch64

2025-02-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117270 Tamar Christina changed: What|Removed |Added Resolution|FIXED |--- Status|RESOLVED

[Bug target/118892] [14/15 Regression] ICE (segfault) in rebuild_jump_labels on aarch64-linux-gnu since r14-5289

2025-02-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118892 Tamar Christina changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2025-02-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163 Bug 26163 depends on bug 118691, which changed state. Bug 118691 Summary: [15 Regression] gcc_r in SPECCPU 2017 miscompare on train dataset https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118691 What|Removed |Adde

[Bug middle-end/118691] [15 Regression] gcc_r in SPECCPU 2017 miscompare on train dataset

2025-02-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118691 Tamar Christina changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug tree-optimization/118464] [15 Regression] gcc-15.0.0_pre20250112 ICE with opencv-4.10.0 using -O2/-ftree-loop-vectorize: memory_descriptor_ref.cpp:94:19: internal compiler error: in exact_div, at

2025-02-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118464 --- Comment #14 from Tamar Christina --- Still being worked on, I'll send v3 of the patch today or tomorrow.

[Bug rtl-optimization/118611] LRA inserts unneeded reload on FMA chain

2025-02-13 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118611 --- Comment #8 from Tamar Christina --- Yeah, that makes sense. Thanks for working on it! We've been trying to reduce the different cases where we see this happening in the hopes to provide more data to tune any possible heuristics. So the pa

[Bug rtl-optimization/118611] LRA inserts unneeded reload on FMA chain

2025-02-13 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118611 Tamar Christina changed: What|Removed |Added CC||acoplan at gcc dot gnu.org --- Commen

[Bug tree-optimization/118852] [15 regression] Train run of 502.gcc_r compiled with -Ofast -fprofile-generate -march=x86_64-v3 fails

2025-02-13 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118852 --- Comment #4 from Tamar Christina --- (In reply to ktkachov from comment #3) > FWIW I see this also on aarch64 I filed the AArch64 bug weeks ago https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118691, there we don't need -fprofile-generate to tr

[Bug target/118800] [13 regression] aarch64 -mcpu=native ICEs since PR113257 backport

2025-02-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118800 Tamar Christina changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug tree-optimization/118211] tree-vectorize: vectorize input.cc, find_end_of_line

2025-02-10 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118211 Bug 118211 depends on bug 118754, which changed state. Bug 118754 Summary: [15 Regression] FAIL: gcc.target/i386/pr106010-8c.c by r15-6807-g68326d5d1a593d https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118754 What|Removed

[Bug target/118753] [15 Regression] [meta-bug] GCC 15 Regression on x86

2025-02-10 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118753 Bug 118753 depends on bug 118754, which changed state. Bug 118754 Summary: [15 Regression] FAIL: gcc.target/i386/pr106010-8c.c by r15-6807-g68326d5d1a593d https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118754 What|Removed

[Bug testsuite/118754] [15 Regression] FAIL: gcc.target/i386/pr106010-8c.c by r15-6807-g68326d5d1a593d

2025-02-10 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118754 Tamar Christina changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/118800] [13 regression] aarch64 -mcpu=native ICEs since PR113257 backport

2025-02-08 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
||tnfchris at gcc dot gnu.org Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org Last reconfirmed||2025-02-08 Status|UNCONFIRMED |ASSIGNED --- Comment #1 from Tamar Christina --- Arg, wonder

[Bug tree-optimization/118756] tree-ssa-loop-ivopts.cc:1156: Function defined but not used

2025-02-06 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118756 Tamar Christina changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug tree-optimization/118754] [15 Regression] FAIL: gcc.target/i386/pr106010-8c.c by r15-6807-g68326d5d1a593d

2025-02-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118754 --- Comment #3 from Tamar Christina --- As for vect-tail-nomask-1.c and pr106010-8c.c they are testisms that I had fixed but it seems like I never updated the final patch with. The result checking loops just need a #pragma GCC novector. Will s

[Bug c/118756] tree-ssa-loop-ivopts.cc:1156: Function defined but not used

2025-02-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118756 --- Comment #2 from Tamar Christina --- Ah, indeed it's unused now. I'll send a cleanup patch then. Thanks for catching it!

[Bug tree-optimization/118754] [15 Regression] FAIL: gcc.target/i386/pr106010-8c.c by r15-6807-g68326d5d1a593d

2025-02-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118754 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org

[Bug middle-end/118691] [15 Regression] gcc_r in SPECCPU 2017 miscompare on train dataset

2025-02-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118691 --- Comment #7 from Tamar Christina --- (In reply to Richard Biener from comment #6) > Works for me on x86_64-linux with -Ofast -march=znver4 Yeah still failing here. I'll track down the change in code this week. It's on my list for the week.

[Bug tree-optimization/118727] [15 Regression] gcc.dg/pr108692.c fails on LoongArch

2025-02-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118727 --- Comment #17 from Tamar Christina --- (In reply to Xi Ruoyao from comment #16) > (In reply to Tamar Christina from comment #15) > > (In reply to Xi Ruoyao from comment #13) > > > For example for the original gcc.dg/pr108692.c: > > > > > >

[Bug tree-optimization/118727] [15 Regression] gcc.dg/pr108692.c fails on LoongArch

2025-02-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118727 --- Comment #15 from Tamar Christina --- (In reply to Xi Ruoyao from comment #13) > For example for the original gcc.dg/pr108692.c: > > a.0_4 = (unsigned char) a_14; > _5 = (int) a.0_4; > b.1_6 = (unsigned char) b_16; > _7 = (int) b.1_6

[Bug tree-optimization/118727] [15 Regression] gcc.dg/pr108692.c fails on LoongArch

2025-02-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118727 --- Comment #11 from Tamar Christina --- (In reply to Xi Ruoyao from comment #10) > The difference between AArch64 and LoongArch64 is that AArch64 has WIDEN_ABD, and > (with GCC 14.2): > > t.c:10:17: note: abd pattern recognized: patt_29 = (int) patt

[Bug tree-optimization/118727] [15 Regression] gcc.dg/pr108692.c fails on LoongArch

2025-02-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118727 --- Comment #8 from Tamar Christina --- That change was made in g:aec90c8bf30cbd66e4febae2c78622dc217f3918, but no real explanation as to why. patt_40 = (signed char) a.0_4; patt_41 = SAD_EXPR ; would be ok if it was patt_41 = SAD_EXPR

[Bug middle-end/118691] [15 Regression] gcc_r in SPECCPU 2017 miscompare on train dataset

2025-01-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118691 Tamar Christina changed: What|Removed |Added Summary|[15 Regression] gcc_r in|[15 Regression] gcc_r in

[Bug middle-end/118691] [15 Regression] gcc_r in SPECCPU 2017 miscompare with PGO + LTO

2025-01-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118691 --- Comment #4 from Tamar Christina --- (In reply to Andrew Pinski from comment #3) > Isn't this a dup of bug 115450 ? Don't believe so. This is only showing up with PGO for me, but it's only during training, so I suspect -fprofile-generate is

[Bug target/113257] -march=native or -mcpu=native are ineffective, but -march=native -mcpu=native works on arm64 M2 Ultra

2025-01-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113257 --- Comment #14 from Tamar Christina --- Should be fixed now on trunk and GCC 14 and 13, leaving it open for Iain's patch introducing the cores in aarch64-cores.def which would give us the right architecture too. However this should unblock the

[Bug target/110901] -march does not override -mcpu in calls to assembler

2025-01-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110901 Tamar Christina changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug middle-end/118691] [15 Regression] gcc_r in SPECCPU 2017 miscompare with PGO + LTO

2025-01-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118691 --- Comment #2 from Tamar Christina --- (In reply to Richard Biener from comment #1) > Please add -fno-strict-aliasing and try again. Already on. Full options are: -fprofile-generate -mcpu=neoverse-v1 -Ofast -fomit-frame-pointer -flto=auto -g

[Bug middle-end/118691] New: [15 Regression] gcc_r in SPECCPU 2017 miscompare with PGO + LTO

2025-01-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: tnfchris at gcc dot gnu.org Target Milestone: --- Target: aarch64* During the training part the run fails with: 200.c: In function

[Bug rtl-optimization/118611] LRA inserts unneeded reload on FMA chain

2025-01-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118611 --- Comment #3 from Tamar Christina --- (In reply to Andrew Pinski from comment #2) > (In reply to Andrew Pinski from comment #1) > > I think this is the same as PR 82237. > > Or at least related. I'm not sure, in this one the instructions hav

[Bug rtl-optimization/118611] New: LRA inserts unneeded reload on FMA chain

2025-01-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: tnfchris at gcc dot gnu.org Target Milestone: --- Target: aarch64* The following example: #include <arm_neon.h> float32x4_t bad (float32x4_t x, float32x4_t c0

[Bug tree-optimization/118273] [15 Regression] ICE when vectorizing uniform vector function

2025-01-21 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118273 Tamar Christina changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug testsuite/113425] gcc.dg/fold-copysign-1.c fails on arm since g:7cbe41d35e6

2025-01-20 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113425 --- Comment #6 from Tamar Christina --- (In reply to Torbjorn SVENSSON from comment #5) > @Tamar: You can see the same fails with 14.2.Rel1 that is available for > download from the Arm webpage. > > I see the following in my gcc.log for Cortex-

[Bug tree-optimization/118273] [15 Regression] ICE when vectorizing uniform vector function

2025-01-20 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118273 --- Comment #2 from Tamar Christina --- It seems that the nmasks is wrong here: unsigned nmasks = exact_div (ncopies * bestn->simdclone->simdlen, TYPE_VECTOR_SUBPARTS (vecty

[Bug tree-optimization/118529] [15 regression] ICE when building openssl-3.3.2 on sparc (in operator[], at vec.h:910)

2025-01-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118529 --- Comment #7 from Tamar Christina --- (In reply to rguent...@suse.de from comment #6) > On Fri, 17 Jan 2025, tnfchris at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118529 > > > > -

[Bug tree-optimization/118529] [15 regression] ICE when building openssl-3.3.2 on sparc (in operator[], at vec.h:910)

2025-01-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118529 --- Comment #5 from Tamar Christina --- (In reply to Richard Biener from comment #4) > Confirmed. > OTOH the initial choice of mask mode for the compare by the vectorizer > is a bit odd. We get there from vect_recog_bool_pattern handling > >

[Bug target/113257] -march=native or -mcpu=native are ineffective, but -march=native -mcpu=native works on arm64 M2 Ultra

2025-01-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113257 Tamar Christina changed: What|Removed |Added Version|14.0|13.0 --- Comment #11 from Tamar Chris

[Bug testsuite/118451] gcc.dg/vect/vect-switch-search-line-fast.c FAILs

2025-01-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118451 Tamar Christina changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug middle-end/118472] [15 Regression] ICE : tree check: expected none of vector_type, have vector_type in operand_equal_p, at fold-const.cc:3749

2025-01-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118472 Tamar Christina changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug middle-end/118472] [15 Regression] ICE : tree check: expected none of vector_type, have vector_type in operand_equal_p, at fold-const.cc:3749

2025-01-14 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118472 --- Comment #3 from Tamar Christina --- reducer: typedef int a; typedef struct { a b __attribute__((__vector_size__(8))) } c; typedef a d __attribute__((__vector_size__(8))); c e, f, g; d h, j; void k() { c l; l.b[1] = 0; c m = l;

[Bug middle-end/118472] [15 Regression] ICE : tree check: expected none of vector_type, have vector_type in operand_equal_p, at fold-const.cc:3749

2025-01-14 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org --- Comment #2 from Tamar Christina --- (In reply to Richard Biener from comment #1) > Confirmed with -O3 -fopenmp-simd. The operand_equal_p code isn't good: > > 3745 /* BIT_INSERT_EXPR has an implict opera
