[Bug target/121994] [16 Regression] 15% slowdown of 538.imagick_r and 6% slowdown of 454.calculix on AMD Zen2 since r16-3396-g9823624395a946

2025-09-18 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121994 --- Comment #2 from Hongtao Liu --- I guess it's related to register pressure and can be tuned by adjusting reduc_lat_mult_thr. I don't have a Zen2 machine, so for simplicity, I'll just disable unrolling in the vectorizer for Zen2.

[Bug c/121976] DFP expression evaluations not consistent

2025-09-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121976 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/121970] struct copy still use zmm even specify -mmove-max=256

2025-09-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121970 --- Comment #8 from Hongtao Liu --- (In reply to Hongtao Liu from comment #7) > (In reply to Hongtao Liu from comment #6) > > > > > > > > So this seems like a target issue. > > > > > > Ah, I see, thanks. > > > > > > H.J, I think we should rem

[Bug target/121970] struct copy still use zmm even specify -mmove-max=256

2025-09-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121970 --- Comment #7 from Hongtao Liu --- (In reply to Hongtao Liu from comment #6) > > > > > > So this seems like a target issue. > > > > Ah, I see, thanks. > > > > H.J, I think we should remove ix86_store_max from MOVE_MAX. > > It failed pieces-

[Bug target/121970] struct copy still use zmm even specify -mmove-max=256

2025-09-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121970 --- Comment #6 from Hongtao Liu --- > > > > So this seems like a target issue. > > Ah, I see, thanks. > > H.J, I think we should remove ix86_store_max from MOVE_MAX. It failed pieces-memset-46.c since m_align is decided by MOVE_MAX_PIECES

[Bug target/121970] struct copy still use zmm even specify -mmove-max=256

2025-09-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121970 --- Comment #5 from Hongtao Liu --- (In reply to Andrew Pinski from comment #4) > (In reply to Hongtao Liu from comment #1) > > Although the option is x86 specific, but I think the issue is middle-end, > > it's related how MOVE_MAX and STORE_MAX

[Bug middle-end/121970] struct copy still use zmm even specify -mmove-max=256

2025-09-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121970 --- Comment #2 from Hongtao Liu --- According to the documentation, MOVE_MAX is for memory copy, STORE_BY_PIECE is for store only. Macro: MOVE_MAX_PIECES A C expression used by move_by_pieces to determine the largest unit a load or store used to copy me

[Bug middle-end/121970] struct copy still use zmm even specify -mmove-max=256

2025-09-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121970 --- Comment #1 from Hongtao Liu --- Although the option is x86-specific, I think the issue is in the middle-end: it's related to how MOVE_MAX and STORE_MAX_PIECES are used.

[Bug middle-end/121970] New: struct copy still use zmm even specify -mmove-max=256

2025-09-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- typedef struct { double ds[8]; }ds; extern void bar (ds* ); void foo (double* a, double* b, double* c, double* d, ds* __restrict e, int n) { ds tmp[2

[Bug target/121947] Improve X86_TUNE_DEST_FALSE_DEP_FOR_GLC implementation

2025-09-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121947 --- Comment #6 from Hongtao Liu --- (In reply to H.J. Lu from comment #5) > (In reply to Hongtao Liu from comment #4) > > (In reply to H.J. Lu from comment #3) > > > Created attachment 62385 [details] > > > A patch > > > > > > This is a test pa

[Bug target/121947] Improve X86_TUNE_DEST_FALSE_DEP_FOR_GLC implementation

2025-09-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121947 --- Comment #4 from Hongtao Liu --- (In reply to H.J. Lu from comment #3) > Created attachment 62385 [details] > A patch > > This is a test patch: > > 1. Move pass_x86_cse after pass_split_all_insns. > 2. Split 3 UNSPEC_INSN_FALSE_DEP patterns

[Bug target/121947] Improve X86_TUNE_DEST_FALSE_DEP_FOR_GLC implementation

2025-09-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121947 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/121699] [16 regression] ICE when building mesa-25.1.8 with -march=znver4 (RTL check: expected code 'const_int', have 'reg' in ix86_vgf2p8affine_shift_matrix)

2025-08-31 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121699 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug libgcc/120691] _Decimal128 arithmetic error under FE_UPWARD

2025-08-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120691 --- Comment #16 from Hongtao Liu --- (In reply to Eric Botcazou from comment #15) > Release branches are open for *regression* fixes only by default. Also reverted on release branches.

[Bug libgcc/120691] _Decimal128 arithmetic error under FE_UPWARD

2025-08-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120691 Hongtao Liu changed: What|Removed |Added Last reconfirmed||2025-08-30 Status|RESOLVED

[Bug libgcc/121718] [16 Regression] _Decimal128 now requires -lm, fails to link gdb (mpfr is underlinked against fesetenv()) since r16-3448-g50064b2898edfb

2025-08-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121718 --- Comment #5 from Hongtao Liu --- Let's revert it first

[Bug libgcc/121718] [16 Regression] _Decimal128 now requires -lm, fails to link gdb (mpfr is underlinked against fesetenv()) since r16-3448-g50064b2898edfb

2025-08-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121718 --- Comment #4 from Hongtao Liu --- (In reply to Hongtao Liu from comment #3) > Created attachment 62233 [details] > use __builtin_fegetround. > > Does this help? No.

[Bug libgcc/121718] [16 Regression] _Decimal128 now requires -lm, fails to link gdb (mpfr is underlinked against fesetenv()) since r16-3448-g50064b2898edfb

2025-08-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121718 --- Comment #2 from Hongtao Liu --- So using __builtin_fegetround?

[Bug libgcc/121718] [16 Regression] _Decimal128 now requires -lm, fails to link gdb (mpfr is underlinked against fesetenv()) since r16-3448-g50064b2898edfb

2025-08-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121718 --- Comment #3 from Hongtao Liu --- Created attachment 62233 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62233&action=edit use __builtin_fegetround. Does this help?

[Bug target/121699] [16 regression] ICE when building mesa-25.1.8 with -march=znver4 (RTL check: expected code 'const_int', have 'reg' in ix86_vgf2p8affine_shift_matrix)

2025-08-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121699 --- Comment #5 from Hongtao Liu --- (In reply to Hongtao Liu from comment #4) > This fixes the ICE. > 1) Fix predicate of operands[3] in cond_ since only const_vec_dup_operand is excepted for masked operations, and pass real count

[Bug target/121699] [16 regression] ICE when building mesa-25.1.8 with -march=znver4 (RTL check: expected code 'const_int', have 'reg' in ix86_vgf2p8affine_shift_matrix)

2025-08-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121699 --- Comment #4 from Hongtao Liu --- This fixes the ICE. diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 175798cff69..5dbe444847f 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -

[Bug target/121699] [16 regression] ICE when building mesa-25.1.8 with -march=znver4 (RTL check: expected code 'const_int', have 'reg' in ix86_vgf2p8affine_shift_matrix)

2025-08-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121699 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug libgcc/120691] _Decimal128 arithmetic error under FE_UPWARD

2025-08-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120691 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug middle-end/121661] [13/14/15/16 Regression] miscompilation involving complex and nested functions since r13-1762-gf9d4c3b45c5ed5

2025-08-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121661 --- Comment #10 from Hongtao Liu --- (In reply to Andrew Pinski from comment #7) > (In reply to Hongtao Liu from comment #6) > > Looks correct in the gimple > > The bug only happens at -O0. At higher levels it is ok. https://godbolt.org/z/M979

[Bug tree-optimization/121662] Unnecessary data dependant branches with avx512 masks

2025-08-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121662 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug middle-end/121661] [13/14/15/16 Regression] miscompilation involving complex and nested functions since r13-1762-gf9d4c3b45c5ed5

2025-08-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121661 --- Comment #6 from Hongtao Liu --- Looks correct in the gimple int main (int argc, char * * D.3685) { [local count: 1073741824]: # DEBUG BEGIN_STMT # DEBUG val => __complex__ (1.0e+0, 0.0) # DEBUG INLINE_ENTRY fun1 __builtin_dwarf_c

[Bug target/121606] -march=native on AVX10.1 capable host warns about -mno-evex512

2025-08-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121606 --- Comment #7 from Hongtao Liu --- (In reply to rguent...@suse.de from comment #5) > On Wed, 20 Aug 2025, haochen.jiang at intel dot com wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121606 > > > > --- Comment #4 from Haochen Jian

[Bug target/121606] -march=native on AVX10.1 capable host warns about -mno-evex512

2025-08-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121606 --- Comment #9 from Hongtao Liu --- > > Are you using GCC16, and yes GCC16 is refactored with that logic. In GCC15, because avx10.1 was initially set to 256-bit by default, we wanted to prevent the mixed usage of avx512 and avx10.1, so we issu

[Bug target/121606] -march=native on AVX10.1 capable host warns about -mno-evex512

2025-08-19 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121606 --- Comment #2 from Hongtao Liu --- And Zen5 is not an AVX10.1-capable host.

[Bug target/121606] -march=native on AVX10.1 capable host warns about -mno-evex512

2025-08-19 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
, ||liuhongt at gcc dot gnu.org --- Comment #1 from Hongtao Liu --- Note mevex512 option is deprecated in GCC15 and removed in GCC16. It's because avx10.1 will enable avx512fp16, but there's no avx512fp16 on Zen5 machine, and march=native will be exp

[Bug target/121274] [14/15/16 Regression] xmm extraction from zmm vector emits unnecessary vpextrq/vpinsrq sequence

2025-07-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121274 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug target/121274] [14/15/16 Regression] xmm extraction from zmm vector emits unnecessary vpextrq/vpinsrq sequence

2025-07-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121274 --- Comment #5 from Hongtao Liu --- (In reply to Hongtao Liu from comment #4) > Probably caused by r14-1902-g96c3539f2a3813 > > - /* Special case TImode to V1TImode conversions, via V2DI. */ > - if (mode == V1TImode > + /* Special case TImo

[Bug target/121274] [14/15/16 Regression] xmm extraction from zmm vector emits unnecessary vpextrq/vpinsrq sequence

2025-07-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121274 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug tree-optimization/119876] suboptimal code for avx512 conditional move

2025-07-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119876 Hongtao Liu changed: What|Removed |Added Known to work||16.0 Resolution|---

[Bug target/120957] [16 Regression] 6% slowdown of 503.bwaves_r on Zen2 since r16-1647-gc06979ff957485

2025-07-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120957 --- Comment #5 from Hongtao Liu --- (In reply to Filip Kastl from comment #3) > I've bisected this on Zen2. It is possible that this is actually two > different slowdowns and only the Zen2 slowdown is caused by r16-1647. I'll > bisect on Zen3.

[Bug target/120957] [16 Regression] 6-9% slowdown of 503.bwaves_r on Zen{2,3} since r16-1647-gc06979ff957485

2025-07-07 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120957 --- Comment #2 from Hongtao Liu --- I've tested my commit (r16-1647) and the previous commit (r16-1646) with -march=native -Ofast on a Zen3 server, and didn't find any regression for 503.bwaves_r. (We don't have a Zen2 machine.)

[Bug tree-optimization/120907] New: vectorizer creates redundunt vec_perm_expr for reversed access of array

2025-06-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- cat test.c extern int c[32000], d[32000]; void s1113() { for (int i = 32000; i >= 0; i--) { c[i] =

[Bug tree-optimization/120906] New: vectorizer create redudant permutation for reversed access of array

2025-06-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- cat test.c extern int c[32000], d[32000]; void s1113() { for (int i = 32000; i >= 0; i--) { c[i] = d[i]

[Bug target/120895] AVX data types default alignment is not correct

2025-06-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120895 --- Comment #9 from Hongtao Liu --- (In reply to Sam James from comment #8) > It passes for me with -march=znver2. Hongtao, were you maybe testing with a > compiler with default `--with-arch=`? I'm using option -march=x86-64-v4(assume __m512 ne

[Bug target/120895] AVX data types default alignment is not correct

2025-06-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120895 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/120815] Update -mtune=intel for Diamond Rapids and Clearwater Forest

2025-06-24 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120815 --- Comment #2 from Hongtao Liu --- Maybe we should have something like mtune=intel_p and mtune=intel_e; P-cores and E-cores are quite different from each other, so mtune=intel may not be sufficient.

[Bug target/120799] Incorrect UBSan alignment requirements for _mm_storeh_pd() and _mm_storel_pd()

2025-06-24 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
||a/show_bug.cgi?id=84508 CC||liuhongt at gcc dot gnu.org --- Comment #4 from Hongtao Liu --- From the Intel intrinsics guide[1], there's an explicit "mem_addr does not need to be aligned on any particular

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2025-06-24 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163 Bug 26163 depends on bug 115842, which changed state. Bug 115842 Summary: [15/16 Regression] 6.5% slowdown of 548.exchange2_r on Intel Ice Lake https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115842 What|Removed |

[Bug target/115842] [15/16 Regression] 6.5% slowdown of 548.exchange2_r on Intel Ice Lake

2025-06-24 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115842 Hongtao Liu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/120697] [16 regression] Bootstrap fails in ix86_expand_prologue

2025-06-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120697 --- Comment #6 from Hongtao Liu --- (In reply to Hongtao Liu from comment #5) > 9380 gcc_assert (!crtl->shrink_wrapped_separate); > > It hits this assert which is added by the patch, maybe this assert is not > needed. I mean it's added by

[Bug target/120697] [16 regression] Bootstrap fails in ix86_expand_prologue

2025-06-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120697 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug middle-end/120694] endless compile in ranger at expand

2025-06-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120694 --- Comment #3 from Hongtao Liu --- (In reply to Sam James from comment #2) > Could you retry on trunk? This might be a dupe of bug 120661. > > *** This bug has been marked as a duplicate of bug 120661 *** Yes, it's fixed by r16-1550-g9244ea4b

[Bug middle-end/120694] endless compile in ranger at expand

2025-06-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120694 --- Comment #1 from Hongtao Liu --- Stuck in the loop of ranger_cache::propagate_cache for niters.5_40.

[Bug middle-end/120694] New: endless compile in ranger at expand

2025-06-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- cat test.c #include typedef struct Symmetry { int **GFSym; } SymmetryGHex; void *SetupGH (int convlevel, int maxdim, int numvars) { int i, j; SymmetryGHex *myGH

[Bug gcov-profile/118551] Autofdo regressed 538.imagick_r by ~10% with -march=x86-64-v3 -O2

2025-06-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118551 --- Comment #10 from Hongtao Liu --- (In reply to Jan Hubicka from comment #9) > I am happy it helps. I wonder if you can share details of your SPEC config. > I.e. how you call perf (do you specify count etc) and how you handle merging > of pro

[Bug gcov-profile/118551] Autofdo regressed 538.imagick_r by ~10% with -march=x86-64-v3 -O2

2025-06-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118551 Hongtao Liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Known to work|

[Bug gcov-profile/118551] Autofdo regressed 538.imagick_r by ~10% with -march=x86-64-v3 -O2

2025-06-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118551 --- Comment #7 from Hongtao Liu --- Looks like it's fixed by r16-1521-g2ef043c5a05d99

[Bug target/115842] [15/16 Regression] 6.5% slowdown of 548.exchange2_r on Intel Ice Lake

2025-06-12 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115842 --- Comment #11 from Hongtao Liu --- (In reply to Tamar Christina from comment #9) > (In reply to Hongtao Liu from comment #8) > > (In reply to Tamar Christina from comment #7) > > > (In reply to Hongtao Liu from comment #6) > > > > I noticed s

[Bug middle-end/112824] Stack spills and vector splitting with vector builtins

2025-06-05 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112824 Hongtao Liu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/71453] Spills to vector registers are sub-optimal.

2025-06-05 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
||7.1.0 Resolution|--- |FIXED CC||liuhongt at gcc dot gnu.org --- Comment #8 from Hongtao Liu --- I can't reproduce the issue with the testcase in #c1 since GCC 7.1, so closing as fixed.

[Bug tree-optimization/92492] AVX512: Missed vectorization opportunity

2025-06-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92492 Bug 92492 depends on bug 92658, which changed state. Bug 92658 Summary: x86 lacks vector extend / truncate https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92658 What|Removed |Added -

[Bug target/95764] Failure to optimize usage of _mm512_set1_epi32 to a single instruction

2025-06-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
||liuhongt at gcc dot gnu.org Status|NEW |RESOLVED --- Comment #3 from Hongtao Liu --- Broadcast from imm is on purpose. *** This bug has been marked as a duplicate of bug 87767 ***

[Bug target/87767] Missing AVX512 memory broadcast for constant vector

2025-06-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767 Hongtao Liu changed: What|Removed |Added CC||gabravier at gmail dot com --- Comment #23

[Bug target/94962] Suboptimal AVX2 code for _mm256_zextsi128_si256(_mm_set1_epi8(-1))

2025-06-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
||liuhongt at gcc dot gnu.org Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #11 from Hongtao Liu --- Fixed in GCC13.1

[Bug tree-optimization/92645] Hand written vector code is 450 times slower when compiled with GCC compared to Clang

2025-06-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92645 Bug 92645 depends on bug 92658, which changed state. Bug 92658 Summary: x86 lacks vector extend / truncate https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92658 What|Removed |Added -

[Bug target/92611] auto vectorization failed for type promotation

2025-06-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92611 Bug 92611 depends on bug 92658, which changed state. Bug 92658 Summary: x86 lacks vector extend / truncate https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92658 What|Removed |Added -

[Bug target/92658] x86 lacks vector extend / truncate

2025-06-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
|RESOLVED CC||liuhongt at gcc dot gnu.org Known to work||14.1.0 --- Comment #28 from Hongtao Liu --- Fixed in GCC14.1

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2025-06-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org Known to

[Bug target/58889] GCC 4.9 fails to compile certain functions with intrinsics with __attribute__((target))

2025-06-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
||liuhongt at gcc dot gnu.org --- Comment #4 from Hongtao Liu --- Fixed.

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2025-06-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 36844, which changed state. Bug 36844 Summary: Vectorizer doesn't support INT<->FP conversions with different size https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36844 What|Removed |Adde

[Bug tree-optimization/96654] Failure to optimize vectorized conversion to `int` with AVX

2025-06-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96654 Bug 96654 depends on bug 36844, which changed state. Bug 36844 Summary: Vectorizer doesn't support INT<->FP conversions with different size https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36844 What|Removed |Adde

[Bug tree-optimization/36844] Vectorizer doesn't support INT<->FP conversions with different size

2025-06-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36844 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org

[Bug target/82897] Unnecessary zero-extension when loading mask register from memory

2025-06-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897 Hongtao Liu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/82897] Unnecessary zero-extension when loading mask register from memory

2025-06-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897 --- Comment #12 from Hongtao Liu --- (In reply to Andrew Pinski from comment #10) > Looks like this was fixed in GCC 15: > ``` > foo: > .LFB7284: > .cfi_startproc > vmovd %edi, %xmm2 > vmovdqa32 %zmm1, %zmm4 >

[Bug target/82897] Unnecessary zero-extension when loading mask register from memory

2025-06-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/103750] [i386] GCC schedules KMOV instructions that destroys performance in loop

2025-06-03 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
|--- |FIXED CC||liuhongt at gcc dot gnu.org --- Comment #19 from Hongtao Liu --- Looks like it's fixed by r16-170-ga670ebde399548. Now it generates decent code as "_Z8qustrchrPDsS_Ds": cmp rdi, rsi

[Bug testsuite/120457] gcc.dg/vect/pr79920.c fail starting with r16-924-g1bc5b47f5b06dc

2025-05-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120457 --- Comment #2 from Hongtao Liu --- (In reply to Hongtao Liu from comment #1) > double __attribute__((noinline,noclone)) > compute_integral (double w_1[18]) > { > double A = 0; > double t33[2][6] = {{0.0, 0.0, 0.0, 0.0, 0.0, 0.0}, >

[Bug testsuite/120457] gcc.dg/vect/pr79920.c fail starting with r16-924-g1bc5b47f5b06dc

2025-05-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120457 --- Comment #1 from Hongtao Liu --- double __attribute__((noinline,noclone)) compute_integral (double w_1[18]) { double A = 0; double t33[2][6] = {{0.0, 0.0, 0.0, 0.0, 0.0, 0.0}, {0.0, 0.0, 0.0, 0.0, 0.0, 0.0}}; double t43[2] = {0.

[Bug target/120428] [14/15/16 regression] Suboptimal autovec involving blocked permutation and std::copy

2025-05-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120428 --- Comment #16 from Hongtao Liu --- (In reply to Jonathan Wakely from comment #15) > (In reply to Hongtao Liu from comment #13) > > The inner loop is not completely unrolled since std::copy is lowered to > > __builtin_memmove instead of __built

[Bug target/120428] [14/15/16 regression] Suboptimal autovec involving blocked permutation and std::copy

2025-05-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120428 --- Comment #14 from Hongtao Liu --- (In reply to Hongtao Liu from comment #13) > > > > constexpr std::size_t ProcessChunkSize = BlockSize * OrderSize; > > > > std::array buffer{}; > > > > std::byte* const bytes = reinterpret_cast

[Bug target/120428] [14/15/16 regression] Suboptimal autovec involving blocked permutation and std::copy

2025-05-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120428 --- Comment #13 from Hongtao Liu --- > > constexpr std::size_t ProcessChunkSize = BlockSize * OrderSize; > > std::array buffer{}; > > std::byte* const bytes = reinterpret_cast(data); > > for (std::size_t i = 0; i < TotalSize

[Bug middle-end/112824] Stack spills and vector splitting with vector builtins

2025-05-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112824 --- Comment #11 from Hongtao Liu --- > > Add --param sra-max-scalarization-size-Ospeed=2048 will eliminate those > spills > > So for sra we can consider using MOVE_MAX * move_ratio as the size limit for > Ospeed which represents real backend

[Bug tree-optimization/119181] Missed vectorization due to imperfect SLP discovery for 2 grouped load with same base pointer (taken as 1 interleaved load)

2025-05-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119181 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2025-05-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 119181, which changed state. Bug 119181 Summary: Missed vectorization due to imperfect SLP discovery for 2 grouped load with same base pointer (taken as 1 interleaved load) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119

[Bug middle-end/120378] Support narrowing clip idiom

2025-05-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378 --- Comment #1 from Hongtao Liu --- > The ifcvt'ed code before vect is: > > _4 = *_3; > x.0_12 = (unsigned int) _4; > _38 = -x.0_12; > _15 = (int) _38; > _16 = _15 >> 31; > _29 = x.0_12 > 255; > _17 = _29 ? _16 : _4; > _18 = (u
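
The GIMPLE quoted above looks like the familiar "clip to 8 bits" idiom: negate the value as unsigned, smear the sign bit across the word, and pick that mask only when the input is out of range. A minimal sketch in C, assuming a source shape roughly like the following. The function name `clip_uint8` is hypothetical and the original testcase is not shown in this snippet. The code relies on GCC's behaviour for two implementation-defined operations: the unsigned-to-int conversion wraps, and right shifts of negative ints are arithmetic.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the bug report.
   Clamps x to [0, 255] for every x except INT_MIN (the idiom maps
   INT_MIN to 255, because negating it leaves the sign bit set).  */
uint8_t clip_uint8(int x)
{
    unsigned int ux = (unsigned int) x;       /* x.0_12 in the GIMPLE */

    /* Negate as unsigned to avoid signed overflow, then shift the sign
       bit across the word: 0 when x is negative, -1 when x is positive.  */
    int mask = (int) (-ux) >> 31;

    /* Out-of-range inputs take the mask (truncated to 0 or 255);
       in-range inputs pass through unchanged.  */
    return (uint8_t) (ux > 255 ? mask : x);
}
```

With this shape, `clip_uint8(-5)` yields 0, `clip_uint8(300)` yields 255, and `clip_uint8(100)` yields 100.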

[Bug middle-end/118994] GCC fails to optimize (a >> 1) + (b >> 1) + ((a | b) & 1) to PAVGB/PAVGW (or equivalent instruction)

2025-05-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118994 Hongtao Liu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---
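
The expression in the bug title is an overflow-free way to write a rounding average, which is what PAVGB computes per unsigned byte as `(a + b + 1) >> 1`. A small C sketch that checks the two forms against each other for every pair of 8-bit inputs; the function names are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Form from the bug title: every intermediate value fits in 8 bits. */
static uint8_t avg_split(uint8_t a, uint8_t b)
{
    return (uint8_t) ((a >> 1) + (b >> 1) + ((a | b) & 1));
}

/* Reference rounding average, computed in a wider type. */
static uint8_t avg_wide(uint8_t a, uint8_t b)
{
    return (uint8_t) (((unsigned int) a + b + 1) >> 1);
}

/* Returns the number of input pairs for which the two forms differ. */
int count_avg_mismatches(void)
{
    int mismatches = 0;
    for (unsigned int a = 0; a < 256; a++)
        for (unsigned int b = 0; b < 256; b++)
            if (avg_split((uint8_t) a, (uint8_t) b)
                != avg_wide((uint8_t) a, (uint8_t) b))
                mismatches++;
    return mismatches;
}
```

The two forms agree because the halves `a >> 1` and `b >> 1` drop the low bits, and `(a | b) & 1` adds back exactly the carry that `(low_a + low_b + 1) >> 1` would produce, so `count_avg_mismatches()` returns 0.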

[Bug target/120215] [16 Regression] FAIL: gcc.target/i386/pr78794.c scan-assembler pandn by r16-517-g993aa0bd28722c

2025-05-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120215 Hongtao Liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/120215] [16 Regression] FAIL: gcc.target/i386/pr78794.c scan-assembler pandn by r16-517-g993aa0bd28722c

2025-05-13 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120215 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug middle-end/120184] --gc-section can't discard unused section due to fpatchable-function-entry ?

2025-05-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120184 Hongtao Liu changed: What|Removed |Added Resolution|FIXED |INVALID

[Bug middle-end/120184] --gc-section can't discard unused section due to fpatchable-function-entry ?

2025-05-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120184 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug middle-end/120184] New: --gc-section can't discard unused section due to fpatchable-function-entry ?

2025-05-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
erity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- cat test.c int foo1(void) { static int foo_1; return ++foo_1; } int foo2(void) { s

[Bug gcov-profile/118508] 10% performance drop when enabling autofdo for spec2017 554.roms_r

2025-05-07 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118508 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug gcov-profile/118581] auto_profile can't annotate bb with all debug_stmt which assigned value with constant

2025-04-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118581 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/119879] [16 Regression] FAIL: gcc.target/i386/avx512fp16-trunc-extendvnhf.c since r16-39

2025-04-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119879 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Target Milestone|16.0

[Bug target/119879] New: [r16-39 Regression] FAIL: gcc.target/i386/avx512fp16-trunc-extendvnhf.c

2025-04-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- [r16-39 Regression] FAIL: gcc.target/i386/avx512fp16-trunc-extendvnhf.c On Linux/x86_64

[Bug target/108134] x86 Operand Modifiers documentation issue

2025-04-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108134 Hongtao Liu changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug target/108134] x86 Operand Modifiers documentation issue

2025-04-13 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108134 --- Comment #4 from Hongtao Liu --- (In reply to Hongtao Liu from comment #3) > (In reply to sandra from comment #2) > > This was introduced by commit 0fec3f62b9bfc03e5088a09036791c2ac84fe0c8. I > > wondered if there might have been a patch hun

[Bug target/108134] x86 Operand Modifiers documentation issue

2025-04-11 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
at gcc dot gnu.org |liuhongt at gcc dot gnu.org --- Comment #3 from Hongtao Liu --- (In reply to sandra from comment #2) > This was introduced by commit 0fec3f62b9bfc03e5088a09036791c2ac84fe0c8. I > wondered if there might have been a patch hunk to update the example that > didn

[Bug target/119617] ICE: in standard_sse_constant_opcode, at config/i386/i386.cc:5465 with -fzero-call-used-regs=all -mabi=ms -mavx512f -mno-evex512

2025-04-09 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119617 --- Comment #12 from Hongtao Liu --- Let's just fix it in GCC16, either solution is ugly.

[Bug gcov-profile/118551] Autofdo regressed 538.imagick_r by ~10% with -march=x86-64-v3 -O2

2025-04-09 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118551 --- Comment #6 from Hongtao Liu --- (In reply to Jan Hubicka from comment #5) > as discussed in PR111551 the SPEC train run does not include hottest loop of > MorphologyApply, so MeanShiftImage may have same issue and auto-fdo may be > kind of c

[Bug target/119617] ICE: in standard_sse_constant_opcode, at config/i386/i386.cc:5465 with -fzero-call-used-regs=all -mabi=ms -mavx512f -mno-evex512

2025-04-07 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119617 --- Comment #6 from Hongtao Liu --- (In reply to Haochen Jiang from comment #4) > (In reply to Hongtao Liu from comment #3) > > (In reply to Hongtao Liu from comment #2) > > > (In reply to Richard Biener from comment #1) > > > > I think we need

[Bug target/119617] ICE: in standard_sse_constant_opcode, at config/i386/i386.cc:5465 with -fzero-call-used-regs=all -mabi=ms -mavx512f -mno-evex512

2025-04-06 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119617 --- Comment #3 from Hongtao Liu --- (In reply to Hongtao Liu from comment #2) > (In reply to Richard Biener from comment #1) > > I think we need to disable the effect of -mno-evex512, looks like there's > > still traces of it left? > > Let's ha

[Bug target/119617] ICE: in standard_sse_constant_opcode, at config/i386/i386.cc:5465 with -fzero-call-used-regs=all -mabi=ms -mavx512f -mno-evex512

2025-04-06 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119617 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/102294] memset expansion is sometimes slow for small sizes

2025-04-06 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/119596] x86: too eager use of rep movsq/rep stosq for inlined ops

2025-04-06 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119596 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment
