[Bug target/92295] Inefficient vector constructor

2019-11-07 Thread liuhongt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92295 --- Comment #2 from liuhongt at gcc dot gnu.org --- Author: liuhongt Date: Fri Nov 8 05:34:25 2019 New Revision: 277946 URL: https://gcc.gnu.org/viewcvs?rev=277946&root=gcc&view=rev Log: Fix inefficient vector constructor. Chang

[Bug target/92448] Confusing using of TARGET_PREFER_AVX128

2019-11-17 Thread liuhongt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92448 --- Comment #3 from liuhongt at gcc dot gnu.org --- Author: liuhongt Date: Mon Nov 18 02:22:55 2019 New Revision: 278385 URL: https://gcc.gnu.org/viewcvs?rev=278385&root=gcc&view=rev Log: Split X86_TUNE_AVX128_OPTI

[Bug target/92686] Inefficient mask operation for 128/256-bit vector VCOND_EXPR under avx512f

2019-12-08 Thread liuhongt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92686 --- Comment #5 from liuhongt at gcc dot gnu.org --- Author: liuhongt Date: Mon Dec 9 04:16:24 2019 New Revision: 279107 URL: https://gcc.gnu.org/viewcvs?rev=279107&root=gcc&view=rev Log: Enable mask movement for VCOND_EXPR under avx512f

[Bug target/92865] [10 Regression] error: unrecognizable insn: in extract_insn, at recog.c:2294 since r279107

2019-12-11 Thread liuhongt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92865 --- Comment #7 from liuhongt at gcc dot gnu.org --- Author: liuhongt Date: Wed Dec 11 08:06:06 2019 New Revision: 279214 URL: https://gcc.gnu.org/viewcvs?rev=279214&root=gcc&view=rev Log: Fix unrecognizable insn of pr92865. gcc/ P

[Bug target/92807] gcc generate extra move for the snippet code along with lea instruction.

2019-12-16 Thread liuhongt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92807 --- Comment #6 from liuhongt at gcc dot gnu.org --- Author: liuhongt Date: Tue Dec 17 01:29:09 2019 New Revision: 279451 URL: https://gcc.gnu.org/viewcvs?rev=279451&root=gcc&view=rev Log: Use add for a = a + b and a = b + a when possibl

[Bug target/92651] [10 Regression] Unnecessary stv transform in some x86 backend

2019-12-16 Thread liuhongt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92651 --- Comment #9 from liuhongt at gcc dot gnu.org --- Author: liuhongt Date: Tue Dec 17 01:50:35 2019 New Revision: 279452 URL: https://gcc.gnu.org/viewcvs?rev=279452&root=gcc&view=rev Log: Add abs pattern to handle {si,di} mode abs to av

[Bug target/89750] Wrong code for _mm_comi_round_ss

2019-06-02 Thread liuhongt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89750 --- Comment #3 from liuhongt at gcc dot gnu.org --- Author: liuhongt Date: Mon Jun 3 02:20:33 2019 New Revision: 271853 URL: https://gcc.gnu.org/viewcvs?rev=271853&root=gcc&view=rev Log: 2019-05-06 H.J. Lu Hon

[Bug target/86444] [X86] Implementation of SSE comi/ucomi intrinsics does not match recent versions of icc, clang, or MSVC

2019-06-02 Thread liuhongt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86444 --- Comment #2 from liuhongt at gcc dot gnu.org --- Author: liuhongt Date: Mon Jun 3 02:20:33 2019 New Revision: 271853 URL: https://gcc.gnu.org/viewcvs?rev=271853&root=gcc&view=rev Log: 2019-05-06 H.J. Lu Hon

[Bug target/89803] Missing AVX512 intrinsics

2019-06-04 Thread liuhongt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89803 --- Comment #7 from liuhongt at gcc dot gnu.org --- Author: liuhongt Date: Wed Jun 5 06:04:22 2019 New Revision: 271946 URL: https://gcc.gnu.org/viewcvs?rev=271946&root=gcc&view=rev Log: gcc/ 2019-06-05 Hongtao Liu PR targ

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-02-07 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576 --- Comment #37 from Hongtao Liu --- (In reply to Richard Biener from comment #36) > For example with AVX512VL and the following, using -O -fgimple -mavx512vl > we get simply > > notl%esi > orl %esi, %edi > cmpb

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-02-07 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576 --- Comment #38 from Hongtao Liu --- > I think we should also mask off the upper bits of variable mask? > > notl%esi > orl %esi, %edi > notl%edi > andl$15, %edi > je .L3 with -mbmi,

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-02-07 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576 --- Comment #39 from Hongtao Liu --- > > the question is whether that matches the semantics of GIMPLE (the padding > > is inverted, too), whether it invokes undefined behavior (don't do it - it > > seems for people using intrinsics that's what i

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-02-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576 --- Comment #43 from Hongtao Liu --- > Well, yes, the discussion in this bug was whether to do this at consumers > (that's sth new) or with all mask operations (that's how we handle > bit-precision integer operations, so it might be relatively

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-02-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576 --- Comment #44 from Hongtao Liu --- > > Note the AND is removed by combine if I add it: > > Successfully matched this instruction: > (set (reg:CCZ 17 flags) > (compare:CCZ (and:HI (not:HI (subreg:HI (reg:QI 102 [ tem_3 ]) 0)) >

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-02-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576 --- Comment #45 from Hongtao Liu --- > > There's do_store_flag to fixup for uses not in branches and > > do_compare_and_jump for conditional jumps. > > reasonable enough for me. I mean we only handle it at consumers where the upper bits matter.

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-02-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576 --- Comment #57 from Hongtao Liu --- > For dg-do run testcases I really think we should avoid those -march= > options, because it means a lot of other stuff, BMI, LZCNT, ... Makes sense.

[Bug tree-optimization/109885] gcc does not generate movmskps and testps instructions (clang does)

2024-02-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885 --- Comment #4 from Hongtao Liu --- int sum() { int ret = 0; for (int i=0; i<8; ++i) ret +=(0==v[i]); return ret; } int sum2() { int ret = 0; auto m = v==0; for (int i=0; i<8; ++i) ret += m[i]; return ret; } For sum, gcc t

[Bug target/114107] poor vectorization at -O3 when dealing with arrays of different multiplicity, good with -O2

2024-02-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114107 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/114107] poor vectorization at -O3 when dealing with arrays of different multiplicity, good with -O2

2024-02-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114107 --- Comment #8 from Hongtao Liu --- (In reply to Hongtao Liu from comment #7) > perm_cost is very low in x86 backend, and it may be OK for 128-bit vectors, > pshufb/shufps are available for most cases. > But for 256/512-bit vectors, when the permua

[Bug target/114107] poor vectorization at -O3 when dealing with arrays of different multiplicity, good with -O2

2024-02-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114107 --- Comment #11 from Hongtao Liu --- (In reply to N Schaeffer from comment #9) > In addition, optimizing for size with -Os leads to a non-vectorized > double-loop (51 bytes) while the vectorized loop with vbroadcastsd (produced > by clang -Os) l

[Bug tree-optimization/112325] Missed vectorization of reduction after unrolling

2024-02-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325 --- Comment #9 from Hongtao Liu --- The original case is a little different from the one in PR. It comes from ggml #include #include typedef uint16_t ggml_fp16_t; static float table_f32_f16[1 << 16]; inline static float ggml_lookup_fp16_to_

[Bug tree-optimization/112325] Missed vectorization of reduction after unrolling

2024-02-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325 --- Comment #10 from Hongtao Liu --- (In reply to Hongtao Liu from comment #9) > The original case is a little different from the one in PR. But the issue is similar, after cunrolli, GCC failed to vectorize the outer loop. The interesting thing

[Bug tree-optimization/112325] Missed vectorization of reduction after unrolling

2024-02-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325 --- Comment #11 from Hongtao Liu --- >Loop body is likely going to simplify further, this is difficult >to guess, we just decrease the result by 1/3. */ > This is introduced by r0-68074-g91a01f21abfe19 /* Estimate number of insns of

[Bug target/114125] New: Support vcond_mask_qiqi and friends.

2024-02-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- Quote from https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646587.html > On Linux/x86

[Bug target/114125] Support vcond_mask_qiqi and friends.

2024-02-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114125 Hongtao Liu changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED

[Bug tree-optimization/112325] Missed vectorization of reduction after unrolling

2024-02-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325 --- Comment #14 from Hongtao Liu --- (In reply to rguent...@suse.de from comment #13) > On Tue, 27 Feb 2024, liuhongt at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325 > > > > --- Comm

[Bug tree-optimization/112325] Missed vectorization of reduction after unrolling

2024-02-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325 --- Comment #16 from Hongtao Liu --- > I'm all for removing the 1/3 for innermost loop handling (in cunroll > the unrolled loop is then innermost). I'm more concerned about > unrolling more than one level which is exactly what's required for >

[Bug tree-optimization/114164] simdclone vectorization creates unsupported IL

2024-02-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114164 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug d/114171] [13/14 Regression] gdc -O2 -mavx generates misaligned vmovdqa instruction

2024-02-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114171 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org Last

[Bug target/111822] [12/13/14 Regression] during RTL pass: lr_shrinkage ICE: in operator[], at vec.h:910 with -O2 -m32 -flive-range-shrinkage -fno-dce -fnon-call-exceptions since r12-5301-g04520645038

2024-03-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111822 --- Comment #16 from Hongtao Liu --- (In reply to Uroš Bizjak from comment #11) > (In reply to Richard Biener from comment #10) > > The easiest fix would be to refuse applying STV to a insn that > > can_throw_internal () (that's an insn that has

[Bug target/110027] [11/12/13/14 regression] Misaligned vector store on detect_stack_use_after_return

2024-03-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110027 --- Comment #12 from Hongtao Liu --- (In reply to Sam James from comment #11) > Calling it a 11..14 regression as we know 14 is bad and 7.5 is OK, but I > can't test 11/12 on an avx512 machine right now. I can't reproduce that with 11/12, but w

[Bug target/110027] [11/12/13/14 regression] Misaligned vector store on detect_stack_use_after_return

2024-03-11 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110027 --- Comment #13 from Hongtao Liu --- So the stack is like --- stack top -32 - (offset -32) -64 (32 bytes redzone) - (offset -64) -128 (64 bytes __m512) (offset -128) (32-bytes redzone) ---(offset -1

[Bug libgcc/111731] [13/14 regression] gcc_assert is hit at libgcc/unwind-dw2-fde.c#L291

2024-03-11 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111731 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/110027] [11/12/13/14 regression] Misaligned vector store on detect_stack_use_after_return

2024-03-11 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110027 --- Comment #14 from Hongtao Liu --- diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc index 0de299c62e3..92062378d8e 100644 --- a/gcc/cfgexpand.cc +++ b/gcc/cfgexpand.cc @@ -1214,7 +1214,7 @@ expand_stack_vars (bool (*pred) (size_t), class stack

[Bug target/111822] [12/13/14 Regression] during RTL pass: lr_shrinkage ICE: in operator[], at vec.h:910 with -O2 -m32 -flive-range-shrinkage -fno-dce -fnon-call-exceptions since r12-5301-g04520645038

2024-03-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111822 Hongtao Liu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/110027] [11/12/13/14 regression] Misaligned vector store on detect_stack_use_after_return

2024-03-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110027 --- Comment #15 from Hongtao Liu --- A patch is posted at https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647604.html

[Bug target/114334] [14 Regression] ICE: in extract_insn, at recog.cc:2812 (unrecognizable insn and:HF?) with lroundf16() and -ffast-math -mavx512fp16

2024-03-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
||2024-03-15 CC||liuhongt at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Hongtao Liu --- Mine

[Bug tree-optimization/66862] OpenMP SIMD does not work (use SIMD instructions) on conditional code

2024-03-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66862 --- Comment #5 from Hongtao Liu --- > Now, it seems AVX512BW (and AVX512VL in some cases) has the needed > instructions, > in particular VMOVDQU{8,16}, but it is not reflected in maskload and > maskstore expanders. CCing Kyrill and Uros on this.

[Bug target/114334] [14 Regression] ICE: in extract_insn, at recog.cc:2812 (unrecognizable insn and:HF?) with lroundf16() and -ffast-math -mavx512fp16

2024-03-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114334 Hongtao Liu changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug middle-end/114347] wrong constant folding when casting __bf16 to int

2024-03-18 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114347 --- Comment #9 from Hongtao Liu --- (In reply to Richard Biener from comment #7) > (In reply to Jakub Jelinek from comment #6) > > You can use -fexcess-precision=16 if you don't want treating _Float16 and > > __bf16 as having excess precision.

[Bug tree-optimization/67683] Missed vectorization: shifts of an induction variable

2024-03-18 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67683 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug tree-optimization/114396] [13/14 Regression] Vector: Runtime mismatch at -O2 with -fwrapv since r13-7988-g82919cf4cb2321

2024-03-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 --- Comment #15 from Hongtao Liu --- (In reply to Richard Biener from comment #9) > (In reply to Robin Dapp from comment #8) > > No fallout on x86 or aarch64. > > > > Of course using false instead of TYPE_SIGN (utype) is also possible and > > m

[Bug tree-optimization/114396] [13/14 Regression] Vector: Runtime mismatch at -O2 with -fwrapv since r13-7988-g82919cf4cb2321

2024-03-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 Hongtao Liu changed: What|Removed |Added Status|NEW |ASSIGNED --- Comment #16 from Hongtao Liu

[Bug tree-optimization/114396] [13/14 Regression] Vector: Runtime mismatch at -O2 with -fwrapv since r13-7988-g82919cf4cb2321

2024-03-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 --- Comment #17 from Hongtao Liu --- > > > > The to_mpz args look like they could be mixing signs as well: > > I tried the code below, and mixing signs seems to work well. Debugging shows step_expr is -5 and signed. short a = 0xF; short b[16]; unsigned shor

[Bug rtl-optimization/92080] Missed CSE of _mm512_set1_epi8(c) with _mm256_set1_epi8(c)

2024-03-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92080 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug rtl-optimization/92080] Missed CSE of _mm512_set1_epi8(c) with _mm256_set1_epi8(c)

2024-03-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92080 --- Comment #9 from Hongtao Liu --- > If we were to expose that vpxor before postreload we'd likely CSE but > we have > > 5: xmm0:V4SI=const_vector > REG_EQUIV const_vector > 6: [`b']=xmm0:V4SI > 7: xmm0:V8HI=const_vector >

[Bug tree-optimization/114396] [13/14 Regression] Vector: Runtime mismatch at -O2 with -fwrapv since r13-7988-g82919cf4cb2321

2024-03-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 --- Comment #20 from Hongtao Liu --- (In reply to JuzheZhong from comment #19) > I think it's better to add pr114396.c into vect testsuite instead of x86 > target test since it's the bug not only happens on x86. Sure, there's no target specific

[Bug tree-optimization/114396] [13/14 Regression] Vector: Runtime mismatch at -O2 with -fwrapv since r13-7988-g82919cf4cb2321

2024-03-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 Hongtao Liu changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug target/114427] New: [x86] ec_pack_truncv8si/v4si can be optimized with pblendw instead of pand for AVX2 target

2024-03-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- void foo (int* a, short* __restrict b, int* c) { for (int i = 0; i != 8; i++) b[i] = c[i] + a[i

[Bug target/114428] New: [x86] psrad xmm, xmm, 16 and pand xmm, const_vector (0xffff x4) can be optimized to psrld

2024-03-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- typedef unsigned short uint16_t; typedef short int16_t; #define QUANT_ONE( coef, mf, f
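The rewrite named in this report's title can be illustrated on a single 32-bit lane. The following is a minimal scalar sketch, not the reporter's test case: the function names are hypothetical, and it assumes that `>>` on a negative `int32_t` is an arithmetic shift (GCC's behaviour; ISO C leaves it implementation-defined).

```c
#include <assert.h>
#include <stdint.h>

/* psrad + pand form for one lane: arithmetic shift right by 16,
   then keep only the low 16 bits (mask 0xffff).
   Assumes arithmetic right shift of negative values, as GCC provides. */
static uint32_t sra_then_mask(int32_t x)
{
    return (uint32_t)(x >> 16) & 0xffffu;
}

/* psrld form for one lane: a single logical shift right by 16.
   The sign-extension bits that the mask would clear are never produced. */
static uint32_t srl_only(int32_t x)
{
    return (uint32_t)x >> 16;
}

/* Hypothetical self-check: both forms agree on boundary values. */
static void check_shift_mask_identity(void)
{
    static const int32_t samples[] = {
        INT32_MIN, -65536, -65535, -1, 0, 1, 65535, 65536, INT32_MAX
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(sra_then_mask(samples[i]) == srl_only(samples[i]));
}
```

Both forms produce the upper 16 bits of the input zero-extended to 32 bits, which is why one logical shift can stand in for the shift-and-mask pair.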

[Bug target/114429] New: [x86] (neg a) ashifrt>> 31 can be optimized to a > 0.

2024-03-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
ority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- typedef unsigned char uint8_t; uint8_t x264_clip_uint8( int x ) { return x&(~255) ? (-x)>>31 : x; } void foo (int* a, int* __restrict b, in

[Bug target/114429] [x86] (neg a) ashifrt>> 31 can be optimized to a > 0.

2024-03-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114429 Hongtao Liu changed: What|Removed |Added Target||x86_64-*-* i?86-*-* --- Comment #1 from H

[Bug target/114429] [x86] (neg a) ashifrt>> 31 can be optimized to a > 0.

2024-03-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114429 --- Comment #2 from Hongtao Liu --- (In reply to Hongtao Liu from comment #1) > when x is INT_MIN, I assume -x is UB, so the compiler can do anything. > otherwise, (-x) >> 31 is just x > 0. > From the RTL view, neg of INT_MIN is assumed to be 0 after it's
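The identity discussed in this thread can be checked with a small scalar sketch. It assumes a 32-bit `int` and an arithmetic right shift of negative values (true for GCC, implementation-defined in ISO C). The function names are hypothetical, and INT_MIN is deliberately left out because `-x` overflows there. Note that in mask terms "x > 0" means all-ones, so the scalar equivalent is `-(x > 0)`.

```c
#include <assert.h>
#include <limits.h>

/* Shift form from the report: -1 (all ones) when x > 0, otherwise 0.
   Assumes 32-bit int and arithmetic right shift of negative values. */
static int neg_shift_form(int x)
{
    return (-x) >> 31;
}

/* Comparison form: -(x > 0) yields the same -1 / 0 mask. */
static int compare_form(int x)
{
    return -(x > 0);
}

/* Hypothetical self-check over boundary values, skipping INT_MIN. */
static void check_neg_shift_identity(void)
{
    static const int samples[] = {
        INT_MIN + 1, -256, -1, 0, 1, 255, INT_MAX
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(neg_shift_form(samples[i]) == compare_form(samples[i]));
}
```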

[Bug target/114429] [x86] (neg a) ashifrt>> 31 can be optimized to a > 0.

2024-03-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114429 Hongtao Liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug tree-optimization/114471] [14 regression] ICE when building liblc3-1.0.4 with -fno-vect-cost-model -march=x86-64-v4

2024-03-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114471 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug tree-optimization/114471] [14 regression] ICE when building liblc3-1.0.4 with -fno-vect-cost-model -march=x86-64-v4

2024-03-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114471 --- Comment #6 from Hongtao Liu --- (In reply to Hongtao Liu from comment #5) > Maybe we should always use kmask under AVX512, currently only >= 128-bits > vector of vector _Float16 use kmask, < 128 bits vector still use vector mask. > and we n

[Bug target/114514] New: v16qi >> 7 can be optimized with vpcmpgtb

2024-03-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
ponent: target Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- v16qi foo2 (v16qi a, v16qi b) { return a >> 7; } it can be optimized with vpxor xmm1, xmm1, xmm1 vpcmpgtbxmm0, xmm1, xmm0 re
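The equivalence behind this report can be sketched with GCC's generic vector extensions. This is an illustration under assumptions rather than the reporter's exact test: the element type is spelled `signed char` so the shift is arithmetic, and the helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* 16 signed bytes in one 128-bit vector (GCC vector extension). */
typedef signed char v16qi __attribute__((vector_size(16)));

/* Shift form from the report: each lane becomes -1 if negative, else 0. */
static v16qi shift_form(v16qi a)
{
    return a >> 7;
}

/* Compare form: 0 > a yields -1 in exactly the lanes where a is negative,
   which is the result a zeroed register plus vpcmpgtb would compute. */
static v16qi compare_form(v16qi a)
{
    v16qi zero = { 0 };
    return zero > a;
}

/* Hypothetical helper: returns 1 when both forms give identical lanes. */
static int forms_agree(v16qi a)
{
    v16qi s = shift_form(a);
    v16qi c = compare_form(a);
    return memcmp(&s, &c, sizeof s) == 0;
}
```

Whether the compiler actually emits the compare sequence for `shift_form` depends on the GCC version and target flags, so checking the generated assembly is the only way to confirm it.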

[Bug target/114514] v16qi >> 7 can be optimized with vpcmpgtb

2024-03-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114514 --- Comment #3 from Hongtao Liu --- (In reply to Andrew Pinski from comment #1) > Confirmed. > > Note non sign bit can be improved too: > ``` I assume you're talking about broadcast from imm or directly from constant pool. GCC chooses the forme

[Bug target/114544] New: [x86] stv should transform (subreg DI (V1TI) 8) as (vec_select:DI (V2DI) (const_int 1))

2024-04-01 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- typedef __uint128_t v128_t __attribute__((vector_size(16))); v128_t c; v128_t foo1 (v128_t *a, v128_t *b

[Bug target/114544] [x86] stv should transform (subreg DI (V1TI) 8) as (vec_select:DI (V2DI) (const_int 1))

2024-04-01 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114544 --- Comment #1 from Hongtao Liu --- 20590;; Turn SImode or DImode extraction from arbitrary SSE/AVX/AVX512F 20591;; vector modes into vec_extract*. 20592(define_split 20593 [(set (match_operand:SWI48x 0 "nonimmediate_operand") 20594(sub

[Bug target/114544] [x86] stv should transform (subreg DI (V1TI) 8) as (vec_select:DI (V2DI) (const_int 1))

2024-04-01 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114544 --- Comment #2 from Hongtao Liu --- Also for void foo2 (v128_t* a, v128_t* b) { c = (*a & *b)+ *b; } (insn 9 8 10 2 (set (reg:V1TI 108 [ _3 ]) (and:V1TI (reg:V1TI 99 [ _2 ]) (mem:V1TI (reg:DI 113) [1 *a_6(D)+0 S16 A128])

[Bug rtl-optimization/114556] New: weird loop unrolling when there's attribute aligned in side the loop

2024-04-02 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
ormal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- v32qi z (void* pa, void* pb, void* pc) { v32qi __attribute__((aligned(64))) a; v32qi __attrib

[Bug target/114570] New: GCC doesn't perform good loop invariant code motion for very long vector operations.

2024-04-03 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
IRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- typedef float v128_32 __attribute__((vector_size (128 * 4), aligned(2048))); v128_32 foo (v128_32 a, v128

[Bug target/113744] Unnecessary "m" constraint in *adddi_4

2024-07-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113744 Hongtao Liu changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at

[Bug target/116157] AVX2 _mm256_exp_ps function is missing in the compiler

2024-07-31 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116157 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org

[Bug target/85236] missing _mm256_atan2_ps

2024-07-31 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85236 Hongtao Liu changed: What|Removed |Added CC||binklings at 163 dot com --- Comment #8 fr

[Bug target/116122] [14/15 regression] __FLT16_MAX__ is defined even with -mno-sse2 on 32-bit x86

2024-07-31 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116122 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug target/115981] [14/15 Regression] Redundant vmovaps to itself after vmovups since r14-537

2024-07-31 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115981 --- Comment #4 from Hongtao Liu --- (In reply to Jakub Jelinek from comment #3) > Created attachment 58786 [details] > gcc15-pr115981.patch > > Untested fix. As since that commit it checks swap_commutative_operands_p: > 1) CONST_VECTOR I think

[Bug target/113744] Unnecessary "m" constraint in *adddi_4

2024-07-31 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113744 Hongtao Liu changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug tree-optimization/89749] Very odd vector constructor

2024-07-31 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
||12.1.0 CC||liuhongt at gcc dot gnu.org Status|NEW |RESOLVED --- Comment #6 from Hongtao Liu --- Fixed in GCC12 and above.

[Bug rtl-optimization/116096] [15 Regression] during RTL pass: cprop_hardreg ICE: in extract_insn, at recog.cc:2848 (unrecognizable insn ashift:TI?) with -O2 -flive-range-shrinkage -fno-peephole2 -mst

2024-08-01 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116096 Hongtao Liu changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug rtl-optimization/115021] [14 regression] unnecessary spill for vpternlog

2024-08-01 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021 Hongtao Liu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/116274] [14/15 Regression] x86: poor code generation with 16 byte function arguments and addition

2024-08-12 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116274 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/116274] [14/15 Regression] x86: poor code generation with 16 byte function arguments and addition

2024-08-12 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116274 --- Comment #5 from Hongtao Liu --- For non-avx case, looks like it hits here 748 /* Special case TImode to 128-bit vector conversions via V2DI. */ 749 if (VECTOR_MODE_P (mode) 75

[Bug target/116274] [14/15 Regression] x86: poor code generation with 16 byte function arguments and addition

2024-08-12 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116274 --- Comment #6 from Hongtao Liu --- (In reply to Hongtao Liu from comment #5) > For non-avx case, looks like it hits here > > 748 /* Special case TImode to 128-bit vector conversions via V2DI. */ > Prevent that in reload, we get

[Bug target/113729] Missing APX NDD optimization

2024-08-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113729 Hongtao Liu changed: What|Removed |Added Keywords||missed-optimization Resolution|--

[Bug target/116174] [14/15 regression] Alignment request is added before endbr with -fcf-protection=branch since r15-888-gb644126237a1aa

2024-08-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116174 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/115756] default tuning for x86_64 produces shifts for `*240`

2024-08-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115756 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/115749] Non optimal assembly for integer modulo by a constant on x86-64 CPUs

2024-08-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115749 Bug 115749 depends on bug 115756, which changed state. Bug 115756 Summary: default tuning for x86_64 produces shifts for `*240` https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115756 What|Removed |Added --

[Bug target/115749] Non optimal assembly for integer modulo by a constant on x86-64 CPUs

2024-08-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115749 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug target/113600] [14/15 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4

2024-08-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600 --- Comment #10 from Hongtao Liu --- I think it should be fixed by r15-2820-gab18785840d7b8

[Bug target/81602] Unnecessary zero-extension after 16 bit popcnt

2024-08-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81602 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/116274] [14/15 Regression] x86: poor code generation with 16 byte function arguments and addition

2024-08-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116274 --- Comment #8 from Hongtao Liu --- > > codegen is probably an RA/LRA artifact caused by bad instruction constraints > and the refuse to reload to a gpr. Not sure if a move high to gpr is a > thing, > pextrq would work for sure. But an unpck

[Bug target/116174] [14/15 regression] Alignment request is added before endbr with -fcf-protection=branch since r15-888-gb644126237a1aa

2024-08-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116174 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug target/115982] [15 Regression] ICE: unrecognizable insn in ira_remove_insn_scratches with -mavx512vl since r15-1742

2024-08-18 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115982 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Known to fail|

[Bug target/115683] [15 Regression] SSE2 regressions after obselete of vcond{,u,eq}.

2024-08-19 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115683 --- Comment #6 from Hongtao Liu --- (In reply to Uroš Bizjak from comment #5) > (In reply to Hongtao Liu from comment #0) > > > g++: g++.target/i386/pr100637-1b.C > > g++: g++.target/i386/pr100637-1w.C > > g++: g++.target/i386/pr103861-1.C > >

[Bug target/116497] Need no_caller_saved_registers with SSE support

2024-08-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/116512] [12/13/14/15 Regression] vzeroupper emitted even though the upper half of the z registers are returned

2024-08-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116512 Hongtao Liu changed: What|Removed |Added Last reconfirmed|2024-08-28 00:00:00 | Status|NEW

[Bug target/116512] [12/13/14/15 Regression] vzeroupper emitted even though the upper half of the z registers are returned

2024-08-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116512 --- Comment #4 from Hongtao Liu --- gdb shows crtl->return_rtx is 21(parallel/i:BLK [ 22(expr_list:REG_DEP_TRUE (reg:XI 20 xmm0) 23(c

[Bug target/116512] [12/13/14/15 Regression] vzeroupper emitted even though the upper half of the z registers are returned

2024-08-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116512 --- Comment #6 from Hongtao Liu --- (In reply to Andrew Pinski from comment #5) > (In reply to Hongtao Liu from comment #4) > > gdb shows crtl->return_rtx is > > > > 21(parallel/i:BLK [ >

[Bug target/116512] [12/13/14/15 Regression] vzeroupper emitted even though the upper half of the z registers are returned

2024-09-01 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116512 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug target/116582] gather is a win in some cases on zen CPUs

2024-09-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116582 --- Comment #5 from Hongtao Liu --- (In reply to Richard Biener from comment #4) > (In reply to Jan Hubicka from comment #3) > > Just for completeness the codegen for parest sparse matrix multiply is: > > > > 0.31 │320: kmovb %k1,%k

[Bug target/116617] x86_64: arch lunarlake not documented

2024-09-05 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
, ||liuhongt at gcc dot gnu.org --- Comment #2 from Hongtao Liu --- @Haochen Could you add that.

[Bug target/116617] x86_64: arch lunarlake not documented

2024-09-09 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116617 Hongtao Liu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug middle-end/116658] New: [GCC15 regression] ICE in vect_is_slp_load_node

2024-09-09 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- Created attachment 59082 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59082&action=edit test111.i g++ -O3 te

[Bug tree-optimization/116674] New: [15 regression] ICE in vectorizable_simd_clone_call bisected to r15-3509-gd34cda72098867

2024-09-11 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
Keywords: ice-on-valid-code Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org CC: rguenth at gcc dot gnu.org Target Milestone: --- Created

[Bug tree-optimization/116674] [15 regression] ICE in vectorizable_simd_clone_call bisected to r15-3509-gd34cda72098867

2024-09-11 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116674 --- Comment #1 from Hongtao Liu --- Created attachment 59094 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59094&action=edit test.i A more reduced case.

[Bug target/116675] No blend constant permute for V8HImode with just SSE2

2024-09-11 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/114544] [x86] stv should transform (subreg DI (V1TI) 8) as (vec_select:DI (V2DI) (const_int 1))

2024-04-07 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114544 --- Comment #3 from Hongtao Liu --- <__umodti3>: ... 37 58: 66 48 0f 6e c7 movq %rdi,%xmm0 38 5d: 66 48 0f 6e d6 movq %rsi,%xmm2 39 62: 66 0f 6c c2 punpcklqdq %xmm2,%xmm0 40 66:

[Bug target/113288] [i386] Missing #define for -mavx10.1-256 and -mavx10.1-512

2024-04-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113288 Hongtao Liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---
