Re: [PATCH] Consider frequency in cost estimation when converting scalar to vector.

2025-04-23 Thread Hongtao Liu
On Thu, Apr 24, 2025 at 12:50 AM Jan Hubicka wrote: > > > In some benchmark, I notice stv failed due to cost unprofitable, but the > > igain > > is inside the loop, but sse<->integer conversion is outside the loop, > > current cost > > model doesn't consider the frequency of those gain/cost. > >
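The point raised in the quoted message can be illustrated with a small sketch. It assumes a simplified model in which every instruction gain and every sse<->integer conversion cost is scaled by the execution frequency of the block that contains it. The names `cost_item`, `weighted_total` and `conversion_profitable` are hypothetical and are not taken from the patch or from GCC itself.

```c
#include <stddef.h>

/* Hypothetical record: a gain (or cost) together with the relative
   execution frequency of the block it sits in.  This is not GCC's
   real data structure.  */
struct cost_item
{
  int amount;
  int freq;
};

/* Sum of amount * freq over all items.  */
long
weighted_total (const struct cost_item *items, size_t n)
{
  long total = 0;
  for (size_t i = 0; i < n; i++)
    total += (long) items[i].amount * items[i].freq;
  return total;
}

/* The conversion counts as profitable when the frequency-weighted gain
   exceeds the frequency-weighted conversion cost.  */
int
conversion_profitable (const struct cost_item *gains, size_t n_gains,
                       const struct cost_item *costs, size_t n_costs)
{
  return weighted_total (gains, n_gains) > weighted_total (costs, n_costs);
}
```

With a gain of 2 inside a loop of frequency 100 and a conversion cost of 6 outside the loop at frequency 1, an unweighted comparison rejects the conversion (2 < 6), while the weighted one accepts it (200 > 6).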

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-24 Thread Hongtao Liu
On Fri, Apr 25, 2025 at 1:26 PM Jan Hubicka wrote: > > > On Thu, Apr 24, 2025 at 6:27 PM Jan Hubicka wrote: > > > > > > > Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand > > > > or vpandn. > > > > Current register_operand/vector_operand could lose some optimization > > >

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-25 Thread Hongtao Liu
> > I am not so sure about this when it comes to relatively common > instructions. Hiding things in unspec prevents combine and other RTL > passes from doing their job. I would say that it only makes sense for > situations where RTL equivalent is very inconvenient. > In the direction of using gener

Re: [PATCH v2] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-28 Thread Hongtao Liu
On Mon, Apr 28, 2025 at 5:07 PM H.J. Lu wrote: > > On Mon, Apr 28, 2025 at 4:26 PM H.J. Lu wrote: > > > > > > > This is what my patch does: > > > But it iterates through vector_insns, using a def-ref chain to find > > > those insns. I think we can just record those single_set with src as > > > co

Re: [PATCH] i386: Add ix86_expand_unsigned_small_int_cst_argument

2025-04-28 Thread Hongtao Liu
On Sun, Apr 27, 2025 at 10:58 AM H.J. Lu wrote: > > When passing 0xff as an unsigned char function argument with the C frontend > promotion, expand_normal used to get > > constant > 255> > > and returned the rtx value using the sign-extended representation: > > (const_int 255 [0xff]) > > But aft

Re: [PATCH 0/2] i386: Adjust AVX10 related options

2025-02-27 Thread Hongtao Liu
On Mon, Feb 17, 2025 at 9:51 AM Hongtao Liu wrote: > > On Thu, Feb 13, 2025 at 4:08 PM Haochen Jiang wrote: > > > > Hi all, > > > > According to the previous feedback on our RFC for AVX10 option adjustment > > and discussion with LLVM, we finalized how we a

Re: [PATCH] i386: Correct mask width for bf8->fp16 intrin on 256/512 bit

2025-03-05 Thread Hongtao Liu
On Wed, Mar 5, 2025 at 3:23 PM Haochen Jiang wrote: > > Hi all, > > For bf8 -> fp16 convert, when dst is 256 bit, the mask should be > 16 bit since 16*16=256, not the 8 bit in the current intrin. In > 512 bit intrin, the mask bit is also halved. This patch will fix > both of them. > > Ok for trunk
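The arithmetic in the quoted message (one mask bit per 16-bit destination element) can be written out as a tiny sketch. The helper name `mask_bits` is hypothetical and only serves the illustration.

```c
/* Number of mask bits a vector needs: one bit per element.  */
unsigned
mask_bits (unsigned vector_bits, unsigned element_bits)
{
  return vector_bits / element_bits;
}
```

Under this model, a 256-bit destination of 16-bit elements needs `mask_bits (256, 16)` = 16 mask bits, and a 512-bit destination needs 32.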

Re: [PATCH] x86: Move TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P to i386.cc

2025-02-27 Thread Hongtao Liu
On Wed, Feb 26, 2025 at 6:01 AM H.J. Lu wrote: > > Move the TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P target hook from > i386.h to i386.cc. Ok for the patch, looks obvious. > > * config/i386/i386.h (TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P): > Moved to ... > * config/i386/i386.cc (TARGET_SMALL_REGI

Re: [RFA] ira: Add new hooks for callee-save vs spills [PR117477]

2025-03-04 Thread Hongtao Liu
On Tue, Mar 4, 2025 at 6:31 PM Richard Biener wrote: > > On Tue, Mar 4, 2025 at 11:18 AM Richard Sandiford > wrote: > > > > Richard Sandiford writes: > > > Jan Hubicka writes: > > >>> > > >>> Thanks for running these. I saw poor results for perlbench with my > > >>> initial aarch64 hooks becau

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-19 Thread Hongtao Liu
On Wed, Feb 19, 2025 at 9:06 PM Jan Hubicka wrote: > > Hi, > this is a variant of a hook I benchmarked on cpu2016 with -Ofast -flto > and -O2 -flto. For non -Os and no Windows ABI should be practically the > same as your variant that was simply returning mem_cost - 2. > I've tested O2/(Ofast march

Re: [PATCH] i386: Add attr_isa for vaes patterns to sync with attr gpr16. [pr119473]

2025-03-30 Thread Hongtao Liu
On Fri, Mar 28, 2025 at 1:55 PM Hu, Lin1 wrote: > > For vaes patterns with jm constraint and gpr16 attr, it requires "isa" > attr to distinct avx/avx512 alternatives in ix86_memory_address_reg_class. > Also adds missing type and mode attributes for those vaes patterns. Ok. > > gcc/ChangeLog: > >

Re: [PATCH] i386: Add PTA_AVX10_1_256 to PTA_DIAMONDRAPIDS

2025-03-30 Thread Hongtao Liu
On Fri, Mar 28, 2025 at 4:22 PM Haochen Jiang wrote: > > Hi all, > > For -march= handling, PTA_AVX10_1 will not imply PTA_AVX10_1_256, > resulting in TARGET_AVX10_1 becoming true while TARGET_AVX10_1_256 > false. Since we will check TARGET_AVX10_1_256 in GCC 15 for AVX512 > feature enabling for AV

Re: [PATCH] APX: add nf counterparts for rotl split pattern [PR 119539]

2025-04-02 Thread Hongtao Liu
ngtao wrote on Wed, Apr 2, 2025 at 08:57: > > > > > > > > > -Original Message- > > > From: Uros Bizjak > > > Sent: Tuesday, April 1, 2025 5:24 PM > > > To: Hongtao Liu > > > Cc: Wang, Hongyu ; gcc-patches@gcc.gnu.org; Liu, > > > Hongtao > > >

Re: [PATCH] i386: Set attr "addr" as "gpr16" for constraint "jm". [PR 119425]

2025-03-26 Thread Hongtao Liu
On Wed, Mar 26, 2025 at 9:50 AM Hu, Lin1 wrote: > > Hi, all > > This patch aims to ensure each alternative with constraint "jm" should > set addr "gpr16", otherwise maybe raise ICE in reload pass. > > Bootstrapped and Regtested for x86_64-pc-linux-gnu{-m32,-m64}, ok for trunk? Ok. > > BRs, > Lin >

Re: [PATCH] target/119549 - fixup handling of -mno-sse4

2025-04-04 Thread Hongtao Liu
On Mon, Mar 31, 2025 at 9:52 PM Richard Biener wrote: > > On Mon, 31 Mar 2025, Jakub Jelinek wrote: > > > On Mon, Mar 31, 2025 at 03:33:34PM +0200, Richard Biener wrote: > > > On Mon, 31 Mar 2025, Jakub Jelinek wrote: > > > > > > > On Mon, Mar 31, 2025 at 03:12:56PM +0200, Richard Biener wrote: >

Re: [PATCH v3] Consider frequency in cost estimation when converting scalar to vector.

2025-05-11 Thread Hongtao Liu
On Thu, May 8, 2025 at 2:40 PM liuhongt wrote: > > The only part I changed is related to size_cost of sse_to_integer, as below > > 114+ /* Under TARGET_SSE4_1, it's vmovd + vpextrd/vpinsrd. > 115+ W/o it, it's movd + psrlq/unpckldq + movd. */ > 116+ else if (!TARGET_64BIT && smode != SImod

Re: [PATCH v2] x86: Insert extra move for mode size smaller than natural size

2025-05-06 Thread Hongtao Liu
On Wed, May 7, 2025 at 9:06 AM H.J. Lu wrote: > > On Tue, May 6, 2025 at 3:35 PM Hongtao Liu wrote: > > > > On Tue, May 6, 2025 at 3:06 PM H.J. Lu wrote: > > > > > > On Tue, May 6, 2025 at 2:30 PM Liu, Hongtao wrote: > > > > > > > >

Re: [PATCH] Update libbid according to the latest Intel Decimal Floating-Point Math Library.

2025-05-13 Thread Hongtao Liu
On Wed, May 14, 2025 at 9:22 AM liuhongt wrote: > > The Intel Decimal Floating-Point Math Library is available as open-source on > Netlib[1]. > > [1] https://www.netlib.org/misc/intel/ > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ready push to trunk. > > libgcc/config/libbid/Ch

Re: [PATCH] x86: Add preserve_none and update no_caller_saved_registers attributes

2025-05-13 Thread Hongtao Liu
On Fri, Apr 18, 2025 at 7:10 PM H.J. Lu wrote: > > Add preserve_none attribute which is similar to no_callee_saved_registers > attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are Could you split preserve_none into a separate patch, It looks like it's different from clang's p

Re: [PATCH v2 0/7] Remove -mavx10.1-256/512 and -mno-evex512

2025-05-18 Thread Hongtao Liu
On Wed, May 14, 2025 at 3:29 PM Haochen Jiang wrote: > > Hi all, > > This is the v2 patch to remove -mavx10.1-256/512 and -mno-evex512. I suppose > this time all the patches will not be held due to size. > > As mentioned in GCC 15, we will remove -mavx10.1-256/512 and -mno-evex512 > options in GCC

Re: [PATCH] For datarefs with big gap, split them into different groups.

2025-05-15 Thread Hongtao Liu
It's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119181 On Fri, May 16, 2025 at 10:02 AM liuhongt wrote: > > The patch tries to solve miss vectorization for below case. > > void > foo (int* a, int* restrict b) > { > b[0] = a[0] * a[64]; > b[1] = a[65] * a[1]; > b[2] = a[2] * a[66]; >

Re: [PATCH] i386: Add more forms peephole2 for adc/sbb

2025-06-03 Thread Hongtao Liu
On Mon, May 26, 2025 at 4:55 PM Hu, Lin1 wrote: > > Hi, all > > Enable -mapxf will change some patterns about adc/sbb. > > Hence gcc will raise an extra mov like > movq 8(%rdi), %rax > adcq %rax, 8(%rsi), %rax > movq %rax, 8(%rdi) > rather than > movq

Re: [PATCH] i386: Add more peephole2 for APX NDD

2025-06-03 Thread Hongtao Liu
On Thu, May 29, 2025 at 4:56 PM Hu, Lin1 wrote: > > Hi, > > The patch aims to optimize > movb (%rdi), %al > movq %rdi, %rbx > xorl %esi, %eax, %edx > movb %dl, (%rdi) > cmpb %sil, %al > jne > to > xorb %sil, (%rdi) >

Re: [PATCH] x86: Extend the remove_redundant_vector pass

2025-06-09 Thread Hongtao Liu
On Tue, Jun 3, 2025 at 2:59 PM H.J. Lu wrote: > > Extend the remove_redundant_vector pass to handle vector broadcasts from > constant and variable scalars. When broadcasting from constants and > function arguments, we can place a single widest vector broadcast at > entry of the nearest common dom

Re: [PATCH] i386: Set SRF, GRR, CWF, GNR, DMR, ARL and PTL issue rate

2025-06-12 Thread Hongtao Liu
On Thu, Jun 12, 2025 at 10:51 AM Hu, Lin1 wrote: > > Hi, > > This patch aims to set SRF issue rate to 4, GNR issue rate to 6. According to > tests about spec2017, the patch has little effect on performance. > > For GRR, CWF, DMR, ARL and PTL, the patch set their issue rate to 6. Waiting > for > m

Re: [PATCH] [AUTOFDO] Don't scale bb_count with ipa_count when ipa_count is zero but count_max is not

2025-06-09 Thread Hongtao Liu
Ping On Mon, May 19, 2025 at 10:06 AM liuhongt wrote: > > From: "hongtao.liu" > > AutoFDO profile is a scaled profile, as a result, 0 sample does not > mean never executed. especially there's profile from function > body. Prevent combine_with_ipa_count (ipa_count) from zeroing all > bb->count. >

Re: [PATCH v3] x86: Enable *mov_(and|or) only for -Oz

2025-06-17 Thread Hongtao Liu
On Mon, May 26, 2025 at 2:30 PM H.J. Lu wrote: > > On Sun, May 25, 2025 at 7:02 PM H.J. Lu wrote: > > > > On Sun, May 25, 2025 at 8:12 AM H.J. Lu wrote: > > > > > > On Sun, May 25, 2025 at 7:47 AM H.J. Lu wrote: > > > > > > > > commit ef26c151c14a87177d46fd3d725e7f82e040e89f > > > > Author: Rog

Re: [PATCH v2 x86: Extend the remove_redundant_vector pass

2025-06-17 Thread Hongtao Liu
On Wed, Jun 18, 2025 at 2:39 PM H.J. Lu wrote: > > On Mon, Jun 16, 2025 at 4:14 PM Hongtao Liu wrote: > > > > >+enum redundant_load_kind > > >+{ > > >+ LOAD_CONST0_VECTOR, > > >+ LOAD_CONSTM1_VECTOR, > > >+ LOAD_VECTOR > >

Re: [PATCH v2 x86: Extend the remove_redundant_vector pass

2025-06-16 Thread Hongtao Liu
>+enum redundant_load_kind >+{ >+ LOAD_CONST0_VECTOR, >+ LOAD_CONSTM1_VECTOR, >+ LOAD_VECTOR >+}; Perhaps rename to x86_cse_kind, X86_CSE_CONST0_VECTOR, X86_CSE_CONSTM1_VECTOR, X86_CSE_VEC_DUP? LOAD sounds a bit ambiguous. Similar to ix86_get_vector_load_mode -> ix86_get_vector_cse_mode? >+

Re: [PATCH v2 x86: Extend the remove_redundant_vector pass

2025-06-16 Thread Hongtao Liu
On Mon, Jun 16, 2025 at 4:30 PM Hongtao Liu wrote: > > >+enum redundant_load_kind > >+{ > >+ LOAD_CONST0_VECTOR, > >+ LOAD_CONSTM1_VECTOR, > >+ LOAD_VECTOR > >+}; > Perhaps rename to x86_cse_kind, X86_CSE_CONST0_VECTOR, > X86_CSE_CONSTM1_VECTOR, X

Re: [PATCH] [AUTOFDO] Don't scale bb_count with ipa_count when ipa_count is zero but count_max is not

2025-06-16 Thread Hongtao Liu
Drop this patch since https://gcc.gnu.org/pipermail/gcc-patches/2025-June/686830.html could be a better alternative. On Tue, Jun 10, 2025 at 9:50 AM Hongtao Liu wrote: > > Ping > > On Mon, May 19, 2025 at 10:06 AM liuhongt wrote: > > > > From: "hongtao.liu"

Re: [PATCH v4] x86: Enable *mov_(and|or) only for -Oz

2025-06-19 Thread Hongtao Liu
On Wed, Jun 18, 2025 at 6:38 PM H.J. Lu wrote: > > commit ef26c151c14a87177d46fd3d725e7f82e040e89f > Author: Roger Sayle > Date: Thu Dec 23 12:33:07 2021 + > > x86: PR target/103773: Fix wrong-code with -Oz from pop to memory. > > added "*mov_and" and extended "*mov_or" to transform > "

Re: [PATCH V2] [vect]Enhance NARROW FLOAT_EXPR vectorization by truncating integer to lower precision.

2023-05-28 Thread Hongtao Liu via Gcc-patches
ping. On Mon, May 8, 2023 at 9:59 AM liuhongt wrote: > > > > @@ -4799,7 +4800,8 @@ vect_create_vectorized_demotion_stmts (vec_info > > > *vinfo, vec *vec_oprnds, > > >stmt_vec_info stmt_info, > > >vec &vec_dsts, > >

Re: [PATCH] Fold _mm{, 256, 512}_abs_{epi8, epi16, epi32, epi64} into gimple ABSU_EXPR + VCE.

2023-06-06 Thread Hongtao Liu via Gcc-patches
On Tue, Jun 6, 2023 at 12:49 PM Andrew Pinski wrote: > > On Mon, Jun 5, 2023 at 9:34 PM liuhongt via Gcc-patches > wrote: > > > > r14-1145 fold the intrinsics into gimple ABS_EXPR which has UB for > > TYPE_MIN, but PABSB will store unsigned result into dst. The patch > > uses ABSU_EXPR + VCE inst
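The difference described in the quoted message can be sketched in plain C for a single byte lane. This is an illustration only, not the patch's gimple folding: `absu8` models an absolute value with an unsigned result, and `view_as_i8` models the bit reinterpretation that a VIEW_CONVERT_EXPR performs. Both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Absolute value with an unsigned result, so INT8_MIN maps to 128
   instead of overflowing the signed type.  */
uint8_t
absu8 (int8_t x)
{
  unsigned int u = (unsigned int) (int) x;
  return (uint8_t) (x < 0 ? 0u - u : u);
}

/* Reinterpret the same bits as a signed byte, mimicking a
   view-convert back to the signed element type.  */
int8_t
view_as_i8 (uint8_t u)
{
  int8_t s;
  memcpy (&s, &u, sizeof s);
  return s;
}
```

For the input -128 the unsigned result is 128 (bit pattern 0x80), and viewing that pattern as a signed byte yields -128 again, with no signed overflow in between.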

Re: [PATCH] Fold _mm{, 256, 512}_abs_{epi8, epi16, epi32, epi64} into gimple ABSU_EXPR + VCE.

2023-06-06 Thread Hongtao Liu via Gcc-patches
On Tue, Jun 6, 2023 at 5:11 PM Uros Bizjak wrote: > > On Tue, Jun 6, 2023 at 6:33 AM liuhongt via Gcc-patches > wrote: > > > > r14-1145 fold the intrinsics into gimple ABS_EXPR which has UB for > > TYPE_MIN, but PABSB will store unsigned result into dst. The patch > > uses ABSU_EXPR + VCE instead

Re: [PATCH] Fold _mm{, 256, 512}_abs_{epi8, epi16, epi32, epi64} into gimple ABSU_EXPR + VCE.

2023-06-06 Thread Hongtao Liu via Gcc-patches
On Tue, Jun 6, 2023 at 10:36 PM Uros Bizjak wrote: > > On Tue, Jun 6, 2023 at 1:42 PM Hongtao Liu wrote: > > > > On Tue, Jun 6, 2023 at 5:11 PM Uros Bizjak wrote: > > > > > > On Tue, Jun 6, 2023 at 6:33 AM liuhongt via Gcc-patches > > > wrote: >

Re: [PATCH] Fold _mm{, 256, 512}_abs_{epi8, epi16, epi32, epi64} into gimple ABSU_EXPR + VCE.

2023-06-08 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 7, 2023 at 8:31 AM Hongtao Liu wrote: > > On Tue, Jun 6, 2023 at 10:36 PM Uros Bizjak wrote: > > > > On Tue, Jun 6, 2023 at 1:42 PM Hongtao Liu wrote: > > > > > > On Tue, Jun 6, 2023 at 5:11 PM Uros Bizjak wrote: > > > > > >

Re: [PATCH v2] Explicitly view_convert_expr mask to signed type when folding pblendvb builtins.

2023-06-08 Thread Hongtao Liu via Gcc-patches
On Tue, Jun 6, 2023 at 4:23 PM liuhongt wrote: > > > I think this is a better patch and will always be correct and still > > get folded at the gimple level (correctly): > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > > index d4ff56ee8dd..02bf5ba93a5 100644 > > --- a/gcc/config

Re: [PATCH] [x86] Add missing vec_pack/unpacks patterns for _Float16 <-> int/float conversion.

2023-06-12 Thread Hongtao Liu via Gcc-patches
On Mon, Jun 5, 2023 at 9:26 AM liuhongt wrote: > > This patch only supports vec_pack/unpacks optabs for vector modes whose length > >= 128. > For 32/64-bit vector, they're more handled by BB vectorizer with > truncmn2/extendmn2/fix{,uns}_truncmn2. > > Bootstrapped and regtested on x86_64-pc-linux-g

Re: [PATCH] x86/AVX512: use VMOVDDUP for broadcast to V2DF

2023-06-13 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 14, 2023 at 1:55 PM Jan Beulich via Gcc-patches wrote: > > Like is already the case for the AVX/AVX2 form, VMOVDDUP - acting on > double precision floating values - is more appropriate to use here, and > it can also result in shorter insn encodings when source is memory or > %xmm0...%x

Re: [PATCH] x86: add Bk and Br to comment list B's sub-chars

2023-06-13 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 14, 2023 at 1:56 PM Jan Beulich via Gcc-patches wrote: > > gcc/ > > * config/i386/constraints.md: Mention k and r for B. Ok. > > --- a/gcc/config/i386/constraints.md > +++ b/gcc/config/i386/constraints.md > @@ -162,7 +162,9 @@ > ;; g GOT memory operand. > ;; m Vector memo

Re: [PATCH] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-06-14 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches wrote: > > ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are > never longer (yet sometimes shorter) than the corresponding VSHUFPS / > VPSHUFD, due to the immediate operand of the shuffle insns balancing the > need for

Re: [PATCH] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F

2023-06-14 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 14, 2023 at 1:59 PM Jan Beulich via Gcc-patches wrote: > > There's no reason to constrain this to AVX512VL, as the wider operation > is not usable for more narrow operands only when the possible memory But this may require more resources (on AMD znver4 processor a zmm instruction will

Re: [PATCH 8/9] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_PERMUTE

2023-06-14 Thread Hongtao Liu via Gcc-patches
On Tue, Jun 13, 2023 at 10:07 AM Kewen Lin via Gcc-patches wrote: > > This patch adjusts the cost handling on > VMAT_CONTIGUOUS_PERMUTE in function vectorizable_load. We > don't call function vect_model_load_cost for it any more. > > As the affected test case gcc.target/i386/pr70021.c shows, > th

Re: [PATCH] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-06-14 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 14, 2023 at 5:03 PM Jan Beulich wrote: > > On 14.06.2023 09:41, Hongtao Liu wrote: > > On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches > > wrote: > >> > >> ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are > >>

Re: [PATCH] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F

2023-06-14 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 14, 2023 at 5:32 PM Jan Beulich wrote: > > On 14.06.2023 10:10, Hongtao Liu wrote: > > On Wed, Jun 14, 2023 at 1:59 PM Jan Beulich via Gcc-patches > > wrote: > >> > >> There's no reason to constrain this to AVX512VL, as the wider operation >

Re: [PATCH] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-06-14 Thread Hongtao Liu via Gcc-patches
On Thu, Jun 15, 2023 at 1:23 PM Hongtao Liu wrote: > > On Wed, Jun 14, 2023 at 5:03 PM Jan Beulich wrote: > > > > On 14.06.2023 09:41, Hongtao Liu wrote: > > > On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches > > > wrote: > > >>

Re: [PATCH] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-06-15 Thread Hongtao Liu via Gcc-patches
On Thu, Jun 15, 2023 at 2:41 PM Jan Beulich wrote: > > On 15.06.2023 07:23, Hongtao Liu wrote: > > On Wed, Jun 14, 2023 at 5:03 PM Jan Beulich wrote: > >> > >> On 14.06.2023 09:41, Hongtao Liu wrote: > >>> On Wed, Jun 14, 2023 at 1:58 P

Re: [PATCH] x86: correct and improve "*vec_dupv2di"

2023-06-15 Thread Hongtao Liu via Gcc-patches
On Thu, Jun 15, 2023 at 3:07 PM Uros Bizjak via Gcc-patches wrote: > > On Thu, Jun 15, 2023 at 8:03 AM Jan Beulich via Gcc-patches > wrote: > > > > The input constraint for the %vmovddup alternative was wrong, as the > > upper 16 XMM registers require AVX512VL to be used with this insn. To > > co

Re: [x86 PATCH] Tweak ix86_expand_int_compare to use PTEST for vector equality.

2023-07-11 Thread Hongtao Liu via Gcc-patches
On Wed, Jul 12, 2023 at 4:57 AM Roger Sayle wrote: > > > > From: Hongtao Liu > > Sent: 28 June 2023 04:23 > > > From: Roger Sayle > > > Sent: 27 June 2023 20:28 > > > > > > I've also come up with an alternate/complementary/supplemen

Re: [PATCH V2] Provide -fcf-protection=branch,return.

2023-07-12 Thread Hongtao Liu via Gcc-patches
ping. On Mon, May 22, 2023 at 4:08 PM Hongtao Liu wrote: > > ping. > > On Sat, May 13, 2023 at 5:20 PM liuhongt wrote: > > > > > I think this could be simplified if you use either EnumSet or > > > EnumBitSet instead in common.opt for `-fcf-protection=`. >

Re: [PATCH] tree-optimization/94864 - vector insert of vector extract simplification

2023-07-12 Thread Hongtao Liu via Gcc-patches
On Wed, Jul 12, 2023 at 9:37 PM Richard Biener via Gcc-patches wrote: > > The PRs ask for optimizing of > > _1 = BIT_FIELD_REF ; > result_4 = BIT_INSERT_EXPR ; > > to a vector permutation. The following implements this as > match.pd pattern, improving code generation on x86_64. > > On the RTL

Re: [PATCH] tree-optimization/94864 - vector insert of vector extract simplification

2023-07-12 Thread Hongtao Liu via Gcc-patches
On Thu, Jul 13, 2023 at 10:47 AM Hongtao Liu wrote: > > On Wed, Jul 12, 2023 at 9:37 PM Richard Biener via Gcc-patches > wrote: > > > > The PRs ask for optimizing of > > > > _1 = BIT_FIELD_REF ; > > result_4 = BIT_INSERT_EXPR ; > > > > to a

Re: [PATCH] tree-optimization/94864 - vector insert of vector extract simplification

2023-07-13 Thread Hongtao Liu via Gcc-patches
On Thu, Jul 13, 2023 at 2:32 PM Richard Biener wrote: > > On Thu, 13 Jul 2023, Hongtao Liu wrote: > > > On Thu, Jul 13, 2023 at 10:47 AM Hongtao Liu wrote: > > > > > > On Wed, Jul 12, 2023 at 9:37 PM Richard Biener via Gcc-patches > > > wrote: >

Re: [PATCH 1/4] Support Intel AVX-VNNI-INT16

2023-07-16 Thread Hongtao Liu via Gcc-patches
On Thu, Jul 13, 2023 at 2:06 PM Haochen Jiang via Gcc-patches wrote: > > From: Kong Lingling > > gcc/ChangeLog > > * common/config/i386/cpuinfo.h (get_available_features): Detect > avxvnniint16. > * common/config/i386/i386-common.cc > (OPTION_MASK_ISA2_AVXVNNIINT16

Re: [PATCH 3/4] Support Intel SHA512

2023-07-16 Thread Hongtao Liu via Gcc-patches
On Thu, Jul 13, 2023 at 2:06 PM Haochen Jiang via Gcc-patches wrote: > > gcc/ChangeLog: > > * common/config/i386/cpuinfo.h (get_available_features): > Detect SHA512. > * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_SHA512_SET, > OPTION_MASK_ISA2_SHA512_UNSET)

Re: [PATCH 2/4] Support Intel SM3

2023-07-16 Thread Hongtao Liu via Gcc-patches
On Thu, Jul 13, 2023 at 2:04 PM Haochen Jiang via Gcc-patches wrote: > > gcc/ChangeLog: > > * common/config/i386/cpuinfo.h (get_available_features): > Detect SM3. > * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_SM3_SET, > OPTION_MASK_ISA2_SM3_UNSET): New. >

Re: [PATCH 4/4] Support Intel SM4

2023-07-16 Thread Hongtao Liu via Gcc-patches
On Thu, Jul 13, 2023 at 2:04 PM Haochen Jiang via Gcc-patches wrote: > > gcc/ChangeLog: > > * common/config/i386/cpuinfo.h (get_available_features): > Detect SM4. > * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_SM4_SET, > OPTION_MASK_ISA2_SM4_UNSET): New. >

Re: [PATCH] Initial Lunar Lake, Arrow Lake and Arrow Lake S Support

2023-07-16 Thread Hongtao Liu via Gcc-patches
On Fri, Jul 14, 2023 at 10:55 AM Mo, Zewei via Gcc-patches wrote: > > Hi all, > > This patch is to add initial support for Lunar Lake, Arrow Lake and Arrow Lake > S for GCC. > > This link of related information is listed below: > https://www.intel.com/content/www/us/en/develop/download/intel-archi

Re: [PATCH] x86: slightly enhance "vec_dupv2df"

2023-07-16 Thread Hongtao Liu via Gcc-patches
On Fri, Jul 14, 2023 at 5:40 PM Jan Beulich via Gcc-patches wrote: > > Introduce a new alternative permitting all 32 registers to be used as > source without AVX512VL, by broadcasting to the full 512 bits in that > case. (The insn would also permit all registers to be used as > destination, but V2

Re: [PATCH] x86: avoid maybe_gen_...()

2023-07-16 Thread Hongtao Liu via Gcc-patches
On Fri, Jul 14, 2023 at 5:42 PM Jan Beulich via Gcc-patches wrote: > > In the (however unlikely) event that no insn can be found for the > requested mode, using maybe_gen_...() without (really) checking its > result for being a null rtx would lead to silent bad code generation. Ok. > > gcc/ > >

Re: [PATCH] x86: slightly enhance "vec_dupv2df"

2023-07-16 Thread Hongtao Liu via Gcc-patches
On Mon, Jul 17, 2023 at 2:20 PM Jan Beulich wrote: > > On 17.07.2023 08:09, Hongtao Liu wrote: > > On Fri, Jul 14, 2023 at 5:40 PM Jan Beulich via Gcc-patches > > wrote: > >> > >> Introduce a new alternative permitting all 32 registers to be used as > >

Re: [PATCH] Add peephole to eliminate redundant comparison after cmpccxadd.

2023-07-16 Thread Hongtao Liu via Gcc-patches
Ping. On Tue, Jul 11, 2023 at 5:16 PM liuhongt via Gcc-patches wrote: > > Similar like we did for CMPXCHG, but extended to all > ix86_comparison_int_operator since CMPCCXADD set EFLAGS exactly same > as CMP. > > When operand order in CMP insn is same as that in CMPCCXADD, > CMP insn can be elimin

Re: [PATCH 1/2] [i386] Support type _Float16/__bf16 independent of SSE2.

2023-07-17 Thread Hongtao Liu via Gcc-patches
I'd like to ping for this patch (only patch 1/2, for patch 2/2, I think that may not be necessary). On Mon, May 15, 2023 at 9:20 AM Hongtao Liu wrote: > > ping. > > On Fri, Apr 21, 2023 at 9:55 PM liuhongt wrote: > > > > > > + if (!TARGET_SSE2) > > >

Re: [PATCH 1/2] [i386] Support type _Float16/__bf16 independent of SSE2.

2023-07-18 Thread Hongtao Liu via Gcc-patches
On Mon, Jul 17, 2023 at 7:38 PM Uros Bizjak wrote: > > On Mon, Jul 17, 2023 at 10:28 AM Hongtao Liu wrote: > > > > I'd like to ping for this patch (only patch 1/2, for patch 2/2, I > > think that may not be necessary). > > > > On Mon, May 15, 2023 at 9:2

Re: [PATCH V2] Provide -fcf-protection=branch,return.

2023-07-19 Thread Hongtao Liu via Gcc-patches
On Wed, Jul 12, 2023 at 3:27 PM Hongtao Liu wrote: > > ping. > > On Mon, May 22, 2023 at 4:08 PM Hongtao Liu wrote: > > > > ping. > > > > On Sat, May 13, 2023 at 5:20 PM liuhongt wrote: > > > > > > > I think this could be simplified i

Re: [PATCH] Optimize vlddqu to vmovdqu for TARGET_AVX

2023-07-20 Thread Hongtao Liu via Gcc-patches
On Thu, Jul 20, 2023 at 4:11 PM Uros Bizjak via Gcc-patches wrote: > > On Thu, Jul 20, 2023 at 9:35 AM liuhongt wrote: > > > > For Intel processors, after TARGET_AVX, vmovdqu is optimized as fast > > as vlddqu, UNSPEC_LDDQU can be removed to enable more optimizations. > > Can someone confirm this

Re: [r14-2834 Regression] FAIL: gcc.target/i386/pr87007-5.c scan-assembler-times vxorps[^\n\r]*xmm[0-9] 1 on Linux/x86_64

2023-07-31 Thread Hongtao Liu via Gcc-patches
On Sat, Jul 29, 2023 at 11:55 AM haochen.jiang via Gcc-regression wrote: > > On Linux/x86_64, > > b9d7140c80bd3c7355b8291bb46f0895dcd8c3cb is the first bad commit > commit b9d7140c80bd3c7355b8291bb46f0895dcd8c3cb > Author: Jan Hubicka > Date: Fri Jul 28 09:16:09 2023 +0200 > > loop-split im

Re: [x86 PATCH] UNSPEC_PALIGNR optimizations and clean-ups.

2022-06-30 Thread Hongtao Liu via Gcc-patches
On Fri, Jul 1, 2022 at 2:42 AM Roger Sayle wrote: > > > This patch is a follow-up to Hongtao's fix for PR target/105854. That > fix is perfectly correct, but the thing that caught my eye was why is > the compiler generating a shift by zero at all. Digging deeper it > turns out that we can easily

Re: [PATCH] Add myself for write after approval

2022-06-30 Thread Hongtao Liu via Gcc-patches
I think this can be taken as an obvious fix without prior approval. "Obvious fixes can be committed without prior approval. Just check in the fix and copy it to gcc-patches." Quoted from https://gcc.gnu.org/gitwrite.html On Fri, Jul 1, 2022 at 10:02 AM Haochen Jiang via Gcc-patches wrote: > > Hi

Re: [x86 PATCH] UNSPEC_PALIGNR optimizations and clean-ups.

2022-06-30 Thread Hongtao Liu via Gcc-patches
On Fri, Jul 1, 2022 at 10:12 AM Hongtao Liu wrote: > > On Fri, Jul 1, 2022 at 2:42 AM Roger Sayle wrote: > > > > > > This patch is a follow-up to Hongtao's fix for PR target/105854. That > > fix is perfectly correct, but the thing that caught my eye was w

Re: [x86 PATCH] UNSPEC_PALIGNR optimizations and clean-ups.

2022-07-04 Thread Hongtao Liu via Gcc-patches
> This revised patch has been tested on x86_64-pc-linux-gnu with make > bootstrap and make -k check, both with and without --target_board=unix{-32}, > with no new failures. Is this revised version Ok for mainline? Ok. > > > 2022-07-04 Roger Sayle > Hongtao Liu >

Re: [PATCH] [RFC]Support vectorization for Complex type.

2022-07-11 Thread Hongtao Liu via Gcc-patches
On Mon, Jul 11, 2022 at 7:47 PM Richard Biener via Gcc-patches wrote: > > On Mon, Jul 11, 2022 at 5:44 AM liuhongt wrote: > > > > The patch only handles load/store(including ctor/permutation, except > > gather/scatter) for complex type, other operations don't needs to be > > handled since they wi

Re: [PATCH] Allocate general register(memory/immediate) for 16/32/64-bit vector bit_op patterns.

2022-07-11 Thread Hongtao Liu via Gcc-patches
On Mon, Jul 11, 2022 at 4:03 PM Uros Bizjak via Gcc-patches wrote: > > On Mon, Jul 11, 2022 at 3:15 AM liuhongt wrote: > > > > And split it to GPR-version instruction after reload. > > > > This will enable below optimization for 16/32/64-bit vector bit_op > > > > - movd (%rdi), %xmm0 > >

Re: [PATCH] [RFC]Support vectorization for Complex type.

2022-07-12 Thread Hongtao Liu via Gcc-patches
On Tue, Jul 12, 2022 at 10:12 PM Richard Biener wrote: > > On Tue, Jul 12, 2022 at 6:11 AM Hongtao Liu wrote: > > > > On Mon, Jul 11, 2022 at 7:47 PM Richard Biener via Gcc-patches > > wrote: > > > > > > On Mon, Jul 11, 2022 at 5:44 AM liuhongt wro

Re: [PATCH] [RFC]Support vectorization for Complex type.

2022-07-14 Thread Hongtao Liu via Gcc-patches
On Thu, Jul 14, 2022 at 4:20 PM Richard Biener wrote: > > On Wed, Jul 13, 2022 at 9:34 AM Richard Biener > wrote: > > > > On Wed, Jul 13, 2022 at 6:47 AM Hongtao Liu wrote: > > > > > > On Tue, Jul 12, 2022 at 10:12 PM Richard Biener > > > wrote:

Re: [PATCH] [RFC]Support vectorization for Complex type.

2022-07-14 Thread Hongtao Liu via Gcc-patches
On Thu, Jul 14, 2022 at 4:53 PM Hongtao Liu wrote: > > On Thu, Jul 14, 2022 at 4:20 PM Richard Biener > wrote: > > > > On Wed, Jul 13, 2022 at 9:34 AM Richard Biener > > wrote: > > > > > > On Wed, Jul 13, 2022 at 6:47 AM Hongtao Liu wrote: >

Re: [PATCH] Extend 64-bit vector bit_op patterns with ?r alternative

2022-07-14 Thread Hongtao Liu via Gcc-patches
On Thu, Jul 14, 2022 at 3:22 PM Uros Bizjak via Gcc-patches wrote: > > On Thu, Jul 14, 2022 at 7:33 AM liuhongt wrote: > > > > And split it to GPR-version instruction after reload. > > > > > ?r was introduced under the assumption that we want vector values > > > mostly in vector registers. Curren

Re: [PATCH] i386: Fix _mm_[u]comixx_{ss,sd} codegen and add PF result. [PR106113]

2022-07-14 Thread Hongtao Liu via Gcc-patches
On Thu, Jul 14, 2022 at 2:11 PM Kong, Lingling via Gcc-patches wrote: > > Hi, > > The patch is to fix _mm_[u]comixx_{ss,sd} codegen and add PF result. These > intrinsics have changed over time, like `_mm_comieq_ss ` old operation is > `RETURN ( a[31:0] == b[31:0] ) ? 1 : 0`, and new operation u

Re: [PATCH] x86: Disable sibcall if indirect_return attribute doesn't match

2022-07-14 Thread Hongtao Liu via Gcc-patches
On Fri, Jul 15, 2022 at 1:44 AM H.J. Lu via Gcc-patches wrote: > > When shadow stack is enabled, function with indirect_return attribute > may return via indirect jump. In this case, we need to disable sibcall > if caller doesn't have indirect_return attribute and indirect branch > tracking is en

Re: [AVX512 PATCH] Add UNSPEC_MASKOP to kupck instructions in sse.md.

2022-07-17 Thread Hongtao Liu via Gcc-patches
On Sat, Jul 16, 2022 at 10:08 PM Roger Sayle wrote: > > > This AVX512 specific patch to sse.md is split out from an earlier patch: > https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596199.html > > The new splitters proposed in that patch interfere with AVX512's > kunpckdq instruction which is

Re: [PATCH V2] Extend 16/32-bit vector bit_op patterns with (m, 0, i) alternative.

2022-07-18 Thread Hongtao Liu via Gcc-patches
On Tue, Jul 19, 2022 at 2:35 PM Uros Bizjak via Gcc-patches wrote: > > On Tue, Jul 19, 2022 at 8:07 AM liuhongt wrote: > > > > And split it after reload. > > > > > You will need ix86_binary_operator_ok insn constraint here with > > > corresponding expander using ix86_fixup_binary_operands_no_copy

Re: [PATCH V2] Extend 16/32-bit vector bit_op patterns with (m, 0, i) alternative.

2022-07-19 Thread Hongtao Liu via Gcc-patches
On Tue, Jul 19, 2022 at 5:37 PM Uros Bizjak wrote: > > On Tue, Jul 19, 2022 at 8:56 AM Hongtao Liu wrote: > > > > On Tue, Jul 19, 2022 at 2:35 PM Uros Bizjak via Gcc-patches > > wrote: > > > > > > On Tue, Jul 19, 2022 at 8:07 AM liuhongt wrote:

Re: [PATCH V2] Extend 16/32-bit vector bit_op patterns with (m, 0, i) alternative.

2022-07-19 Thread Hongtao Liu via Gcc-patches
On Wed, Jul 20, 2022 at 2:18 PM Uros Bizjak wrote: > > On Wed, Jul 20, 2022 at 8:14 AM Uros Bizjak wrote: > > > > On Wed, Jul 20, 2022 at 4:37 AM Hongtao Liu wrote: > > > > > > On Tue, Jul 19, 2022 at 5:37 PM Uros Bizjak wrote: > > > > > >

Re: [PATCH V2] Extend 16/32-bit vector bit_op patterns with (m, 0, i) alternative.

2022-07-20 Thread Hongtao Liu via Gcc-patches
On Wed, Jul 20, 2022 at 3:18 PM Uros Bizjak wrote: > > On Wed, Jul 20, 2022 at 8:54 AM Hongtao Liu wrote: > > > > On Wed, Jul 20, 2022 at 2:18 PM Uros Bizjak wrote: > > > > > > On Wed, Jul 20, 2022 at 8:14 AM Uros Bizjak wrote: > > > > > >

gcc-patches@gcc.gnu.org

2022-07-20 Thread Hongtao Liu via Gcc-patches
a.c. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ok for trunk? > > OK. > > Are there cases left your vectorizer patch handles over this one? No. > > Thanks, > Richard. > > > 2022-07-20 Richard Biener > > H

Re: [PATCH] Move pass_cse_sincos after vectorizer.

2022-07-20 Thread Hongtao Liu via Gcc-patches
On Wed, Jul 20, 2022 at 3:59 PM Richard Biener via Gcc-patches wrote: > > On Wed, Jul 20, 2022 at 4:20 AM liuhongt wrote: > > > > __builtin_cexpi can't be vectorized since there's a gap between it and > > vectorized sincos version (in libmvec, it passes a double and two > > double pointers and return

Re: [PATCH] x86: Enable __bf16 type for TARGET_SSE2 and above

2022-08-03 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 3, 2022 at 4:41 PM Kong, Lingling via Gcc-patches wrote: > > Hi, > > Old patch has some mistake in `*movbf_internal` , now disable BFmode constant > double move in `*movbf_internal`. LGTM. > > Thanks, > Lingling > > > -Original Message- > > From: Kong, Lingling > > Sent: Tues

Re: [RFC: PATCH] Extend vectorizer to handle nonlinear induction for neg, mul/lshift/rshift with a constant.

2022-08-04 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 4, 2022 at 4:19 PM Richard Biener via Gcc-patches wrote: > > On Thu, Aug 4, 2022 at 6:29 AM liuhongt via Gcc-patches > wrote: > > > > For neg, the patch create a vec_init as [ a, -a, a, -a, ... ] and no > > vec_step is needed to update vectorized iv since vf is always multiple > > of

Re: [PATCH] Replace invariant ternlog operands

2023-08-03 Thread Hongtao Liu via Gcc-patches
On Fri, Aug 4, 2023 at 1:30 AM Alexander Monakov wrote: > > > On Thu, 27 Jul 2023, Liu, Hongtao via Gcc-patches wrote: > > > > +;; If the first and the second operands of ternlog are invariant and ;; > > > +the third operand is memory ;; then we should add load third operand > > > +from memory to

Re: [PATCH 01/10] x86: "prefix_extra" tidying

2023-08-03 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 3, 2023 at 4:10 PM Jan Beulich via Gcc-patches wrote: > > Drop SSE5 leftovers from both its comment and its default calculation. > A value of 2 simply cannot occur anymore. Instead extend the comment to > mention the use of the attribute in "length_vex", clarifying why > "prefix_extra"

Re: [PATCH 02/10] x86: "sse4arg" adjustments

2023-08-03 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 3, 2023 at 4:10 PM Jan Beulich via Gcc-patches wrote: > > Record common properties in other attributes' default calculations: > There's always a 1-byte immediate, and they're always encoded in a VEX3- > like manner (note that "prefix_extra" already evaluates to 1 in this > case). The d

Re: [PATCH 03/10] x86: "ssemuladd" adjustments

2023-08-03 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 3, 2023 at 4:11 PM Jan Beulich via Gcc-patches wrote: > > They're all VEX3- (also covering XOP) or EVEX-encoded. Express that in > the default calculation of "prefix". FMA4 insns also all have a 1-byte > immediate operand. > > Where the default calculation is not sufficient / applicabl

Re: [PATCH 05/10] x86: replace/correct bogus "prefix_extra"

2023-08-03 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 3, 2023 at 4:14 PM Jan Beulich via Gcc-patches wrote: > > In the rdrand and rdseed cases "prefix_0f" is meant instead. For > mmx_floatv2siv2sf2 1 is correct only for the first alternative. For > the integer min/max cases 1 uniformly applies to legacy and VEX > encodings (the UB and SW

Re: [PATCH 06/10] x86: drop stray "prefix_extra"

2023-08-03 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 3, 2023 at 4:16 PM Jan Beulich via Gcc-patches wrote: > > While the attribute is relevant for legacy- and VEX-encoded insns, it is > of no relevance for EVEX-encoded ones. > > While there in avx512dq_broadcast_1 add > the missing "length_immediate". Ok. > > gcc/ > > * config/i3

Re: [PATCH 04/10] x86: "prefix_extra" can't really be "2"

2023-08-03 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 3, 2023 at 4:11 PM Jan Beulich via Gcc-patches wrote: > > In the three remaining instances separate "prefix_0f" and "prefix_rep" > are what is wanted instead. Ok. > > gcc/ > > * config/i386/i386.md (rdbase): Add "prefix_0f" and > "prefix_rep". Drop "prefix_extra". >

Re: [PATCH 09/10] x86: correct "length_immediate" in a few cases

2023-08-03 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 3, 2023 at 4:14 PM Jan Beulich via Gcc-patches wrote: > > When first added explicitly in 3ddffba914b2 ("i386.md > (sse4_1_round2): Add avx512f alternative"), "*" should not have > been used for the pre-existing alternative. The attribute was plain > missing. Subsequent changes adding m

Re: [PATCH 07/10] x86: add (adjust) XOP insn attributes

2023-08-03 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 3, 2023 at 4:14 PM Jan Beulich via Gcc-patches wrote: > > Many were lacking "prefix" and "prefix_extra", some had a bogus value of > 2 for "prefix_extra" (presumably inherited from their SSE5 counterparts, > which are long gone) and a meaningless "prefix_data16" one. Where > missing, "

Re: [PATCH 10/10] x86: drop redundant "prefix_data16" attributes

2023-08-03 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 3, 2023 at 4:17 PM Jan Beulich via Gcc-patches wrote: > > The attribute defaults to 1 for TI-mode insns of type sselog, sselog1, > sseiadd, sseimul, and sseishft. > > In *v8hi3 [smaxmin] and *v16qi3 [umaxmin] also drop the > similarly stray "prefix_extra" at this occasion. These two ma

Re: [PATCH 08/10] x86: add missing "prefix" attribute to VF{,C}MULC

2023-08-03 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 3, 2023 at 4:16 PM Jan Beulich via Gcc-patches wrote: > > gcc/ > > * config/i386/sse.md > (__): Add > "prefix" attribute. > > (avx512fp16_sh_v8hf): > Likewise. Ok. > --- > Talking of "prefix": Shouldn't at least V32HF and V32BF have it also > de
