Re: [V2 PATCH] Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode.

2024-06-05 Thread Hongtao Liu
On Wed, Jun 5, 2024 at 10:44 PM Jeff Law wrote: > > > > On 6/4/24 10:22 PM, liuhongt wrote: > >> Can you add a testcase for this? I don't mind if it's x86 specific and > >> does a bit of asm scanning. > >> > >> Also note that the context for this patch has changed, so it won't > >> automatically
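
The simplification named in the subject line can be illustrated with a scalar C sketch. It assumes the mask keeps exactly the low (32 - imm) bits, so the sign bits copied in by the arithmetic shift are all cleared; the exact precondition checked by the patch is not visible in this excerpt. The function names are hypothetical, and the patch itself operates on vector-mode RTL rather than on scalars.

```c
#include <stdint.h>

/* (AND (ASHIFTRT A imm) mask): arithmetic shift, then clear the top imm bits.
   Assumes 0 <= imm <= 31 and that >> on a negative int32_t is an arithmetic
   shift (implementation-defined in ISO C, but this is what GCC does). */
static uint32_t and_ashiftrt(int32_t a, unsigned imm)
{
    uint32_t mask = UINT32_MAX >> imm;   /* low (32 - imm) bits set */
    return (uint32_t)(a >> imm) & mask;
}

/* (LSHIFTRT A imm): logical shift, zeros are shifted in from the top. */
static uint32_t lshiftrt(int32_t a, unsigned imm)
{
    return (uint32_t)a >> imm;
}
```

For any `a` and any `imm` in range, both functions should return the same value, which is why the AND can be dropped once the shift is made logical.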

Re: [PATCH] [APX] Adjust target-support check [PR 115341]

2024-06-05 Thread Hongtao Liu
On Thu, Jun 6, 2024 at 2:39 PM Hongyu Wang wrote: > > Current target apxf check does not specify sub-features that assembler > supports, so the check with older binutils will fail at assemble stage > for new apx features like NF,CCMP or CFCMOV. Adjust the assembler check > for latest apx subfeatur

Re: [x86 SSE] Improve handling of ternlog instructions in i386/sse.md (v3)

2024-06-06 Thread Hongtao Liu
vpternlogd[ \\t] 694 > > > 2024-06-06 Roger Sayle > Hongtao Liu > > gcc/ChangeLog > * config/i386/i386-expand.cc (ix86_expand_args_builtin): Call > fixup_modeless_constant before testing predicates. Only call > copy_to_mode_reg on memory

Re: [x86 PATCH] PR target/115397: AVX512 ternlog vs. -m32 -fPIC constant pool.

2024-06-10 Thread Hongtao Liu
On Mon, Jun 10, 2024 at 3:20 PM Roger Sayle wrote: > > > This patch fixes PR target/115397, a recent regression caused by my > ternlog patch that results in an ICE (building numpy) with -m32 -fPIC. > The problem is that ix86_broadcast_from_constant, which calls > get_pool_constant, doesn't handle

Re: [PATCH] AVX-512: Pacify -Wshift-overflow=2. [PR115409]

2024-06-10 Thread Hongtao Liu
On Mon, Jun 10, 2024 at 2:37 PM Collin Funk wrote: > > A shift of 31 on a signed int is undefined behavior. Since unsigned > int is 32-bits wide this change fixes it and silences the warning. Ok. > > gcc/ChangeLog: > > PR target/115409 > * config/i386/avx512fp16intrin.h (_mm512_co
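
A minimal sketch of the issue described above, assuming a 32-bit `int`. The helper names are hypothetical and this is not the code from avx512fp16intrin.h, whose changed lines are truncated in this listing.

```c
#include <stdint.h>

/* With a 32-bit int, 1 << 31 does not fit in the signed result type, which
   is undefined behavior and triggers -Wshift-overflow=2.  Using an unsigned
   operand makes the shift well defined. */
static uint32_t top_bit_mask(void)
{
    return 1U << 31;          /* 0x80000000 */
}

/* General form: n must be in the range 0..31. */
static uint32_t bit_mask(unsigned n)
{
    return 1U << n;
}
```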

Re: [x86 PATCH] More use of m{32, 64}bcst addressing modes with ternlog.

2024-06-12 Thread Hongtao Liu
On Thu, Jun 13, 2024 at 4:20 AM Roger Sayle wrote: > > > This patch makes more use of m32bcst and m64bcst addressing modes in > ix86_expand_ternlog. Previously, the i386 backend would only consider > using a m32bcst if the inner mode of the vector was 32-bits, or using > m64bcst if the inner mode

Re: [PATCH] [APX ZU] Support APX zero-upper

2024-06-12 Thread Hongtao Liu
On Thu, Jun 6, 2024 at 4:49 PM Kong, Lingling wrote: > > Enable ZU for IMUL (opcodes 0x69 and 0x6B) and SETcc. > > gcc/ChangeLog: > > * config/i386/i386-opts.h (enum apx_features):Add apx_zu. > * config/i386/i386.h (TARGET_APX_ZU): Define. > * config/i386/i386.md (*imulhizu

Re: [PATCH] i386: Handle target of __builtin_ia32_cmp[p|s][s|d] from avx into sse/sse2/avx

2024-06-12 Thread Hongtao Liu
On Thu, May 30, 2024 at 1:52 PM Hu, Lin1 wrote: > > Hi, all > > This patch aims to extend __builtin_ia32_cmp[p|s][s|d] from avx to > sse/sse2/avx, where its immediate is in range of [0, 7]. > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? Ok. > > BRs, > Lin > > gcc/ChangeLog: >

Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-14 Thread Hongtao Liu
On Fri, Jun 14, 2024 at 6:31 PM Richard Biener wrote: > > The following retires vcond{,u,eq} optabs by stopping to use them > from the middle-end. Targets instead (should) implement vcond_mask > and vec_cmp{,u,eq} optabs. The PR this change refers to lists > possibly affected targets - those imp
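
As a rough scalar model of the split described above (a single vcond replaced by a comparison that yields a mask, followed by a mask-driven select), the following sketch may help. The function names are hypothetical and do not correspond to GCC's optab interface; it only shows the data flow.

```c
#include <stddef.h>

/* vec_cmp-style step: produce an all-ones (-1) or all-zeros mask per element. */
static void cmp_lt_mask(const int *a, const int *b, int *mask, size_t n)
{
    for (size_t i = 0; i < n; i++)
        mask[i] = (a[i] < b[i]) ? -1 : 0;
}

/* vcond_mask-style step: pick x[i] where the mask is set, y[i] otherwise. */
static void select_by_mask(const int *x, const int *y, const int *mask,
                           int *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = mask[i] ? x[i] : y[i];
}
```

Calling `cmp_lt_mask` and then `select_by_mask` computes `out[i] = a[i] < b[i] ? x[i] : y[i]`, the same result a fused vcond-style operation would give.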

Re: [PATCH 0/3] [APX CFCMOV] Support APX CFCMOV

2024-06-16 Thread Hongtao Liu
On Sat, Jun 15, 2024 at 1:22 AM Jeff Law wrote: > > > > On 6/14/24 11:10 AM, Alexander Monakov wrote: > > > > On Fri, 14 Jun 2024, Kong, Lingling wrote: > > > >> APX CFCMOV[1] feature implements conditionally faulting which means that > >> all memory faults are suppressed > >> when the condition

Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-16 Thread Hongtao Liu
On Fri, Jun 14, 2024 at 10:53 PM Hongtao Liu wrote: > > On Fri, Jun 14, 2024 at 6:31 PM Richard Biener wrote: > > > > The following retires vcond{,u,eq} optabs by stopping to use them > > from the middle-end. Targets instead (should) implement vcond_mask > > and

Re: [PATCH] i386: Refine all cvtt* instructions with UNSPEC instead of FIX/UNSIGNED_FIX.

2024-06-16 Thread Hongtao Liu
On Thu, Jun 13, 2024 at 3:13 PM Hu, Lin1 wrote: > > Hi, all > > This patch aims to refine all cvtt* instructions with UNSPEC instead of > FIX/UNSIGNED_FIX. Because the intrinsics should behave as documented. > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? Ok. > > BRs, > Lin >

Re: [PATCH] x86: Emit cvtne2ps2bf16 for odd increasing perm in __builtin_shufflevector

2024-06-16 Thread Hongtao Liu
On Fri, Jun 14, 2024 at 9:35 AM Levy Hsu wrote: > > This patch updates the GCC x86 backend to efficiently handle > odd, incrementally increasing permutations of BF16 vectors > using the cvtne2ps2bf16 instruction. > It modifies ix86_vectorize_vec_perm_const to support these operations > and adds a

Re: [x86 PATCH] Allow all register_operand SUBREGs in x86_ternlog_idx.

2024-06-20 Thread Hongtao Liu
On Wed, Jun 19, 2024 at 5:04 AM Roger Sayle wrote: > > > This patch tweaks ix86_ternlog_idx to allow any SUBREG that matches > the register_operand predicate, and is split out as an independent > piece of a patch that I have to clean-up redundant ternlog patterns > in sse.md. It turns out that so

Re: [PATCH] Add a late-combine pass [PR106594]

2024-06-20 Thread Hongtao Liu
On Wed, Oct 25, 2023 at 2:49 AM Richard Sandiford wrote: > > This patch adds a combine pass that runs late in the pipeline. > There are two instances: one between combine and split1, and one > after postreload. > > The pass currently has a single objective: remove definitions by > substituting int

Re: [PING] [PATCH] AVX-512: Pacify -Wshift-overflow=2. [PR115409]

2024-06-22 Thread Hongtao Liu
On Sat, Jun 22, 2024 at 5:49 AM Collin Funk wrote: > > Hi Hongtao, > > I submitted a patch silencing -Wshift-overflow on a signed int > constant here: > > https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654016.html > > You OK'd it here: > > https://gcc.gnu.org/pipermail/gcc-patches/202

Re: [PATCH] Fix wrong cost of MEM when addr is a lea.

2024-06-26 Thread Hongtao Liu
On Wed, Jun 26, 2024 at 2:52 PM Richard Biener wrote: > > On Wed, Jun 26, 2024 at 8:09 AM liuhongt wrote: > > > > 416.gamess regressed 4-6% on x86_64 since my r15-882-g1d6199e5f8c1c0. > > The commit adjust rtx_cost of mem to reduce cost of (add op0 disp). > > But Cost of ADDR could be cheaper tha

Re: [PATCH] Fix wrong cost of MEM when addr is a lea.

2024-06-26 Thread Hongtao Liu
On Wed, Jun 26, 2024 at 4:02 PM Richard Biener wrote: > > On Wed, Jun 26, 2024 at 9:14 AM Hongtao Liu wrote: > > > > On Wed, Jun 26, 2024 at 2:52 PM Richard Biener > > wrote: > > > > > > On Wed, Jun 26, 2024 at 8:09 AM liuhongt wrote: > > >

Re: [PATCH] x86: Don't enable APX_F in 32-bit mode.

2024-07-22 Thread Hongtao Liu
On Thu, Jul 18, 2024 at 5:29 PM Kong, Lingling wrote: > > I adjusted my patch based on the comments by H.J. > And I will add the testcase like gcc.target/i386/pr101395-1.c when the march > for APX is determined. > > Ok for trunk? Synced with LLVM folks, they agreed to this solution. Ok. > > Than

Re: [PATCH] i386: Adjust rtx cost for imulq and imulw [PR115749]

2024-07-24 Thread Hongtao Liu
On Wed, Jul 24, 2024 at 3:11 PM Kong, Lingling wrote: > > Tested spec2017 performance in Sierra Forest, Icelake, CascadeLake, at least > there is no obvious regression. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > OK for trunk? Ok. > > gcc/ChangeLog: > > * config/i386

Re: [PATCH] [x86]Refine constraint "Bk" to define_special_memory_constraint.

2024-07-25 Thread Hongtao Liu
On Wed, Jul 24, 2024 at 3:57 PM liuhongt wrote: > > For below pattern, RA may still allocate r162 as v/k register, try to > reload for address with leaq __libc_tsd_CTYPE_B@gottpoff(%rip), %rsi > which result a linker error. > > (set (reg:DI 162) > (mem/u/c:DI >(const:DI (unspec:DI >

Re: [PATCH Ping] i386: Use BLKmode for {ld,st}tilecfg

2024-07-25 Thread Hongtao Liu
On Fri, Jul 26, 2024 at 2:28 PM Jiang, Haochen wrote: > > Ping for this patch > > Thx, > Haochen > > > -Original Message- > > From: Haochen Jiang > > Sent: Thursday, July 18, 2024 9:45 AM > > To: gcc-patches@gcc.gnu.org > > Cc: Liu, Hongtao ; hjl.to...@gmail.com; > > ubiz...@gmail.com > >

Re: [PATCH] Fix mismatch between constraint and predicate for ashl3_doubleword.

2024-07-26 Thread Hongtao Liu
On Fri, Jul 26, 2024 at 2:59 PM liuhongt wrote: > > (insn 98 94 387 2 (parallel [ > (set (reg:TI 337 [ _32 ]) > (ashift:TI (reg:TI 329) > (reg:QI 521))) > (clobber (reg:CC 17 flags)) > ]) "test.c":11:13 953 {ashlti3_doubleword} >

Re: [PATCH] [x86]Refine constraint "Bk" to define_special_memory_constraint.

2024-07-28 Thread Hongtao Liu
On Thu, Jul 25, 2024 at 3:23 PM Hongtao Liu wrote: > > On Wed, Jul 24, 2024 at 3:57 PM liuhongt wrote: > > > > For below pattern, RA may still allocate r162 as v/k register, try to > > reload for address with leaq __libc_tsd_CTYPE_B@gottpoff(%rip), %rsi > >

Re: [PATCH v2] i386: Add non-optimize prefetchi intrins

2024-07-29 Thread Hongtao Liu
On Fri, Jul 26, 2024 at 4:55 PM Haochen Jiang wrote: > > Hi all, > > I added related O0 testcase in this patch. > > Ok for trunk and backport to GCC 14 and GCC 13? Ok. > > Thx, > Haochen > > --- > > Changes in v2: Add testcases. > > --- > > Under -O0, with the "newly" introduced intrins, the varia

Re: [PATCH v2] i386: Add non-optimize prefetchi intrins

2024-07-29 Thread Hongtao Liu
On Tue, Jul 30, 2024 at 9:27 AM Hongtao Liu wrote: > > On Fri, Jul 26, 2024 at 4:55 PM Haochen Jiang wrote: > > > > Hi all, > > > > I added related O0 testcase in this patch. > > > > Ok for trunk and backport to GCC 14 and GCC 13? > Ok. I mean for tru

Re: [PATCH] i386: Remove ndd support for *add_4 [PR113744]

2024-07-30 Thread Hongtao Liu
On Wed, Jul 31, 2024 at 2:08 PM Kong, Lingling wrote: > > *add_4 and *adddi_4 are for shorter opcode from cmp to inc/dec or add > $128. > > But NDD code is longer than the cmp code, so there is no need to support NDD. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ok for tr

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Hongtao Liu
On Wed, Jul 31, 2024 at 1:06 AM Uros Bizjak wrote: > > On Tue, Jul 30, 2024 at 3:00 PM Richard Biener wrote: > > > > On Tue, 30 Jul 2024, Alexander Monakov wrote: > > > > > > > > On Tue, 30 Jul 2024, Richard Biener wrote: > > > > > > > > Oh, and please add a small comment why we don't use XFmode

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Hongtao Liu
On Wed, Jul 31, 2024 at 3:17 PM Uros Bizjak wrote: > > On Wed, Jul 31, 2024 at 9:11 AM Hongtao Liu wrote: > > > > On Wed, Jul 31, 2024 at 1:06 AM Uros Bizjak wrote: > > > > > > On Tue, Jul 30, 2024 at 3:00 PM Richard Biener wrote: > > > > >

Re: [PATCH] i386: Mark target option with optimization when enabled with opt level [PR116065]

2024-07-31 Thread Hongtao Liu
On Tue, Jul 30, 2024 at 1:05 PM Hongyu Wang wrote: > > Richard Biener wrote on Fri, Jul 26, 2024 at 19:45: > > > > On Fri, Jul 26, 2024 at 10:50 AM Hongyu Wang wrote: > > > > > > Hi, > > > > > > When introducing munroll-only-small-loops, the option was marked as > > > Target Save and added to -O2 default whic

Re: [PATCH] i386: Fix memory constraint for APX NF

2024-07-31 Thread Hongtao Liu
On Thu, Aug 1, 2024 at 10:03 AM Kong, Lingling wrote: > > > > > -Original Message- > > From: Liu, Hongtao > > Sent: Thursday, August 1, 2024 9:35 AM > > To: Kong, Lingling ; gcc-patches@gcc.gnu.org > > Cc: Wang, Hongyu > > Subject: RE: [PATCH] i386: Fix memory constraint for APX NF > > >

Re: [PATCH] Fix mismatch between constraint and predicate for ashl3_doubleword.

2024-07-31 Thread Hongtao Liu
On Tue, Jul 30, 2024 at 11:04 AM liuhongt wrote: > > (insn 98 94 387 2 (parallel [ > (set (reg:TI 337 [ _32 ]) > (ashift:TI (reg:TI 329) > (reg:QI 521))) > (clobber (reg:CC 17 flags)) > ]) "test.c":11:13 953 {ashlti3_doubleword} >

Re: [PATCH] x86: Allow TImode offsettable memory only with 8-bit constant

2024-04-14 Thread Hongtao Liu
On Sat, Apr 13, 2024 at 6:42 AM H.J. Lu wrote: > > The x86 instruction size limit is 15 bytes. If a NDD instruction has > a segment prefix byte, a 4-byte opcode prefix, a MODRM byte, a SIB byte, > a 4-byte displacement and a 4-byte immediate, adding an address size > prefix will exceed the size l

Re: [PATCH] i386: Fix behavior for both using AVX10.1-256 in options and function attribute

2024-04-24 Thread Hongtao Liu
On Wed, Apr 24, 2024 at 1:46 PM Haochen Jiang wrote: > > Hi all, > > When we are using -mavx10.1-256 in command line and avx10.1-256 in > target attribute together, zmm should never be generated. But current > GCC will generate zmm since it wrongly enables EVEX512 for non-explicitly > set AVX512.

Re: [PATCH] Don't assert for IFN_COND_{MIN, MAX} in vect_transform_reduction

2024-04-30 Thread Hongtao Liu
On Tue, Apr 30, 2024 at 3:38 PM Jakub Jelinek wrote: > > On Tue, Apr 30, 2024 at 09:30:00AM +0200, Richard Biener wrote: > > On Mon, Apr 29, 2024 at 5:30 PM H.J. Lu wrote: > > > > > > On Mon, Apr 29, 2024 at 6:47 AM liuhongt wrote: > > > > > > > > The Fortran standard does not specify what the r

Re: [PATCH] x86: Fix cmov cost model issue [PR109549]

2024-05-05 Thread Hongtao Liu
CC uros. On Mon, May 6, 2024 at 11:03 AM Kong, Lingling wrote: > > Hi, > (if_then_else:SI (eq (reg:CCZ 17 flags) > (const_int 0 [0])) > (reg/v:SI 101 [ e ]) > (reg:SI 102)) > The cost is 8 for the rtx, the cost for > (eq (reg:CCZ 17 flags) (const_int 0 [0])) is 4, but this is just

Re: [PATCH] i386: fix ix86_hardreg_mov_ok with lra_in_progress

2024-05-07 Thread Hongtao Liu
On Mon, May 6, 2024 at 3:40 PM Kong, Lingling wrote: > > Hi, > Originally eliminate_regs_in_insn will transform > (parallel [ > (set (reg:QI 130) > (plus:QI (subreg:QI (reg:DI 19 frame) 0) > (const_int 96))) > (clobber (reg:CC 17 flag))]) {*addqi_1} > to > (set (reg:QI 130) > (subr

Re: [PATCH] i386: Fix some intrinsics without alignment requirements.

2024-05-08 Thread Hongtao Liu
On Wed, May 8, 2024 at 10:13 AM Hu, Lin1 wrote: > > Hi all, > > This patch aims to fix some intrinsics without alignment requirement, but > raised runtime error's problem. > > Bootstrapped and tested on x86_64-linux-gnu, OK for trunk? Ok. > > BRs, > Lin > > gcc/ChangeLog: > > PR target/845

Re: [x86 PATCH] Improve V[48]QI shifts on AVX512

2024-05-09 Thread Hongtao Liu
On Fri, May 10, 2024 at 6:26 AM Roger Sayle wrote: > > > The following one line patch improves the code generated for V8QI and V4QI > shifts when AVX512BW and AVX512VL functionality is available. + /* With AVX512 its cheaper to do vpmovsxbw/op/vpmovwb. */ + && !(TARGET_AVX512BW && TARGET

Re: [x86 PATCH] Improve V[48]QI shifts on AVX512

2024-05-10 Thread Hongtao Liu
, that would also fix this mem operand > issue. I hope to submit it for review this weekend. I opened a PR for that. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021 > > Thanks again, > Roger > > > From: Hongtao Liu > > On Fri, May 10, 2024 at 6:26 AM Roger Sayle

Re: [PATCH 0/8] i386: Opmitize code with AVX10.2 new instructions

2024-09-01 Thread Hongtao Liu
On Mon, Aug 26, 2024 at 2:43 PM Haochen Jiang wrote: > > Hi all, > > I have just committed AVX10.2 new instructions patches into trunk hours > ago. The next and final part for AVX10.2 upstream is to optimize code > with AVX10.2 new instructions. > > In this patch series, it will contain the followi

Re: [PATCH] i386: Support partial vectorized V2BF/V4BF plus/minus/mult/div/sqrt

2024-09-02 Thread Hongtao Liu
On Mon, Sep 2, 2024 at 4:33 PM Levy Hsu wrote: > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > This patch introduces new mode iterators and expands for the i386 > architecture to support partial vectorization of bf16 operations using > AVX10.2 instructions. Thes

Re: [PATCH] i386: Support partial vectorized V2BF/V4BF smaxmin

2024-09-02 Thread Hongtao Liu
On Mon, Sep 2, 2024 at 4:42 PM Levy Hsu wrote: > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? Ok. > > This patch supports sminmax for partial vectorized V2BF/V4BF. > > gcc/ChangeLog: > > * config/i386/mmx.md (3): New define_expand for > V2BF/V4BFsmaxmin > >

Re: [r15-3359 Regression] FAIL: gcc.target/i386/avx10_2-bf-vector-cmpp-1.c (test for excess errors) on Linux/x86_64

2024-09-02 Thread Hongtao Liu
On Tue, Sep 3, 2024 at 9:45 AM Jiang, Haochen via Gcc-regression wrote: > > As each AVX10.2 testcases previously, this is caused by option combination > warning, > which is expected. > Can we put the warning for mix usage of mavx10 and -mavx512f under -Wpsabi And add -Wno-psabi in addition to -ma

Re: [PATCH] i386: Fix vfpclassph non-optimizied intrin

2024-09-03 Thread Hongtao Liu
On Tue, Sep 3, 2024 at 2:24 PM Haochen Jiang wrote: > > Hi all, > > The intrin for non-optimized got a typo in mask type, which will cause > the high bits of __mmask32 being unexpectedly zeroed. > > The test does not fail under O0 with current 1b since the testcase is > wrong. We need to include a

Re: [PATCH] i386: Support partial vectorized FMA for V2BF/V4BF

2024-09-04 Thread Hongtao Liu
On Wed, Sep 4, 2024 at 11:31 AM Levy Hsu wrote: > > Hi > > Bootstrapped and tested on x86-64-pc-linux-gnu. > Ok for trunk? Ok. > > This patch introduces support for vectorized FMA operations for bf16 types in > V2BF and V4BF modes on the i386 architecture. New mode iterators and > define_expand en

Re: [PATCH] i386: Support partial signbit/xorsign/copysign/abs/neg/and/xor/ior/andn for V2BF/V4BF

2024-09-04 Thread Hongtao Liu
On Wed, Sep 4, 2024 at 10:53 AM Levy Hsu wrote: > > Hi > > This patch adds support for bf16 operations in V2BF and V4BF modes on i386, > handling signbit, xorsign, copysign, abs, neg, and various logical operations. > > Bootstrapped and tested on x86-64-pc-linux-gnu. > Ok for trunk? Ok. > > gcc/Ch

Re: [PATCH] i386: Integrate BFmode for Enhanced Vectorization in ix86_preferred_simd_mode

2024-09-04 Thread Hongtao Liu
On Wed, Sep 4, 2024 at 9:32 AM Levy Hsu wrote: > > Hi > > This change adds BFmode support to the ix86_preferred_simd_mode function > enhancing SIMD vectorization for BF16 operations. The update ensures > optimized usage of SIMD capabilities improving performance and aligning > vector sizes with pr

Re: [PATCH] x86: Refine V4BF/V2BF FMA testcase

2024-09-05 Thread Hongtao Liu
On Fri, Sep 6, 2024 at 10:34 AM Jiang, Haochen wrote: > > > From: Levy Hsu > > Sent: Thursday, September 5, 2024 4:55 PM > > To: gcc-patches@gcc.gnu.org > > > > Simple testcase fix, ok for trunk? > > > > This patch removes specific register checks to account for possible > > register spills and d

Re: [PATCH] x86: Refine V4BF/V2BF FMA Testcase

2024-09-10 Thread Hongtao Liu
On Tue, Sep 10, 2024 at 3:35 PM Levy Hsu wrote: > > Simple testcase fix, ok for trunk? Ok. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c: Separated 32-bit > scan > and removed register checks in spill situations. > --- > .../i386/avx10_2-par

Re: [PATCH] i386: Fix incorrect avx512f-mask-type.h include

2024-09-10 Thread Hongtao Liu
On Thu, Sep 5, 2024 at 10:05 AM Haochen Jiang wrote: > > Hi all, > > In avx512f-mask-type.h, we need SIZE being defined to get > MASK_TYPE defined correctly. Fix those testcases where > SIZE is not defined before the include for avx512f-mask-type.h. > > Note that for convert intrins in AVX10.2, t

Re: [RFC PATCH] Enable vectorization for unknown tripcount in very cheap cost model but disable epilog vectorization.

2024-09-11 Thread Hongtao Liu
On Wed, Sep 11, 2024 at 4:04 PM Richard Biener wrote: > > On Wed, Sep 11, 2024 at 4:17 AM liuhongt wrote: > > > > GCC12 enables vectorization for O2 with very cheap cost model which is > > restricted > > to constant tripcount. The vectorization capacity is very limited w/ > > consideration > >

Re: [PATCH v2] Enable V2BF/V4BF vec_cmp with AVX10.2 vcmppbf16

2024-09-11 Thread Hongtao Liu
On Thu, Sep 12, 2024 at 9:55 AM Levy Hsu wrote: > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? Ok. > > gcc/ChangeLog: > > * config/i386/i386.cc (ix86_get_mask_mode): > Enable BFmode for targetm.vectorize.get_mask_mode with AVX10.2. > * config/

Re: [RFC PATCH] Enable vectorization for unknown tripcount in very cheap cost model but disable epilog vectorization.

2024-09-12 Thread Hongtao Liu
On Wed, Sep 11, 2024 at 4:21 PM Hongtao Liu wrote: > > On Wed, Sep 11, 2024 at 4:04 PM Richard Biener > wrote: > > > > On Wed, Sep 11, 2024 at 4:17 AM liuhongt wrote: > > > > > > GCC12 enables vectorization for O2 with very cheap cost model which is >

Re: [PATCH]Several intrinsic macros lack a closing parenthesis[PR93274]

2020-02-18 Thread Hongtao Liu
On Tue, Feb 18, 2020 at 4:24 PM Uros Bizjak wrote: > > > > On Thu, Feb 13, 2020 at 9:39 AM Uros Bizjak wrote: >> >> > Changelog >> > gcc/ >> >* config/i386/avx512vbmi2intrin.h >> >(_mm512_[,mask_,maskz_]shrdi_epi16, >> >_mm512_[,mask_,maskz_]shrdi_epi32, >> >_m512_

Re: [PATCH]Several intrinsic macros lack a closing parenthesis[PR93274]

2020-02-18 Thread Hongtao Liu
On Tue, Feb 18, 2020 at 7:00 PM Hongtao Liu wrote: > > On Tue, Feb 18, 2020 at 4:24 PM Uros Bizjak wrote: > > > > > > > > On Thu, Feb 13, 2020 at 9:39 AM Uros Bizjak wrote: > >> > >> > Changelog > >> > gcc/ > >> >

[PATCH target/92035] Add missing avx512f intrinsics

2019-10-12 Thread Hongtao Liu
Hi: This patch is enabling missing avx512f intrinsics listed as _mm_mask_roundscale_sd _mm_mask_roundscale_round_sd _mm_maskz_roundscale_sd _mm_maskz_roundscale_round_sd _mm_mask_roundscale_ss _mm_mask_roundscale_round_ss _mm_maskz_roundscale_ss _mm_maskz_roundscale_round_ss Bootstrap ok, reg

Re: [PATCH target/92035] Add missing avx512f intrinsics

2019-10-12 Thread Hongtao Liu
On Sat, Oct 12, 2019 at 4:15 PM Jakub Jelinek wrote: > > Hi! > > > gcc/ > > * config/i386/avx512fintrin.h (_mm_mask_roundscale_ss, > > _mm_maskz_roundscale_ss, _mm_maskz_roundscale_round_ss, > > _mm_maskz_roundscale_round_ss, _mm_mask_roundscale_sd, > > _mm_maskz_roundscale

Re: [wwwdocs] Update gcc-10/changes.html re Intel ISA (was: gcc-wwwdocs branch master updated. 63fbcfeaf27d9dd2083ccbd34bdff8fccb63949c)

2019-10-20 Thread Hongtao Liu
On Mon, Oct 21, 2019 at 1:15 AM Gerald Pfeifer wrote: > > On Fri, 11 Oct 2019, liuho...@gcc.gnu.org wrote: > > commit 63fbcfeaf27d9dd2083ccbd34bdff8fccb63949c > > Author: liuhongt > > Date: Fri Oct 11 14:27:47 2019 +0800 > > > > Update gcc10 changes with new intel ISA. > > I just applied th

Re: [PATCH] Split X86_TUNE_AVX128_OPTIMAL into X86_TUNE_AVX256_SPLIT_REGS and X86_TUNE_AVX128_OPTIMAL

2019-11-17 Thread Hongtao Liu
On Sat, Nov 16, 2019 at 7:27 AM Jeff Law wrote: > > On 11/14/19 5:21 AM, Richard Biener wrote: > > On Tue, Nov 12, 2019 at 11:35 AM Hongtao Liu wrote: > >> > >> Hi: > >> As mentioned in https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00832.html > >

[PATCH] Fix TYPO of avx512f_maskcmp3.

2019-11-26 Thread Hongtao Liu
hi jakub: VF is used for differentiating AVX512F/AVX/SSE, but there's condition TARGET_AVX512F in avx512f_maskcmp3, it must be a TYPO and should be VF_AVX512VL instead. Bootstrap and regression test on i386/x86_64 backend is ok. OK for trunk? diff --git a/gcc/config/i386/sse.md b/gcc/config/i3

[PATCH] Enable mask operation for 128/256-bit vector VCOND_EXPR under avx512f (PR92686)

2019-12-03 Thread Hongtao Liu
Hi: Currently for VCOND_EXPR, integer mask operation is only available for 512-bit vector, but since mask register is related to isa not vector size, under avx512f we can also have 128/256-bit vector condition move. My local tests show there's no boost frequency penalty for using integer mask reg

Add GCC support to ENQCMD.

2019-05-23 Thread Hongtao Liu
Hi Uros and all: This patch is about to enable support for ENQCMD(Enqueue Command) which will be in Willow Cove. There are two instructions for ENQCMD: ENQCMD and ENQCMDS. More details please refer to https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-exte

Re: Add GCC support to ENQCMD.

2019-05-27 Thread Hongtao Liu
On Fri, May 24, 2019 at 3:51 PM Uros Bizjak wrote: > > On Fri, May 24, 2019 at 9:43 AM Uros Bizjak wrote: > > > > On Fri, May 24, 2019 at 7:16 AM Hongtao Liu wrote: > > > > > > Hi Uros and all: > > > This patch is about to enable support for E

Re: [Patch] Fix ix86_expand_sse_comi_round (PR Target/89750, PR Target/86444)

2019-05-30 Thread Hongtao Liu
On Thu, May 30, 2019 at 3:23 AM Jeff Law wrote: > > On 5/9/19 10:54 PM, Hongtao Liu wrote: > > On Fri, May 10, 2019 at 3:55 AM Jeff Law wrote: > >> > >> On 5/6/19 11:38 PM, Hongtao Liu wrote: > >>> Hi Uros and GCC: > >>> This patch is

Re: [Patch] Fix ix86_expand_sse_comi_round (PR Target/89750, PR Target/86444)

2019-06-02 Thread Hongtao Liu
On Sat, Jun 1, 2019 at 6:08 AM Jeff Law wrote: > > On 5/30/19 2:53 AM, Hongtao Liu wrote: > > On Thu, May 30, 2019 at 3:23 AM Jeff Law wrote: > >> On 5/9/19 10:54 PM, Hongtao Liu wrote: > >>> On Fri, May 10, 2019 at 3:55 AM Jeff Law wrote: > >>

[PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/89803)

2019-06-03 Thread Hongtao Liu
Hi Jeff: The following patch adds forgotten avx512f fpclass intrinsics for masked scalar operations. Bootstrapped/regtested on x86_64-linux and i686-linux (on skylake-avx512), ok for trunk? Changelog: gcc/ +2019-03-24 Hongtao Liu + + PR target/89803 + * config/i386/avx512dqintrin.h

Re: [PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/89803)

2019-06-04 Thread Hongtao Liu
On Mon, Jun 3, 2019 at 7:06 PM Jakub Jelinek wrote: > > On Mon, Jun 03, 2019 at 06:01:40PM +0800, Hongtao Liu wrote: > > The following patch adds forgotten avx512f fpclass intrinsics for > > masked scalar operations. > > > > Bootstrapped/regtested on x86_64-li

Re: [PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/89803)

2019-06-04 Thread Hongtao Liu
On Tue, Jun 4, 2019 at 3:59 PM Jakub Jelinek wrote: > > On Tue, Jun 04, 2019 at 03:38:08PM +0800, Hongtao Liu wrote: > > --- gcc/ChangeLog (revision 271853) > > +++ gcc/ChangeLog (working copy) > > @@ -4706,6 +4706,26 @@ > > reprocessing. Always ca

Re: [PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/89803)

2019-06-04 Thread Hongtao Liu
On Tue, Jun 4, 2019 at 5:21 PM Jakub Jelinek wrote: > > On Tue, Jun 04, 2019 at 05:00:05PM +0800, Hongtao Liu wrote: > > Thanks for reminding, Here is updated: > > You've missed some notes. Ok for trunk with: > 1) the following patch applied on top of your patch > 2

Re: [PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/89803)

2019-06-04 Thread Hongtao Liu
On Tue, Jun 4, 2019 at 5:56 PM Hongtao Liu wrote: > > On Tue, Jun 4, 2019 at 5:21 PM Jakub Jelinek wrote: > > > > On Tue, Jun 04, 2019 at 05:00:05PM +0800, Hongtao Liu wrote: > > > Thanks for reminding, Here is updated: > > > > You've missed some not

[PATCH] Enable memory operand for vfpclass[p,s][s,d] patterns.

2019-06-05 Thread Hongtao Liu
ed on x86_64-linux and i686-linux (on skylake-avx512), ok for trunk? Changelog gcc/ 2019-06-05 Hongtao Liu * config/i386/sse.md (define_mode_suffix vecmemsuffix): New. (define_insn "avx512dq_fpclass"): Enable memory operand for it. (define_insn "avx512dq_vmfpclass"): Ditto.

Re: [PATCH] Enable memory operand for vfpclass[p,s][s,d] patterns.

2019-06-05 Thread Hongtao Liu
On Thu, Jun 6, 2019 at 6:18 AM Jeff Law wrote: > > On 6/5/19 1:39 AM, Hongtao Liu wrote: > > Hi Jeff and Jakub: > > When adding new intrinsics(PR target/89803), i found vfpclassp[sd], > > vfpclasss[sd] patterns didn't support memory operand which is > > suppo

[PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-05 Thread Hongtao Liu
-instruction-set-extensions-programming-reference.pdf Bootstrap is ok, and no regressions for i386/x86-64 testsuite. Changelog: gcc/ +2019-06-06 Hongtao Liu + H.J. Lu + Olga Makhotina + + * common/config/i386/i386-common.c + (OPTION_MASK_ISA_AVX512VP2INTERSECT_SET

Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-19 Thread Hongtao Liu
On Sat, Jun 8, 2019 at 4:12 AM Uros Bizjak wrote: > > On 6/7/19, H.J. Lu wrote: > > >> > > +/* Register pair. */ > >> > > +VECTOR_MODES_WITH_PREFIX (P, INT, 2); /* P2QI */ > >> > > +VECTOR_MODES_WITH_PREFIX (P, INT, 4); /* P2HI P4QI */ > >> > > > >> > > I think > >> > > > >> > > INT_MODE (P2QI,

Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-20 Thread Hongtao Liu
On Thu, Jun 20, 2019 at 2:13 PM Uros Bizjak wrote: > > On Thu, Jun 20, 2019 at 7:36 AM Hongtao Liu wrote: > > > > On Sat, Jun 8, 2019 at 4:12 AM Uros Bizjak wrote: > > > > > > On 6/7/19, H.J. Lu wrote: > > > > > > >> > > +/* Re

Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-20 Thread Hongtao Liu
On Thu, Jun 20, 2019 at 10:58 PM H.J. Lu wrote: > > On Thu, Jun 20, 2019 at 3:54 AM Hongtao Liu wrote: > > > > On Thu, Jun 20, 2019 at 2:13 PM Uros Bizjak wrote: > > > > > > On Thu, Jun 20, 2019 at 7:36 AM Hongtao Liu wrote: > > > > > >

Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-20 Thread Hongtao Liu
On Thu, Jun 20, 2019 at 7:37 PM Uros Bizjak wrote: > > On Thu, Jun 20, 2019 at 12:54 PM Hongtao Liu wrote: > > > > On Thu, Jun 20, 2019 at 2:13 PM Uros Bizjak wrote: > > > > > > On Thu, Jun 20, 2019 at 7:36 AM Hongtao Liu wrote: > > > > > >

Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-20 Thread Hongtao Liu
On Fri, Jun 21, 2019 at 1:56 PM Uros Bizjak wrote: > > On Fri, Jun 21, 2019 at 4:21 AM Hongtao Liu wrote: > > > > On Thu, Jun 20, 2019 at 10:58 PM H.J. Lu wrote: > > > > > > On Thu, Jun 20, 2019 at 3:54 AM Hongtao Liu wrote: > > > > > >

Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-08-06 Thread Hongtao Liu
his or attach the patch instead. > > >> >> > > >> >> > Index: ChangeLog > > >> >> > === > > >> >> > --- ChangeLog (revision 272668) > > >> >>

Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

2019-08-29 Thread Hongtao Liu
On Fri, Aug 30, 2019 at 2:09 AM Uros Bizjak wrote: > > 2019-08-28 Uroš Bizjak > > * config/i386/i386.c (ix86_register_move_cost): Do not > limit the cost of moves to/from XMM register to minimum 8. > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. > > Actually committe

Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

2019-08-29 Thread Hongtao Liu
On Fri, Aug 30, 2019 at 8:10 AM Hongtao Liu wrote: > > On Fri, Aug 30, 2019 at 2:09 AM Uros Bizjak wrote: > > > > 2019-08-28 Uroš Bizjak > > > > * config/i386/i386.c (ix86_register_move_cost): Do not > > limit the cost of moves to/from XMM registe

Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

2019-08-30 Thread Hongtao Liu
On Fri, Aug 30, 2019 at 2:18 PM Uros Bizjak wrote: > > On Fri, Aug 30, 2019 at 2:08 AM Hongtao Liu wrote: > > > > On Fri, Aug 30, 2019 at 2:09 AM Uros Bizjak wrote: > > > > > > 2019-08-28 Uroš Bizjak > > > > > > * config/i386/i386.c

Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

2019-09-02 Thread Hongtao Liu
> which is not the case with core_cost (and similar with skylake_cost): > > 2, 2, 4,/* cost of moving XMM,YMM,ZMM register */ > {6, 6, 6, 6, 12},/* cost of loading SSE registers >in 32,64,128,256 and 512-bit */ > {6, 6, 6, 6, 12},

Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

2019-09-02 Thread Hongtao Liu
On Mon, Sep 2, 2019 at 6:23 PM Richard Biener wrote: > > On Mon, Sep 2, 2019 at 10:13 AM Hongtao Liu wrote: > > > > > which is not the case with core_cost (and similar with skylake_cost): > > > > > > 2, 2, 4,/* cost of moving XMM,YMM,

Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

2019-09-03 Thread Hongtao Liu
On Mon, Sep 2, 2019 at 4:41 PM Uros Bizjak wrote: > > On Mon, Sep 2, 2019 at 10:13 AM Hongtao Liu wrote: > > > > > which is not the case with core_cost (and similar with skylake_cost): > > > > > > 2, 2, 4,/* cost of moving XMM,YMM,

Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

2019-09-03 Thread Hongtao Liu
On Wed, Sep 4, 2019 at 12:50 AM Uros Bizjak wrote: > > On Tue, Sep 3, 2019 at 1:33 PM Richard Biener > wrote: > > > > > Note: > > > > Removing limit of cost would introduce lots of regressions in SPEC2017 > > > > as follow > > > > > > > > 531.deepsjeng_r -7.18%

Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

2019-09-04 Thread Hongtao Liu
On Wed, Sep 4, 2019 at 9:44 AM Hongtao Liu wrote: > > On Wed, Sep 4, 2019 at 12:50 AM Uros Bizjak wrote: > > > > On Tue, Sep 3, 2019 at 1:33 PM Richard Biener > > wrote: > > > > > > > Note: > > > > > Removing limit of cost would in

[PATCH target/87007]Extend rpad to handle AVX512F vcvtusi2ss/vcvtusi2sd

2019-09-17 Thread Hongtao Liu
Hi Uros: This patch extends pass rpad to handle AVX512F vcvtusi2ss/vcvtusi2sd. 538.image_r would be improved by 4% with a single-copy run on a Skylake workstation. Bootstrap ok. Regression test for i386/x86 backend ok. Ok for trunk? Changelog gcc/ * config/i386/i386.md (*floatuns2_avx512)

Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-24 Thread Hongtao Liu
to invent something like SPECIAL_INT_MODE, which would > avoid mode promotion functionality (basically, it should not be listed > in mode_wider and similar arrays). This would prevent mode promotion > issues, while it would still allow to have mode, having the same width > as existing mode, but with special properties. > > I'

Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-25 Thread Hongtao Liu
On Wed, Jun 26, 2019 at 1:13 AM Uros Bizjak wrote: > > On Tue, Jun 25, 2019 at 4:44 AM Hongtao Liu wrote: > > > > On Sat, Jun 22, 2019 at 3:38 PM Uros Bizjak wrote: > > > > > > On Fri, Jun 21, 2019 at 8:38 PM H.J. Lu wrote: > > > >

Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-26 Thread Hongtao Liu
On Wed, Jun 26, 2019 at 5:21 PM Martin Liška wrote: > > Hi. > > Started from r272668 I see: > > /tmp/ccqxwVjt.s: Assembler messages: > > /tmp/ccqxwVjt.s:22: Error: no such instruction: `vp2intersectq > .LC1(%rip),%zmm0,%k0' > > /tmp/ccqxwVjt.s:33: Error: no such instruction: `vp2intersectd > .LC

Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-26 Thread Hongtao Liu
_avx512ifma { } { > > return [check_no_compiler_messages avx512ifma object { > > as usual, the new effective-target keyword needs documenting in > sourcebuild.texi. Like this? Index: ChangeLog === --- ChangeLog (revis

Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-27 Thread Hongtao Liu
single space. Please fix this or attach the patch instead. > > > Index: ChangeLog > > === > > --- ChangeLog (revision 272668) > > +++ ChangeLog (working copy) > > @@ -1,3 +1,8 @@ > > +2019-06-27

Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-27 Thread Hongtao Liu
> > ======= > >> > --- ChangeLog (revision 272668) > >> > +++ ChangeLog (working copy) > >> > @@ -1,3 +1,8 @@ > >> > +2019-06-27 Hongtao Liu > >> > + > >> >

[PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/89803)

2019-03-24 Thread Hongtao Liu
269894) +++ ChangeLog (working copy) @@ -1,3 +1,16 @@ +2019-03-24 Hongtao Liu + + PR target/89803 + * config/i386/avx512dqintrin.h + (_mm_mask_fpclass_ss_mask,_mm_mask_fpclass_sd_mask): + New intrinsics. + * config/i386/i386-builtin.def + (__builtin_ia32_fpcla_mask

Ping Re: [PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/89803)

2019-03-27 Thread Hongtao Liu
Hi Uros: would you help to review this patch? Regards, Hongtao. On Sun, Mar 24, 2019 at 8:13 PM Hongtao Liu wrote: > > Hi: > The following patch adds forgotten avx512f fpclass intrinsics for > masked scalar operations. > > Bootstrapped/regtested on x86_64-linux and i686

Re: Ping Re: [PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/89803)

2019-03-29 Thread Hongtao Liu
On Sat, Mar 30, 2019 at 5:34 AM Jeff Law wrote: > > On 3/28/19 1:38 AM, Uros Bizjak wrote: > > On Thu, Mar 28, 2019 at 7:47 AM Hongtao Liu wrote: > >> > >> Hi Uros: > >> would you help to review this patch? > > > > This is AVX512F patch, you w

Re: Enable BF16 support (Please ignore my former email)

2019-04-12 Thread Hongtao Liu
On Fri, Apr 12, 2019 at 3:30 PM Uros Bizjak wrote: > > On Fri, Apr 12, 2019 at 9:09 AM Liu, Hongtao wrote: > > > > Hi : > > This patch is about to enable support for bfloat16 which will be in > > Future Cooper Lake, Please refer to > > https://software.intel.com/en-us/download/intel-archite

Re: [PATCH] Add support for missing AVX512* ISAs (PR target/89929).

2019-04-17 Thread Hongtao Liu
On Tue, Apr 16, 2019 at 11:41 PM H.J. Lu wrote: > > On Tue, Apr 16, 2019 at 8:36 AM Martin Liška wrote: > > > > On 4/16/19 4:50 PM, H.J. Lu wrote: > > > On Tue, Apr 16, 2019 at 1:28 AM Martin Liška wrote: > > >> > > >> On 4/15/19 5:09 PM, H.J. Lu wrote: > > >>> On Mon, Apr 15, 2019 at 12:26 AM M
