Re: [PATCH] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

2022-03-20 Thread Hongtao Liu via Gcc-patches
m_mask_move_ss (__m128 src, __mmask8 k, __m128 a, __m128 b) https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=vmovss&ig_expand=3807,3081,3082,3084,3083,4837,4838 > > LLVM generates mask & 1 for these intrinsics. > > Hongtao Liu via Gcc-patches wrote on 20

Re: [PATCH v2] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

2022-03-21 Thread Hongtao Liu via Gcc-patches
On Mon, Mar 21, 2022 at 7:52 PM Hongyu Wang via Gcc-patches wrote: > > Hi, > > For complex scalar intrinsics like _mm_mask_fcmadd_sch, the > mask should be ANDed with 1 to ensure the mask is bound to the lowest byte. > Use masked vmovss to perform the same operation, which omits the higher bits > of the mask. > > Bootst
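
For illustration only, the scalar masking rule discussed in this thread can be modeled in plain C. This is a minimal sketch, not the GCC patch: the name mask_move_ss_model and the use of float[4] in place of __m128 are assumptions made for the example. It follows the documented behavior of _mm_mask_move_ss, where only bit 0 of the mask is consulted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of _mm_mask_move_ss (not GCC or Intel code).
   Element 0 comes from b when mask bit 0 is set, otherwise from src;
   elements 1..3 are always copied from a. */
void mask_move_ss_model(float dst[4], const float src[4], uint8_t k,
                        const float a[4], const float b[4])
{
    assert(dst != NULL && src != NULL && a != NULL && b != NULL);

    /* Only the lowest mask bit matters; higher bits must be ignored. */
    dst[0] = (k & 1u) ? b[0] : src[0];
    for (size_t i = 1; i < 4; i++)
        dst[i] = a[i];
}
```

A mask such as 0xFE has every bit set except bit 0, so the model keeps src[0]; code that tested the whole mask for non-zero would get this case wrong.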

Re: [PATCH] [i386] Extend splitter pattern to reversed condition by swapping then and else rtx. [PR target/104982]

2022-03-21 Thread Hongtao Liu via Gcc-patches
On Mon, Mar 21, 2022 at 9:06 PM liuhongt wrote: > > Failed to match this instruction: > (set (reg/v:SI 88 [ z ]) > (if_then_else:SI (eq (zero_extract:SI (reg:SI 92) > (const_int 1 [0x1]) > (zero_extend:SI (subreg:QI (reg:SI 93) 0))) > (const_int 0 [0

Re: [PATCH] Fix ICE caused by NULL_RTX returned by lowpart_subreg.

2022-03-22 Thread Hongtao Liu via Gcc-patches
On Wed, Mar 23, 2022 at 2:05 PM liuhongt via Gcc-patches wrote: > > In validate_subreg, both (subreg:V2HF (reg:SI) 0) > and (subreg:V8HF (reg:V2HF) 0) are valid, but not > for (subreg:V8HF (reg:SI) 0) which causes ICE. > > Ideally it should be handled in validate_subreg to support > subreg for all

Re: [PATCH][RFC] tree-optimization/101908 - avoid STLF fails when vectorizing

2022-03-25 Thread Hongtao Liu via Gcc-patches
On Fri, Mar 25, 2022 at 8:11 PM Richard Biener via Gcc-patches wrote: > > Since we're now vectorizing by default at -O2 issues like PR101908 > become more important where we apply basic-block vectorization to > parts of the function covering loads from function parameters passed > on the stack. S

Re: [PATCH][RFC] tree-optimization/101908 - avoid STLF fails when vectorizing

2022-03-25 Thread Hongtao Liu via Gcc-patches
On Fri, Mar 25, 2022 at 9:42 PM Richard Biener wrote: > > On Fri, 25 Mar 2022, Hongtao Liu wrote: > > > On Fri, Mar 25, 2022 at 8:11 PM Richard Biener via Gcc-patches > > wrote: > > > > > > Since we're now vectorizing by default at -O2 issues like P

Re: [PATCH] x86: Use x constraint on KL patterns

2022-03-25 Thread Hongtao Liu via Gcc-patches
On Sat, Mar 26, 2022 at 4:50 AM H.J. Lu via Gcc-patches wrote: > > Since KL instructions have no AVX512 version, replace the "v" register > constraint with the "x" register constraint. > > PR target/105058 > * config/i386/sse.md (loadiwkey): Replace "v" with "x". > (aesu8):

Re: [PATCH] x86: Use x constraint on SSSE3 patterns with MMX operands

2022-03-25 Thread Hongtao Liu via Gcc-patches
On Sat, Mar 26, 2022 at 1:27 AM H.J. Lu via Gcc-patches wrote: > > Since PHADDW/PHADDD/PHADDSW/PHSUBW/PHSUBD/PHSUBSW/PSIGNB/PSIGNW/PSIGND > have no AVX512 version, replace the "Yv" register constraint with the > "x" register constraint. LGTM, please backport to GCC10/GCC11 branch. > > PR t

Re: [PATCH] x86: Use x constraint on KL patterns

2022-03-25 Thread Hongtao Liu via Gcc-patches
On Sat, Mar 26, 2022 at 10:05 AM Hongyu Wang via Gcc-patches wrote: > > > > Is it possible to create a test case that gas would throw an error for > > > invalid operands? > > > > You can use -ffix-xmmN to disable XMM0-15. > > I mean can we create an intrinsic test for this PR that produces xmm16-3

Re: [PATCH] Split vector load from parm_del to elemental loads to avoid STLF stalls.

2022-03-31 Thread Hongtao Liu via Gcc-patches
On Thu, Mar 31, 2022 at 6:45 PM Richard Biener via Gcc-patches wrote: > > On Thu, Mar 31, 2022 at 7:51 AM liuhongt wrote: > > > > Since cfg is freed before machine_reorg, just do a rough calculation > > of the window according to the layout. > > Also according to an experiment on CLX, set window

Re: [PATCH] Split vector load from parm_del to elemental loads to avoid STLF stalls.

2022-04-01 Thread Hongtao Liu via Gcc-patches
On Fri, Apr 1, 2022 at 2:54 PM Richard Biener via Gcc-patches wrote: > > On Fri, Apr 1, 2022 at 8:47 AM liuhongt via Gcc-patches > wrote: > > > > Update in V2: > > 1. Use get_insns instead of FOR_EACH_BB_CFUN and FOR_BB_INSNS. > > 2. Return for any_uncondjump_p and ANY_RETURN_P. > > 3. Add dump i

Re: [PATCH V3] Split vector load from parm_del to elemental loads to avoid STLF stalls.

2022-04-04 Thread Hongtao Liu via Gcc-patches
On Fri, Apr 1, 2022 at 4:32 PM liuhongt via Gcc-patches wrote: > > Update in V3: > 1. Add -param=x86-stlf-window-ninsns= (default 64). > 2. Exclude call in the window. > > Since cfg is freed before machine_reorg, just do a rough calculation > of the window according to the layout. > Also according

Re: [x86_64 PATCH] Support pandn for V1TI mode (i.e. *andnotv1ti3).

2022-04-05 Thread Hongtao Liu via Gcc-patches
On Wed, Apr 6, 2022 at 5:56 AM Roger Sayle wrote: > > > > This simple patch allows the i386 backend to generate pandn instructions > > for V1TI mode. Currently, the testcase: > > > > typedef unsigned __int128 v1ti __attribute__ ((__vector_size__ (16))); > > v1ti andnot1(v1ti x, v1ti y) { return ~

Re: [PATCH] AVX512F: Add missing macro for mask(z?)_scalf_s[sd] [PR 105339]

2022-04-22 Thread Hongtao Liu via Gcc-patches
On Fri, Apr 22, 2022 at 4:12 PM Hongyu Wang via Gcc-patches wrote: > > Hi, > > Add missing macro under O0 and adjust macro format for scalf > intrinsics. > Please add the corresponding intrinsic test in sse-14.c. > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}. > > Ok for master and backpor

Re: [PATCH] i386: Fix GLC tuning with -masm=intel [PR104104]

2022-01-18 Thread Hongtao Liu via Gcc-patches
On Wed, Jan 19, 2022 at 8:00 AM Jakub Jelinek wrote: > > On Sun, Jan 16, 2022 at 12:22:18PM +0800, Hongtao Liu via Gcc-patches wrote: > > On Sun, Jan 16, 2022 at 12:44 AM Uros Bizjak via Gcc-patches > > wrote: > > > > > > On Sat, Jan 15, 2022

Re: [PATCH] i386: Fix *aesu8

2022-01-18 Thread Hongtao Liu via Gcc-patches
On Wed, Jan 19, 2022 at 9:40 AM Jakub Jelinek wrote: > > Hi! > > On Wed, Jan 19, 2022 at 09:09:41AM +0800, Hongtao Liu wrote: > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > Yes, thanks. > > Thanks. Committed. > grep '{[^|}

Re: [PATCH v2] x86: Also check mode of memory broadcast in bcst_mem_operand

2022-01-23 Thread Hongtao Liu via Gcc-patches
On Sun, Jan 23, 2022 at 8:28 PM H.J. Lu via Gcc-patches wrote: > > Return false for invalid mode on memory broadcast in bcst_mem_operand: > > (vec_duplicate:V16SF (mem/j:V4SF (reg/v/f:DI 109 [ b ]))) > Yes, thanks. > gcc/ > > PR target/104188 > * config/i386/predicates.md (bcst_mem

Re: [PATCH v4] x86: Add -m[no-]direct-extern-access

2022-02-08 Thread Hongtao Liu via Gcc-patches
On Fri, Jan 28, 2022 at 5:53 AM H.J. Lu via Gcc-patches wrote: > > The v3 patch was posted at > > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574847.html > > There is no progress with repeated pings since then. Glibc 2.35 and > binutils 2.38 will support GNU_PROPERTY_1_NEEDED_INDIRECT_EXT

Re: [PATCH] x86: Check each component of source operand for AVX_U128_DIRTY

2022-02-08 Thread Hongtao Liu via Gcc-patches
On Wed, Feb 9, 2022 at 10:53 AM H.J. Lu via Gcc-patches wrote: > > commit 9775e465c1fbfc32656de77c618c61acf5bd905d > Author: H.J. Lu > Date: Tue Jul 27 07:46:04 2021 -0700 > > x86: Don't set AVX_U128_DIRTY when zeroing YMM/ZMM register > > called ix86_check_avx_upper_register to check mode

Re: [PATCH] x86: Update PR 35513 tests

2022-02-11 Thread Hongtao Liu via Gcc-patches
On Thu, Feb 10, 2022 at 9:58 PM H.J. Lu via Gcc-patches wrote: > > 1. Require linker with GNU_PROPERTY_1_NEEDED support for PR 35513 > run-time tests. > 2. Compile pr35513-8.c to scan assembly code. > > PR testsuite/104481 > * g++.target/i386/pr35513-1.C: Require property_1_needed

Re: [GCC 11 PATCH 1/5] x86: Remove "%!" before ret

2022-02-15 Thread Hongtao Liu via Gcc-patches
On Tue, Feb 1, 2022 at 2:56 AM H.J. Lu via Gcc-patches wrote: > > Before MPX was removed, "%!" was mapped to > > case '!': > if (ix86_bnd_prefixed_insn_p (current_output_insn)) > fputs ("bnd ", file); > return; > > After CET was added and MPX was removed, "%

Re: [GCC 11 PATCH 0/5] x86: Backport straight-line-speculation mitigation

2022-02-15 Thread Hongtao Liu via Gcc-patches
On Tue, Feb 1, 2022 at 2:55 AM H.J. Lu via Gcc-patches wrote: > > Backport -mindirect-branch-cs-prefix: > > commit 48a4ae26c225eb018ecb59f131e2c4fd4f3cf89a > Author: H.J. Lu > Date: Wed Oct 27 06:27:15 2021 -0700 > > x86: Add -mindirect-branch-cs-prefix > > Add -mindirect-branch-cs-pref

Re: [PATCH] Restrict the two sources of vect_recog_cond_expr_convert_pattern to be of the same type when convert is extension.

2022-02-16 Thread Hongtao Liu via Gcc-patches
On Wed, Feb 16, 2022 at 10:17 PM Jakub Jelinek via Gcc-patches wrote: > > On Wed, Feb 16, 2022 at 05:03:09PM +0800, liuhongt via Gcc-patches wrote: > > > > +(match (cond_expr_convert_p @0 @2 @3 @6) > > > > + (cond (simple_comparison@6 @0 @1) (convert@4 @2) (convert@5 @3)) > > > > + (if (types_mat

Re: [PATCH] [i386] Clean up MPX-related bit_{MPX,BNDREGS,BNDCSR}.

2022-02-16 Thread Hongtao Liu via Gcc-patches
On Thu, Feb 17, 2022 at 12:00 PM liuhongt wrote: > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: > > * config/i386/cpuid.h (bit_MPX): Removed. > (bit_BNDREGS): Ditto. > (bit_BNDCSR): Ditto. > --- > gcc/config/i386/cpuid.h | 5

Re: [PATCH] x86: Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER

2022-02-16 Thread Hongtao Liu via Gcc-patches
On Thu, Feb 17, 2022 at 12:26 PM H.J. Lu via Gcc-patches wrote: > > Reading YMM registers with all zero bits needs VZEROUPPER on Sandy Bridge, > Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid SSE <-> AVX > transition penalty. Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER to > generate vzero

Re: [PATCH] target/104581 - compile-time regression in mode-switching

2022-02-17 Thread Hongtao Liu via Gcc-patches
On Thu, Feb 17, 2022 at 9:47 PM Richard Biener via Gcc-patches wrote: > > The x86 backend piggy-backs on mode-switching for insertion of > vzeroupper. A recent improvement there was implemented in a way > to walk possibly the whole basic-block for all DF reg def definitions > in its mode_needed h

Re: [PATCH 3/3] target/99881 - x86 vector cost of CTOR from integer regs

2022-02-20 Thread Hongtao Liu via Gcc-patches
On Fri, Feb 18, 2022 at 10:01 PM Richard Biener via Gcc-patches wrote: > > This uses the now passed SLP node to the vectorizer costing hook > to adjust vector construction costs for the cost of moving an > integer component from a GPR to a vector register when that's > required for building a vect

Re: [PATCH v2] x86: Add TARGET_OMIT_VZEROUPPER_AFTER_AVX_READ_ZERO

2022-02-20 Thread Hongtao Liu via Gcc-patches
On Thu, Feb 17, 2022 at 9:56 PM H.J. Lu wrote: > > On Thu, Feb 17, 2022 at 08:51:31AM +0100, Uros Bizjak wrote: > > On Thu, Feb 17, 2022 at 6:25 AM Hongtao Liu via Gcc-patches > > wrote: > > > > > > On Thu, Feb 17, 2022 at 12:26 PM H.J. Lu via Gcc-patches

Re: [PATCH v2] x86: Add TARGET_OMIT_VZEROUPPER_AFTER_AVX_READ_ZERO

2022-02-21 Thread Hongtao Liu via Gcc-patches
On Tue, Feb 22, 2022 at 2:35 AM H.J. Lu wrote: > > On Sun, Feb 20, 2022 at 6:01 PM Hongtao Liu wrote: > > > > On Thu, Feb 17, 2022 at 9:56 PM H.J. Lu wrote: > > > > > > On Thu, Feb 17, 2022 at 08:51:31AM +0100, Uros Bizjak wrote: > > > > On

Re: [PATCH] i386: Fix up copysign/xorsign expansion [PR104612]

2022-02-21 Thread Hongtao Liu via Gcc-patches
On Tue, Feb 22, 2022 at 12:46 AM Jakub Jelinek wrote: > > Hi! > > We ICE on the following testcase for -m32 since r12-3435. because > operands[2] is (subreg:SF (reg:DI ...) 0) and According to validate_subreg, (subreg:V4SF (reg:DI ...) 0) should be valid (but not sure if it really works). For -m64

Re: [PATCH 3/3] target/99881 - x86 vector cost of CTOR from integer regs

2022-02-21 Thread Hongtao Liu via Gcc-patches
On Mon, Feb 21, 2022 at 5:10 PM Richard Biener wrote: > > On Mon, 21 Feb 2022, Hongtao Liu wrote: > > > On Fri, Feb 18, 2022 at 10:01 PM Richard Biener via Gcc-patches > > wrote: > > > > > > This uses the now passed SLP node to the vectorizer costing h

Re: [PATCH] [i386] Fix typo in v1ti3.

2022-02-23 Thread Hongtao Liu via Gcc-patches
On Wed, Feb 23, 2022 at 5:48 PM Jakub Jelinek via Gcc-patches wrote: > > On Wed, Feb 23, 2022 at 05:21:26PM +0800, liuhongt via Gcc-patches wrote: > > For evex encoding vp{xor,or,and}, suffix is needed. > > > > Or there would be an error for > > vpxor %ymm0, %ymm31, %ymm1 > > The insn is about V1T

Re: [PATCH] [i386] Don't fold builtin into gimple when isa mismatches.

2022-02-24 Thread Hongtao Liu via Gcc-patches
On Fri, Feb 25, 2022 at 1:50 PM liuhongt wrote: > > The patch fixes ICE in ix86_gimple_fold_builtin. > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. Ok for main trunk? > gcc/ChangeLog: > > PR target/104666 > * config/i386/i386-expand.cc > (ix86_check_builtin_isa_m

Re: [PATCH] AVX512F: Add helper enumeration for ternary logic intrinsics.

2022-02-27 Thread Hongtao Liu via Gcc-patches
On Fri, Feb 25, 2022 at 4:44 PM Hongyu Wang via Gcc-patches wrote: > > Hi, > > This patch intends to sync with llvm change in > https://reviews.llvm.org/D120307 to add enumeration and truncate This will be documented in intel intrinsic guide. > imm to unsigned char, so users could use ~ on immedia
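
To see why truncating the immediate to unsigned char lets users apply ~ to it, here is a minimal sketch in portable C. It is not the GCC or LLVM header code: the names TERN_A, TERN_B, TERN_C and ternlog32_model are hypothetical, and the function only models the per-bit truth-table lookup on 32-bit values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical selector names; the values are the usual truth-table masks
   for the first, second and third operand. */
enum { TERN_A = 0xF0, TERN_B = 0xCC, TERN_C = 0xAA };

/* ~TERN_A is a negative int, but truncating it to 8 bits gives "NOT a". */
static_assert((uint8_t)~TERN_A == 0x0F, "truncation keeps the low 8 bits");

/* Scalar model of a ternary logic operation: for every bit position the
   three input bits form an index into the 8-entry truth table imm. */
uint32_t ternlog32_model(uint32_t a, uint32_t b, uint32_t c, int imm)
{
    uint8_t table = (uint8_t)imm;   /* the truncation discussed above */
    uint32_t result = 0;

    for (unsigned bit = 0; bit < 32; bit++) {
        unsigned idx = (((a >> bit) & 1u) << 2)
                     | (((b >> bit) & 1u) << 1)
                     | ((c >> bit) & 1u);
        result |= (uint32_t)((table >> idx) & 1u) << bit;
    }
    return result;
}
```

With this model, an expression such as TERN_A & TERN_B describes "a AND b", and ~TERN_A describes "NOT a" once the upper bits of the int are dropped.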

Re: [PATCH] i386: add alias for f*mul_*ch intrinsics

2021-11-17 Thread Hongtao Liu via Gcc-patches
On Tue, Nov 16, 2021 at 4:35 PM Hongtao Liu wrote: > > On Tue, Nov 16, 2021 at 4:23 PM Kong, Lingling via Gcc-patches > wrote: > > > > Hi, > > > > This patch is to add alias for f*mul_*ch intrinsics. > > > > Ok for master? > This patch j

Re: [PATCH] Don't allow mask/sse/mmx mov in TLS code sequences.

2021-11-21 Thread Hongtao Liu via Gcc-patches
On Fri, Nov 19, 2021 at 3:53 PM Uros Bizjak via Gcc-patches wrote: > > On Fri, Nov 19, 2021 at 8:50 AM Uros Bizjak wrote: > > > > On Fri, Nov 19, 2021 at 2:14 AM liuhongt wrote: > > > > > > >Why is the above declared as a special memory constraint? Also the > > > Change to define_memory_constrai

Re: [PATCH] rs6000/test: Add emulated gather test case

2021-11-24 Thread Hongtao Liu via Gcc-patches
On Thu, Nov 25, 2021 at 11:21 AM Kewen.Lin via Gcc-patches wrote: > > Hi, > > This patch is to add a test case similar to the one in i386 > to add testing coverage for 510.parest_r hotspots. > > As evaluated, the emulated gather capability of vectorizer > (r12-2733) can help to speed up SPEC2017 5

Re: [PATCH] pr103194-5.c: Replace long with int64_t

2021-11-24 Thread Hongtao Liu via Gcc-patches
On Thu, Nov 25, 2021 at 12:18 PM H.J. Lu via Gcc-patches wrote: > > Replace long with int64_t to work with -mx32. Thanks. > > * gcc.target/i386/pr103194-5.c: Replace long with int64_t. > --- > gcc/testsuite/gcc.target/i386/pr103194-5.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletio
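
As background for the change above: under -mx32 pointers and long are 32 bits wide, so a test that relies on 64-bit arithmetic has to use a fixed-width type. The following is a minimal sketch under that assumption; widen_and_shift is a hypothetical helper and is not taken from pr103194-5.c.

```c
#include <assert.h>
#include <stdint.h>

/* int64_t is 64 bits wide wherever it exists, including x32,
   where long is only 32 bits. */
static_assert(sizeof(int64_t) == 8, "int64_t is 8 bytes on x86 targets");

/* Hypothetical helper: the result only fits in a full 64-bit type. */
int64_t widen_and_shift(int32_t x)
{
    return (int64_t)x * (INT64_C(1) << 32);
}
```

Writing the same helper with long would silently lose the upper half of the result on an x32 build.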

Re: [PATCH] Fix regression introduced by r12-5536.

2021-11-29 Thread Hongtao Liu via Gcc-patches
On Mon, Nov 29, 2021 at 3:53 PM Uros Bizjak wrote: > > On Mon, Nov 29, 2021 at 2:32 AM liuhongt wrote: > > > > There're several failures reported in [1]: > > 1. unsupported instruction `pextrw` for "pextrw $0, %xmm31, 16(%rax)" > > %vpextrw should be used in output templates. > > 2. ICE in get_a

Re: [PATCH] Fix regression introduced by r12-5536.

2021-11-29 Thread Hongtao Liu via Gcc-patches
On Tue, Nov 30, 2021 at 5:21 AM Uros Bizjak wrote: > > On Mon, Nov 29, 2021 at 10:48 AM Hongtao Liu wrote: > > > > On Mon, Nov 29, 2021 at 3:53 PM Uros Bizjak wrote: > > > > > > On Mon, Nov 29, 2021 at 2:32 AM liuhongt wrote: > > > > > >

Re: [PATCH] [i386] Fix ICE in ix86_attr_length_immediate_default.

2021-11-30 Thread Hongtao Liu via Gcc-patches
On Tue, Nov 30, 2021 at 5:44 PM liuhongt via Gcc-patches wrote: > > ix86_attr_length_immediate_default assumes TYPE ishift only has 1 > constant operand, > but *x86_64_shld_1/*x86_shld_1/*x86_64_shrd_1/*x86_shrd_1 have 2, with > condition: INTVAL (operands[3]) == 32 - INTVAL (operands[2]) or > INTV
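
The reason a double shift carries two related constants can be seen in a plain C model. This is only a sketch of the arithmetic, not the RTL pattern from i386.md; shld32_model is a hypothetical name, and the variable m plays the role of the second constant that must equal 32 minus the first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit double shift left: shift hi left by n
   and fill the vacated low bits with the top n bits of lo. */
uint32_t shld32_model(uint32_t hi, uint32_t lo, unsigned n)
{
    /* Shifting a 32-bit value by 0 or by 32 needs separate handling. */
    assert(n > 0 && n < 32);

    unsigned m = 32 - n;   /* the second constant is implied by the first */
    return (hi << n) | (lo >> m);
}
```

Both n and m appear as constants in the expression, yet only one immediate is needed to encode the instruction, which is the situation the length computation has to account for.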

Re: [PATCH] i386: Improve V8HI and V8HF inserts [PR102811]

2021-12-01 Thread Hongtao Liu via Gcc-patches
On Thu, Dec 2, 2021 at 6:07 AM Uros Bizjak wrote: > > Introduce vec_set_0 pattern for V8HI and V8HF modes to implement scalar > element 0 inserts from a GP register, SSE register or memory. Also > add V8HI and V8HF AVX2 (x,x,x) alternative to PINSR insn pattern, which is > split after reload t

Re: [PATCH] [i386] Prefer INT_SSE_REGS for SSE_FLOAT_MODE_P in preferred_reload_class.

2021-12-02 Thread Hongtao Liu via Gcc-patches
On Thu, Dec 2, 2021 at 4:27 PM liuhongt wrote: > > The patch helps reload to choose GENERAL_REGS alternatives for > SSE_FLOAT_MODE and enables optimizations like > > - vmovd %xmm0, -4(%rsp) > - movl $1, %eax > - addl -4(%rsp), %eax > + movd %xmm0, %eax > +

Re: [PATCH] [i386] Prefer INT_SSE_REGS for SSE_FLOAT_MODE_P in preferred_reload_class.

2021-12-05 Thread Hongtao Liu via Gcc-patches
care for 64-bit moves which are expensive on 32-bit > targets. I like your version, update patch. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} w/ and w/o -march=k8. On Mon, Dec 6, 2021 at 11:41 AM liuhongt wrote: > > When moves between integer and sse registers are cheap. >

Re: [PATCH] [i386]Add combine splitter to transform vpcmpeqd/vpxor/vblendvps to vblendvps for ~op0

2021-12-07 Thread Hongtao Liu via Gcc-patches
On Wed, Dec 8, 2021 at 11:13 AM Jiang, Haochen via Gcc-patches wrote: > > Hi Uros, > > I have fixed that in this patch attached for checking in. Is that ok for > trunk? > Uros already said it's ok with that change, let me check in the patch for you. > Regtested on x86_64-pc-linux-gnu. > > Thx, >

Re: [PATCH] [i386]Add combine splitter to transform vashr/vlshr/vashl_optab to ashr/lshr/ashl_optab for const vector duplicate operand.

2021-12-08 Thread Hongtao Liu via Gcc-patches
On Wed, Dec 8, 2021 at 2:47 PM Haochen Jiang via Gcc-patches wrote: > > Hi, > > This patch adds a combine splitter to transform vashr/vlshr/vashl_optab to > ashr/lshr/ashl_optab for const vector duplicate operand. > > Regtested on x86_64-pc-linux-gnu. Ok for trunk? Ok. > > BRs, > Haochen > > gcc/Cha

Re: [PATCH 0/6] Add Intel Sierra Forest Instructions

2022-10-14 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 14, 2022 at 4:14 PM Iain Sandoe via Gcc-patches wrote: > > Hi Haochen > > > On 14 Oct 2022, at 08:54, Haochen Jiang via Gcc-patches > > wrote: > > > > > These six patches aimed to add Intel Sierra Forest instructions, including > > AVX-IFMA, AVX-VNNI0INT8, AVX-NE-CONVERT, CMPccXADD.

Re: [PATCH 0/6] Add Intel Sierra Forest Instructions

2022-10-14 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 14, 2022 at 4:24 PM Iain Sandoe wrote: > > > > > On 14 Oct 2022, at 09:20, Hongtao Liu wrote: > > > > On Fri, Oct 14, 2022 at 4:14 PM Iain Sandoe via Gcc-patches > > wrote: > >> > >> Hi Haochen > >> > >>

Re: [PATCH 0/6] Add Intel Sierra Forest Instructions

2022-10-14 Thread Hongtao Liu via Gcc-patches
> >> I could not see any target-requires changes in the testcases .. hence my > >> question. > >> > > Guess you are looking at compile tests? > > yes, compile tests would need support from the assembler. > > In my understanding, dg-do compile tests don't need assembler support, it just scan dump o

Re: [PATCH 1/2] Add a parameter for the builtin function of prefetch to align with LLVM

2022-10-14 Thread Hongtao Liu via Gcc-patches
This patch tries to add a parameter to generate instruction prefetch instead of data prefetch. Currently, __builtin_prefetch assumes data prefetch only. On Fri, Oct 14, 2022 at 4:39 PM Haochen Jiang via Gcc-patches wrote: > > gcc/ChangeLog: > > * builtins.cc (expand_builtin_prefetch): Han
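
For context, the existing data-prefetch form of the builtin takes an address plus two optional constants (read/write and locality). The sketch below shows only that existing three-argument form as documented for GCC; it does not use the extra parameter proposed in this thread, and sum_with_prefetch and the lookahead distance of 16 are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum an array while hinting that elements a few
   iterations ahead will be read soon.  The prefetch is only a hint and
   never changes the computed result. */
long sum_with_prefetch(const int *data, size_t n)
{
    assert(data != NULL || n == 0);

    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&data[i + 16], 0 /* read */, 3 /* keep in cache */);
        total += data[i];
    }
    return total;
}
```

Whether such a hint helps depends on the access pattern and the hardware prefetcher, so it is best treated as something to measure rather than assume.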

Re: [PATCH 0/6] Add Intel Sierra Forest Instructions

2022-10-16 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 14, 2022 at 4:36 PM Iain Sandoe wrote: > > > > > On 14 Oct 2022, at 09:30, Hongtao Liu wrote: > > > > On Fri, Oct 14, 2022 at 4:24 PM Iain Sandoe wrote: > >> > >> > >> > >>> On 14 Oct 2022, at 09:20, Hongtao Liu wrot

Re: [PATCH 1/2] Initial Raptorlake Support

2022-10-16 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 14, 2022 at 3:41 PM Haochen Jiang via Gcc-patches wrote: > > gcc/ChangeLog: > > * common/config/i386/cpuinfo.h: > (get_intel_cpu): Handle Raptorlake. > * common/config/i386/i386-common.cc: > (processor_alias_table): Add Raptorlake. Ok. > --- > gcc/commo

Re: [PATCH 2/2] Initial Meteorlake Support

2022-10-16 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 14, 2022 at 3:41 PM Haochen Jiang via Gcc-patches wrote: > > From: "Hu, Lin1" > > gcc/ChangeLog: > > * common/config/i386/cpuinfo.h: > (get_intel_cpu): Handle Meteorlake. > * common/config/i386/i386-common.cc: > (processor_alias_table): Add Meteorlake.

Re: [PATCH 0/6] Add Intel Sierra Forest Instructions

2022-10-16 Thread Hongtao Liu via Gcc-patches
On Mon, Oct 17, 2022 at 9:30 AM Bernhard Reutner-Fischer wrote: > > On 17 October 2022 03:02:22 CEST, Hongtao Liu via Gcc-patches > > >> >> Do you have this series as a branch somewhere that I can try on one of > >> >> the > >> >> like affect

Re: [PATCH] Move scanning pass of forwprop-19.c to dse1 for r13-3212-gb88adba751da63

2022-10-16 Thread Hongtao Liu via Gcc-patches
On Mon, Oct 17, 2022 at 11:26 AM Liwei Xu via Gcc-patches wrote: > > gcc/testsuite/ChangeLog: > > * gcc.dg/tree-ssa/forwprop-19.c: Move scanning pass from forwprop1 to > dse1, This fixes > the test case failure. Looks like an obvious fix to me. > --- > gcc/testsuite/gcc.dg/tree-ssa/f

Re: [PATCH 2/6] Support Intel AVX-VNNI-INT8

2022-10-16 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 14, 2022 at 3:57 PM Haochen Jiang via Gcc-patches wrote: > > From: Kong Lingling > > gcc/ChangeLog > > * common/config/i386/cpuinfo.h (get_available_features): Detect > avxvnniint8. > * common/config/i386/i386-common.cc > (OPTION_MASK_ISA2_AVXVNNIINT8_S

Re: [PATCH 4/6] Support Intel AVX-NE-CONVERT

2022-10-16 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 14, 2022 at 3:58 PM Haochen Jiang via Gcc-patches wrote: > > From: Kong Lingling > > gcc/ChangeLog: > > * common/config/i386/i386-common.cc > (OPTION_MASK_ISA2_AVXNECONVERT_SET, > OPTION_MASK_ISA2_AVXNECONVERT_UNSET): New. > (ix86_handle_option): Handle

Re: [PATCH 2/6] Support Intel AVX-VNNI-INT8

2022-10-16 Thread Hongtao Liu via Gcc-patches
On Mon, Oct 17, 2022 at 2:27 PM Jiang, Haochen wrote: > > > -Original Message- > > From: Hongtao Liu > > Sent: Monday, October 17, 2022 12:05 PM > > To: Jiang, Haochen > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao > > Subject: Re: [PATCH 2/6] Sup

Re: PING^1: [PATCH] x86: Check corrupted return address when unwinding stack

2022-10-17 Thread Hongtao Liu via Gcc-patches
On Wed, Oct 5, 2022 at 5:33 AM H.J. Lu wrote: > > On Wed, Sep 21, 2022 at 1:42 PM H.J. Lu wrote: > > > > If shadow stack is enabled, when unwinding stack, we count how many stack > > frames we pop to reach the landing pad and adjust shadow stack by the same > > amount. When counting the stack fr

Re: [r13-3219 Regression] FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2 on Linux/x86_64

2022-10-17 Thread Hongtao Liu via Gcc-patches
This should be already fixed. On Mon, Oct 17, 2022 at 4:34 PM haochen.jiang via Gcc-patches wrote: > > On Linux/x86_64, > > 25413fdb2ac24933214123e24ba165026452a6f2 is the first bad commit > commit 25413fdb2ac24933214123e24ba165026452a6f2 > Author: Andre Vieira > Date: Tue Oct 11 10:49:27 2022

Re: [PATCH] Canonicalize vec_perm index to make the first index come from the first vector.

2022-10-18 Thread Hongtao Liu via Gcc-patches
On Wed, Oct 19, 2022 at 7:49 AM H.J. Lu wrote: > > On Tue, Oct 18, 2022 at 4:25 PM liuhongt wrote: > > > > Fix unexpected non-canon form from gimple vector selector. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ok for trunk? > > > > gcc/ChangeLog: > > > > PR targe
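
One plausible reading of the subject line, sketched in plain C: a two-input permutation with n elements per vector uses indices 0..n-1 for the first vector and n..2n-1 for the second, so if the first index points into the second vector the operands can be swapped and every index remapped. This is an illustration under that assumption, not the code from the patch; canonicalize_perm_model is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: rewrite sel in place so that sel[0] selects from the
   first input.  Returns true when the caller must also swap the two input
   vectors for the permutation to keep its meaning. */
bool canonicalize_perm_model(unsigned *sel, unsigned n)
{
    assert(sel != NULL && n > 0);

    if (sel[0] < n)
        return false;               /* already canonical */

    for (unsigned i = 0; i < n; i++) {
        assert(sel[i] < 2 * n);
        /* After swapping inputs, first-vector indices become second-vector
           indices and vice versa. */
        sel[i] = (sel[i] < n) ? sel[i] + n : sel[i] - n;
    }
    return true;
}
```

For n = 4, the selector {4, 1, 6, 3} becomes {0, 5, 2, 7} with the inputs swapped, which picks exactly the same elements.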

Re: [PATCH v2] Support Intel AVX-VNNI-INT8

2022-10-18 Thread Hongtao Liu via Gcc-patches
On Tue, Oct 18, 2022 at 5:13 PM Haochen Jiang via Gcc-patches wrote: > > From: Kong Lingling > > Hi all, > > This is our v2 patch on AVX-VNNI-INT8. The main change in this patch is to > rename the previous UNSPEC_VPMADDxxx things to new vnni style. > > Ok for trunk? The patch LGTM, but please le

Re: [PATCH] i386: Auto vectorize sdot_prod, udot_prod with VNNIINT8 instruction.

2022-10-18 Thread Hongtao Liu via Gcc-patches
On Tue, Oct 18, 2022 at 5:18 PM Haochen Jiang via Gcc-patches wrote: > > Hi all, > > We would like to add one more patch to enhance the codegen with avxvnniint8. > Also renamed two awkward named mode_attr to make them more aligned with > others. > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?

Re: [PATCH 0/6] Add Intel Sierra Forest Instructions

2022-10-19 Thread Hongtao Liu via Gcc-patches
On Wed, Oct 19, 2022 at 7:09 PM Iain Sandoe wrote: > > Hi Hongtao > > > On 17 Oct 2022, at 02:56, Hongtao Liu wrote: > > > > On Mon, Oct 17, 2022 at 9:30 AM Bernhard Reutner-Fischer > > wrote: > >> > >> On 17 October 2022 03:02:22 CEST, Hongtao

Re: [PATCH 1/2] Add a parameter for the builtin function of prefetch to align with LLVM

2022-10-19 Thread Hongtao Liu via Gcc-patches
On Thu, Oct 20, 2022 at 5:15 AM Segher Boessenkool wrote: > > On Wed, Oct 19, 2022 at 10:14:28AM -0700, Andrew Pinski wrote: > > Do the testcases really need to be changed rather than adding new testcases? > > Usually it is better if the testcases not change unless really needed > > to be. That is

Re: [PATCH 1/2] Add a parameter for the builtin function of prefetch to align with LLVM

2022-10-19 Thread Hongtao Liu via Gcc-patches
On Thu, Oct 20, 2022 at 5:08 AM Segher Boessenkool wrote: > > On Fri, Oct 14, 2022 at 04:34:05PM +0800, Haochen Jiang wrote: > > * config/s390/s390.cc (s390_expand_cpymem): Generate fourth parameter > > for > > (Many too long lines here, this is the first one. Changelog lines are > max. 80

Re: [PATCH 1/2] Add a parameter for the builtin function of prefetch to align with LLVM

2022-10-19 Thread Hongtao Liu via Gcc-patches
On Thu, Oct 20, 2022 at 9:39 AM Hongtao Liu wrote: > > On Thu, Oct 20, 2022 at 5:08 AM Segher Boessenkool > wrote: > > > > On Fri, Oct 14, 2022 at 04:34:05PM +0800, Haochen Jiang wrote: > > > * config/s390/s390.cc (s390_expand_cpymem): Generate fourth >

Re: [PATCH 2/2] Support Intel prefetchit0/t1

2022-10-19 Thread Hongtao Liu via Gcc-patches
* gcc.target/i386/prefetchi-3.c: Ditto. > > * gcc.target/i386/sse-12.c: Add -mprefetchi. > > * gcc.target/i386/sse-13.c: Ditto. > > * gcc.target/i386/sse-14.c: Ditto. > > * gcc.target/i386/sse-22.c: Add prefetchi. > > *

Re: [PATCH 0/6] Add Intel Sierra Forest Instructions

2022-10-20 Thread Hongtao Liu via Gcc-patches
On Thu, Oct 20, 2022 at 9:11 AM Hongtao Liu wrote: > > On Wed, Oct 19, 2022 at 7:09 PM Iain Sandoe wrote: > > > > Hi Hongtao > > > > > On 17 Oct 2022, at 02:56, Hongtao Liu wrote: > > > > > > On Mon, Oct 17, 2022 at 9:30 AM Bernhard Reutner-Fis

Re: [PATCH 0/6] Add Intel Sierra Forest Instructions

2022-10-20 Thread Hongtao Liu via Gcc-patches
On Thu, Oct 20, 2022 at 5:17 PM Iain Sandoe wrote: > > > > > On 20 Oct 2022, at 10:09, Hongtao Liu via Gcc-patches > > wrote: > > > > On Thu, Oct 20, 2022 at 9:11 AM Hongtao Liu wrote: > >> > >> On Wed, Oct 19, 2022 at 7:09 PM Iain Sandoe

Re: [PATCH 0/6] Add Intel Sierra Forest Instructions

2022-10-20 Thread Hongtao Liu via Gcc-patches
> Thanks for giving me a chance to test, this seems OK on Darwin (no large-scale > fallout, anyway) .. > Good to hear that. > I tested the ise046 branch which looks like it collects several of the posted > patch > series, so I’ve covered those too. (not had a chance to test on AVX512 yet, > but i

Re: [PATCH] Support Intel AVX-IFMA

2022-10-20 Thread Hongtao Liu via Gcc-patches
On Wed, Oct 19, 2022 at 2:04 PM Hongyu Wang via Gcc-patches wrote: > > Hi, > > Here is the updated patch that aligns the implementation to AVX-VNNI, > and corrects some spelling errors for AVX512IFMA pattern. > > Bootstrapped/regtested on x86_64-pc-linux-gnu and sde. Ok for trunk? Ok for this one. >

Re: [PATCH v2] Support Intel AVX-VNNI-INT8

2022-10-20 Thread Hongtao Liu via Gcc-patches
On Wed, Oct 19, 2022 at 9:41 AM Hongtao Liu wrote: > > On Tue, Oct 18, 2022 at 5:13 PM Haochen Jiang via Gcc-patches > wrote: > > > > From: Kong Lingling > > > > Hi all, > > > > This is our v2 patch on AVX-VNNI-INT8. The main change in this patch i

Re: [PATCH] i386: Auto vectorize sdot_prod, udot_prod with VNNIINT8 instruction.

2022-10-20 Thread Hongtao Liu via Gcc-patches
On Wed, Oct 19, 2022 at 9:43 AM Hongtao Liu wrote: > > On Tue, Oct 18, 2022 at 5:18 PM Haochen Jiang via Gcc-patches > wrote: > > > > Hi all, > > > > We would like to add one more patch to enhance the codegen with avxvnniint8. > > Also renamed two aw

Re: [PATCH 4/6] Support Intel AVX-NE-CONVERT

2022-10-24 Thread Hongtao Liu via Gcc-patches
On Mon, Oct 24, 2022 at 2:20 PM Kong, Lingling wrote: > > > From: Gcc-patches > > On Behalf Of Hongtao Liu via Gcc-patches > > Sent: Monday, October 17, 2022 1:47 PM > > To: Jiang, Haochen > > Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org > > Subject: Re: [

Re: [PATCH] ix86: Suggest unroll factor for loop vectorization

2022-10-24 Thread Hongtao Liu via Gcc-patches
Any comments? On Mon, Oct 24, 2022 at 10:46 AM Cui,Lili via Gcc-patches wrote: > > Hi Hongtao, > > This patch introduces function finish_cost and > determine_suggested_unroll_factor for x86 backend, to make it be > able to suggest the unroll factor for a given loop being vectorized. > Referring t

Re: [PATCH] [x86] Enable V4BFmode and V2BFmode.

2022-10-27 Thread Hongtao Liu via Gcc-patches
I'm going to check in this patch. On Wed, Oct 26, 2022 at 10:30 AM liuhongt wrote: > > Enable V4BFmode and V2BFmode with the same ABI as V4HFmode and > V2HFmode. No real operation is supported for them except for movement. > This should solve PR target/107261. > > Also I notice there's redundancy

Re: [PATCH] x86: Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns

2022-10-27 Thread Hongtao Liu via Gcc-patches
On Thu, Oct 27, 2022 at 2:59 AM H.J. Lu via Gcc-patches wrote: > > In i386.md, neg patterns which set MODE_CC register like > > (set (reg:CCC FLAGS_REG) > (ne:CCC (match_operand:SWI48 1 "general_reg_operand") (const_int 0))) > > can lead to errors when operand 1 is a constant value. If FLAGS

Re: [PATCH] x86: Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns

2022-10-27 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 28, 2022 at 1:56 PM Hongtao Liu wrote: > > On Thu, Oct 27, 2022 at 2:59 AM H.J. Lu via Gcc-patches > wrote: > > > > In i386.md, neg patterns which set MODE_CC register like > > > > (set (reg:CCC FLAGS_REG) > > (ne:CCC (match_operand:SWI4

Re: [PATCH] i386: using __bf16 for AVX512BF16 intrinsics

2022-10-27 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 28, 2022 at 2:20 PM Kong, Lingling via Gcc-patches wrote: > > Hi, > > Previously we used unsigned short to represent bf16. It's not a good > expression, and at the time the front end didn't support bf16 type. > Now we introduced __bf16 to X86 psABI. So we can switch intrinsics to the n
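
To illustrate what the old unsigned short representation carried, here is a minimal sketch of the bf16 bit layout in portable C: a bf16 value is the upper 16 bits of an IEEE binary32 float. The function names are hypothetical, and the conversion below simply truncates; it is not the AVX512BF16 conversion, which rounds, and it is not code from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit float");

/* Hypothetical helper: keep the sign, exponent and top 7 mantissa bits. */
uint16_t float_to_bf16_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);       /* well-defined way to read the bits */
    return (uint16_t)(u >> 16);     /* truncation, no rounding */
}

/* Hypothetical helper: widen bf16 bits back to float by zero-filling. */
float bf16_bits_to_float(uint16_t h)
{
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

A raw unsigned short like this carries no type information, so nothing stops it from being mixed up with an ordinary integer; a dedicated __bf16 type avoids that ambiguity.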

Re: [wwwdocs] [GCC13] Mention Intel __bf16 support in AVX512BF16 intrinsics.

2022-10-31 Thread Hongtao Liu via Gcc-patches
On Tue, Nov 1, 2022 at 9:21 AM Kong, Lingling via Gcc-patches wrote: > > Hi > > The patch is to mention Intel __bf16 support in AVX512BF16 intrinsics. > Ok for master ? > > Thanks, > Lingling > > --- > htdocs/gcc-13/changes.html | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/htdocs/g

Re: [PATCH V2] [x86] Fix incorrect digit constraint

2022-11-01 Thread Hongtao Liu via Gcc-patches
On Mon, Oct 31, 2022 at 5:22 PM Uros Bizjak wrote: > > On Mon, Oct 31, 2022 at 2:10 AM liuhongt wrote: > > > > >You have a couple of other patterns where operand 1 is matched to > > >produce vmovddup insn. These are *avx512f_unpcklpd512 and > > >avx_unpcklpd256. You can also remove expander in bo

Re: [PATCH 6/6] Initial Sierra Forest Support

2022-11-02 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 14, 2022 at 3:57 PM Haochen Jiang via Gcc-patches wrote: > > gcc/ChangeLog: > > * common/config/i386/cpuinfo.h (get_intel_cpu): > Add Sierra Forest. > * common/config/i386/i386-common.cc > (processor_names): Add Sierra Forest. > (processor_alias_

Re: [wwwdocs] [GCC13] Mention Intel __bf16 support in AVX512BF16 intrinsics.

2022-11-03 Thread Hongtao Liu via Gcc-patches
On Thu, Nov 3, 2022 at 2:53 PM Kong, Lingling wrote: > > > > > diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html > > > > index 7c6bfa6e..cd0282f1 100644 > > > > --- a/htdocs/gcc-13/changes.html > > > > +++ b/htdocs/gcc-13/changes.html > > > > @@ -230,6 +230,8 @@ a work-in-progre

Re: [PATCH] i386 testsuite: cope with --enable-default-pie

2022-08-14 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 10, 2022 at 1:42 PM Alexandre Oliva via Gcc-patches wrote: > > On Aug 9, 2022, Alexandre Oliva wrote: > > > Ping? > > https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598276.html > > Oops, sorry, I linked to the wrong patch. This is the one I meant to ping: > > https://gcc.gnu.or

Re: [PATCH] x86: Support vector __bf16 type.

2022-08-16 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 16, 2022 at 3:50 PM Kong, Lingling via Gcc-patches wrote: > > Hi, > > The patch is support vector init/broadcast/set/extract for __bf16 type. > The __bf16 type is a storage type. > > OK for master? Ok. > > gcc/ChangeLog: > > * config/i386/i386-expand.cc (ix86_expand_sse_movcc):

Re: [PATCH] Add ABI test for __bf16 type

2022-08-21 Thread Hongtao Liu via Gcc-patches
On Sat, Aug 20, 2022 at 1:31 AM H.J. Lu wrote: > > On Thu, Aug 18, 2022 at 5:56 PM Hongtao Liu via Gcc-patches > wrote: > > > > On Thu, Aug 18, 2022 at 3:36 PM Haochen Jiang via Gcc-patches > > wrote: > > > > > > Hi all, > > > > >

Re: [PATCH] Add ABI test for __bf16 type

2022-08-21 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 22, 2022 at 9:02 AM Hongtao Liu wrote: > > On Sat, Aug 20, 2022 at 1:31 AM H.J. Lu wrote: > > > > On Thu, Aug 18, 2022 at 5:56 PM Hongtao Liu via Gcc-patches > > wrote: > > > > > > On Thu, Aug 18, 2022 at 3:36 PM Haochen Jiang via Gcc-pa

Re: [PATCH] x86: Cast stride to __PTRDIFF_TYPE__ in AMX intrinsics

2022-08-22 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 23, 2022 at 1:02 AM H.J. Lu wrote: > > On 64-bit Windows, long is 32 bits and can't be used as stride in memory > operand when base is a pointer which is 64 bits. Cast stride to > __PTRDIFF_TYPE__, instead of long. Ok. > > PR target/106714 > * config/i386/amxtileintrin

Re: [PATCH] Add __m128bf16/__m256bf16/__m512bf16 type for bf16 abi test

2022-08-22 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 22, 2022 at 10:16 AM Haochen Jiang via Gcc-patches wrote: > > Hi all, > > This patch added __m128bf16/__m256bf16/__m512bf16 type in testcases. Ok. > > BRs, > Haochen > > gcc/testsuite/ChangeLog: > > * gcc.target/x86_64/abi/bf16/bf16-helper.h: > Add _m128bf16/m256bf16/_m

Re: [PATCH] Don't gimple fold ymm-version vblendvpd/vblendvps/vpblendvb w/o TARGET_AVX2

2022-08-24 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 24, 2022 at 9:15 AM liuhongt wrote: > > Since 256-bit vector integer comparison is under TARGET_AVX2, > and gimple folding for vblendvpd/vblendvps/vpblendvb relies on that. > Restrict gimple fold condition to TARGET_AVX2. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. >

Re: [PATCH] x86: Cast stride to __PTRDIFF_TYPE__ in AMX intrinsics

2022-08-28 Thread Hongtao Liu via Gcc-patches
On Sat, Aug 27, 2022 at 12:51 AM H.J. Lu wrote: > > On Mon, Aug 22, 2022 at 7:05 PM Hongtao Liu wrote: > > > > On Tue, Aug 23, 2022 at 1:02 AM H.J. Lu wrote: > > > > > > On 64-bit Windows, long is 32 bits and can't be used as stride in memory >

Re: [PATCH] x86: Handle V8BF in expand_vec_perm_broadcast_1

2022-08-31 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 31, 2022 at 2:52 PM Kong, Lingling via Gcc-patches wrote: > > Hi, > > Handle E_V8BFmode in expand_vec_perm_broadcast_1 and > ix86_expand_vector_init_duplicate. > Ok for trunk? > > gcc/ChangeLog: > > PR target/106742 > * config/i386/i386-expand.cc (ix86_expand_vector_in

Re: [PATCH] x86: Handle V8BF in expand_vec_perm_broadcast_1

2022-09-04 Thread Hongtao Liu via Gcc-patches
On Fri, Sep 2, 2022 at 4:08 PM Kong, Lingling wrote: > > Hi, > > I fixed it in a new patch. And added BF vector mode in SUBST_V and > avx512fmaskhalfmode for @vec_interleave_high. > Ok for trunk ? Ok. > > > > Hi, > > > > > > Handle E_V8BFmode in expand_vec_perm_broadcast_1 and > > ix86_expand_ve

Re: [PATCH] Fix _mm512_cvt_roundps_ph to generate sae instruction.

2022-09-04 Thread Hongtao Liu via Gcc-patches
On Mon, Sep 5, 2022 at 10:44 AM liuhongt wrote: > > zmm-version vcvtps2ph is special, it encodes {sae} in evex, but put > round control in the imm. For intrinsic _mm512_cvt_roundps_ph (a, > imm), imm contains both {sae} and round control, we need to separate > it in the assembly output since vcvtp

Re: [PATCH] AVX512F: Add missing macro for mask(z?)_scalf_s[sd] [PR 105339]

2022-04-23 Thread Hongtao Liu via Gcc-patches
On Fri, Apr 22, 2022 at 8:43 PM Hongyu Wang wrote: > > > Please add the corresponding intrinsic test in sse-14.c > > Sorry for forgetting this part. Updated patch. Thanks. > LGTM. > Hongtao Liu via Gcc-patches wrote on Fri, Apr 22, 2022 at 16:49: > > > > On Fri, Apr 22, 20

Re: [PATCH] Expand __builtin_memcmp_eq with ptest for OI/TImode.

2022-05-05 Thread Hongtao Liu via Gcc-patches
On Thu, May 5, 2022 at 3:37 PM liuhongt wrote: > > Enable optimization for TImode only under 32-bit target, for 64-bit > target there could be extra integer <-> sse move regarding psABI, > not efficient. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} > Ok for trunk? > > gcc/ChangeLo

Re: [PATCH] Expand __builtin_memcmp_eq with ptest for OI/TImode.

2022-05-05 Thread Hongtao Liu via Gcc-patches
On Thu, May 5, 2022 at 4:09 PM Uros Bizjak via Gcc-patches wrote: > > On Thu, May 5, 2022 at 9:50 AM Richard Biener via Gcc-patches > wrote: > > > > On Thu, May 5, 2022 at 9:37 AM liuhongt via Gcc-patches > > wrote: > > > > > > Enable optimization for TImode only under 32-bit target, for 64-bit

Re: [PATCH] Expand __builtin_memcmp_eq with ptest for OImode.

2022-05-08 Thread Hongtao Liu via Gcc-patches
On Sat, May 7, 2022 at 1:05 PM liuhongt via Gcc-patches wrote: > > This is adjusted patch only for OImode. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: > > PR target/104610 > * config/i386/i386-expand.cc (ix86_expand_branch): Use

Re: [PATCH] [i386] Implement permutation with pslldq + psrldq + por when pshufb is not available.

2022-05-08 Thread Hongtao Liu via Gcc-patches
On Mon, May 9, 2022 at 1:22 PM liuhongt via Gcc-patches wrote: > > pand/pandn may be used to clear upper/lower bits of the operands, in > that case there will be 4-5 instructions for permutation, and it's > still better than scalar codes. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,
