Re: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2 on Linux/x86_64

2023-12-11 Thread Hongtao Liu
On Tue, Dec 12, 2023 at 1:47 PM Jiang, Haochen via Gcc-regression wrote: > > > -Original Message- > > From: Jiang, Haochen > > Sent: Tuesday, December 12, 2023 9:11 AM > > To: Andrew Pinski (QUIC) ; haochen.jiang > > ; gcc-regress...@gcc.gnu.org; gcc- > > patc...@gcc.gnu.org > > Subject: R

Re: Disable FMADD in chains for Zen4 and generic

2023-12-12 Thread Hongtao Liu
On Tue, Dec 12, 2023 at 10:38 PM Jan Hubicka wrote: > > Hi, > this patch disables use of FMA in matrix multiplication loop for generic (for > x86-64-v3) and zen4. I tested this on zen4 and Xeon Gold 6212U. > > For Intel this is neutral both on the matrix multiplication microbenchmark > (att

Re: [PATCH] i386: Fix ICE on __builtin_ia32_pabsd128 without lhs [PR112962]

2023-12-13 Thread Hongtao Liu
On Wed, Dec 13, 2023 at 4:44 PM Jakub Jelinek wrote: > > Hi! > > The following patch fixes ICE on the testcase in similar way to how > other folded builtins are handled in ix86_gimple_fold_builtin when > they don't have a lhs; these builtins are const or pure, so normally > DCE would remove them l

Re: [PATCH] i386: Fix isa attribute for TI/TF andnot mode

2023-11-06 Thread Hongtao Liu
On Tue, Nov 7, 2023 at 10:27 AM Haochen Jiang wrote: > > Hi all, > > This patch aims to fix the wrong isa attribute which caused regression > on PR111907. > > Regtested on x86_64-pc-linux-gnu. Ok for trunk? > > Thx, > Haochen > > gcc/ChangeLog: > > PR target/111907 > * config/i386/

Re: [V2 PATCH] Handle bitop with INTEGER_CST in analyze_and_compute_bitop_with_inv_effect.

2023-11-07 Thread Hongtao Liu
On Tue, Nov 7, 2023 at 4:10 PM Richard Biener wrote: > > On Tue, Nov 7, 2023 at 7:08 AM liuhongt wrote: > > > > analyze_and_compute_bitop_with_inv_effect assumes the first operand is > > loop invariant which is not the case when it's INTEGER_CST. > > > > Bootstrapped and regtested on x86_64-pc-li

Re: [V2 PATCH] Handle bitop with INTEGER_CST in analyze_and_compute_bitop_with_inv_effect.

2023-11-07 Thread Hongtao Liu
On Tue, Nov 7, 2023 at 10:34 PM Richard Biener wrote: > > On Tue, Nov 7, 2023 at 2:03 PM Hongtao Liu wrote: > > > > On Tue, Nov 7, 2023 at 4:10 PM Richard Biener > > wrote: > > > > > > On Tue, Nov 7, 2023 at 7:08 AM liuhongt wrote: > > >

Re: [PATCH] [i386] APX: Fix ICE due to movti postreload splitter [PR112394]

2023-11-07 Thread Hongtao Liu
On Tue, Nov 7, 2023 at 3:33 PM Hongyu Wang wrote: > > Hi, > > When APX EGPR enabled, the TImode move pattern *movti_internal allows > move between gpr and sse reg using constraint pair ("r","Yd"). Then a > post-reload splitter transform such move to vec_extractv2di, while under > -msse4.1 -mno-avx

Re: [V2 PATCH] Handle bitop with INTEGER_CST in analyze_and_compute_bitop_with_inv_effect.

2023-11-08 Thread Hongtao Liu
On Wed, Nov 8, 2023 at 3:53 PM Richard Biener wrote: > > On Wed, Nov 8, 2023 at 2:18 AM Hongtao Liu wrote: > > > > On Tue, Nov 7, 2023 at 10:34 PM Richard Biener > > wrote: > > > > > > On Tue, Nov 7, 2023 at 2:03 PM Hongtao Liu wrote: > > >

Re: [PATCH] Avoid generate vblendps with ymm16+

2023-11-08 Thread Hongtao Liu
On Thu, Nov 9, 2023 at 3:15 PM Hu, Lin1 wrote: > > This patch aims to avoid generating vblendps with ymm16+, and has been > bootstrapped and tested on x86_64-pc-linux-gnu{-m32,-m64}. Ok for trunk? > > gcc/ChangeLog: > > PR target/112435 > * config/i386/sse.md: Adding constraints to restr

Re: [PATCH] Simplify vector ((VCE?(a cmp b ? -1 : 0)) < 0) ? c : d to just (VCE:a cmp VCE:b) ? c : d.

2023-11-09 Thread Hongtao Liu
On Fri, Nov 10, 2023 at 10:11 AM Andrew Pinski wrote: > > On Thu, Nov 9, 2023 at 5:52 PM liuhongt wrote: > > > > When I'm working on PR112443, I notice there's some misoptimizations: after > > we > > fold _mm{,256}_blendv_epi8/pd/ps into gimple, the backend fails to combine > > it > > back to v

Re: [PATCH] Avoid generate vblendps with ymm16+

2023-11-12 Thread Hongtao Liu
On Sat, Nov 11, 2023 at 4:11 AM Jakub Jelinek wrote: > > On Thu, Nov 09, 2023 at 03:27:11PM +0800, Hongtao Liu wrote: > > On Thu, Nov 9, 2023 at 3:15 PM Hu, Lin1 wrote: > > > > > > This patch aims to avoid generate vblendps with ymm16+, And have > > > boo

Re: [RFC] Intel AVX10.1 Compiler Design and Support

2023-11-12 Thread Hongtao Liu
On Fri, Nov 10, 2023 at 6:15 PM Richard Biener wrote: > > On Fri, Nov 10, 2023 at 2:42 AM Haochen Jiang wrote: > > > > Hi all, > > > > This RFC patch aims to add AVX10.1 options. After we added -m[no-]evex512 > > support, it makes a lot easier to add them comparing to the August version. > > Deta

Re: [PATCH] Simplify vector ((VCE?(a cmp b ? -1 : 0)) < 0) ? c : d to just VCE:((a cmp b) ? (VCE c) : (VCE d)).

2023-11-12 Thread Hongtao Liu
On Fri, Nov 10, 2023 at 2:14 PM liuhongt wrote: > > When I'm working on PR112443, I notice there's some misoptimizations: > after we fold _mm{,256}_blendv_epi8/pd/ps into gimple, the backend > fails to combine it back to v{,p}blendv{v,ps,pd} since the pattern is > too complicated, so I think maybe

Re: [V2 PATCH] Handle bitop with INTEGER_CST in analyze_and_compute_bitop_with_inv_effect.

2023-11-12 Thread Hongtao Liu
On Fri, Nov 10, 2023 at 5:12 PM Richard Biener wrote: > > On Wed, Nov 8, 2023 at 9:22 AM Hongtao Liu wrote: > > > > On Wed, Nov 8, 2023 at 3:53 PM Richard Biener > > wrote: > > > > > > On Wed, Nov 8, 2023 at 2:18 AM Hongtao Liu wrote: > > >

Re: [PATCH] Avoid generate vblendps with ymm16+

2023-11-13 Thread Hongtao Liu
On Mon, Nov 13, 2023 at 4:45 PM Jakub Jelinek wrote: > > On Mon, Nov 13, 2023 at 02:27:35PM +0800, Hongtao Liu wrote: > > > 1) if it isn't better to use separate alternative instead of > > >x86_evex_reg_mentioned_p, like in the patch below > > vblendps doe

Re: [RFC] Intel AVX10.1 Compiler Design and Support

2023-11-13 Thread Hongtao Liu
On Mon, Nov 13, 2023 at 7:25 PM Richard Biener wrote: > > On Mon, Nov 13, 2023 at 7:58 AM Hongtao Liu wrote: > > > > On Fri, Nov 10, 2023 at 6:15 PM Richard Biener > > wrote: > > > > > > On Fri, Nov 10, 2023 at 2:42 AM Haochen Jiang > > >

Re: [PATCH] x86: Make testcase apx-spill_to_egprs-1.c more robust

2023-11-14 Thread Hongtao Liu
On Tue, Nov 14, 2023 at 5:01 PM Lehua Ding wrote: > > Hi, > > This little patch adjusts the assert in apx-spill_to_egprs-1.c testcase. > The -mapxf compilation option allows more registers to be used, which in > turn eliminates the need for local variables to be stored in stack memory. > Therefore,

Re: [PATCH] [i386] APX: Fix EGPR usage in several patterns.

2023-11-15 Thread Hongtao Liu
On Wed, Nov 15, 2023 at 5:43 PM Hongyu Wang wrote: > > Hi, > > For vextract/insert{if}128 they cannot adopt EGPR in their memory operand, all > related pattern should be adjusted to disable EGPR usage on them. > Also fix a wrong gpr16 attr for insertps. > > Bootstrapped/regtested on x86-64-pc-linu

Re: [PATCH] Initial support for AVX10.1

2023-11-19 Thread Hongtao Liu
On Fri, Nov 10, 2023 at 9:42 AM Haochen Jiang wrote: > > gcc/ChangeLog: > > * common/config/i386/cpuinfo.h (get_available_features): > Add avx10_set and version and detect avx10.1. > (cpu_indicator_init): Handle avx10.1-512. > * common/config/i386/i386-common.cc >

Re: [PATCH] [APX PPX] Support Intel APX PPX

2023-11-19 Thread Hongtao Liu
On Fri, Nov 17, 2023 at 3:26 PM Hongyu Wang wrote: > > Intel APX PPX feature has been released in [1]. > > PPX stands for Push-Pop Acceleration. PUSH/PUSH2 and its corresponding POP > can be marked with a 1-bit hint to indicate that the POP reads the > value written by the PUSH from the stack. The

Re: [PATCH] [APX PPX] Support Intel APX PPX

2023-11-20 Thread Hongtao Liu
. > > Yes, such change also worked and no cfa adjustment required then, > thanks for the suggestion. > Updated patch with just 1 new UNSPEC and removed cfa handling. LGTM. > > Hongtao Liu 于2023年11月20日周一 14:46写道: > > > > On Fri, Nov 17, 2023 at 3:26 PM Hongyu Wang wrote:

Re: [PATCH] [APX PUSH2POP2] Adjust operand order for PUSH2POP2

2023-11-21 Thread Hongtao Liu
On Wed, Nov 22, 2023 at 11:31 AM Hongyu Wang wrote: > > Hi, > > The push2/pop2 operand order does not match the binutils implementation > for AT&T syntax that it will first push operands[2] then operands[1]. > Correct it by reverse operand order for AT&T syntax. > > Bootstrapped/regtested on x86-6

Re: [PATCH] i386: Fix AVX512 and AVX10 option issues

2023-11-23 Thread Hongtao Liu
On Thu, Nov 23, 2023 at 2:10 PM Haochen Jiang wrote: > > Hi all, > > This patch should be able to fix the current issue mentioned in PR112643. > > Also, I fixed some legacy issues in code related to AVX512/AVX10. > > Ok for trunk? Ok > > Thx, > Haochen > > gcc/ChangeLog: > > PR target/1126

Re: [PATCH] [i386] Fix push2pop2 test fail on non-linux target [PR112729]

2023-11-28 Thread Hongtao Liu
On Tue, Nov 28, 2023 at 9:51 PM Hongyu Wang wrote: > > Hi, > > On linux x86-64, -fomit-frame-pointer was by default enabled so the > push2pop2 tests cfi scans are based on it. On other target with > -fno-omit-frame-pointer the cfi scan will be wrong as the frame pointer > is pushed at first. Add -

Re: [PATCH] i386: Fix CPUID of USER_MSR.

2023-11-28 Thread Hongtao Liu
On Wed, Nov 29, 2023 at 9:23 AM Hu, Lin1 wrote: > > Hi, all > > This patch aims to fix the wrong CPUID of USER_MSR, its correct CPUID is > (0x7, 0x1).EDX[15], but I set it as (0x7, 0x0).EDX[15]. The patch also modifies the > testcase to give the user a better example. > > It has been bootstrapped and
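
A minimal sketch of the CPUID check described in this message, assuming GCC's `<cpuid.h>` helper `__get_cpuid_count` on an x86 host. The function names `user_msr_bit` and `cpu_has_user_msr` are hypothetical and are not taken from the patch; only the leaf, subleaf and bit position, (0x7, 0x1).EDX[15], come from the message.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Bit position taken from the message: (0x7, 0x1).EDX[15]. */
#define USER_MSR_EDX_BIT 15u

/* Extract the USER_MSR bit from an EDX value of CPUID leaf 7, subleaf 1.
   Hypothetical helper name, for illustration only. */
int user_msr_bit(uint32_t edx)
{
    return (int)((edx >> USER_MSR_EDX_BIT) & 1u);
}

/* Returns 1 if the running CPU reports USER_MSR, 0 otherwise
   (including on non-x86 hosts, where CPUID is not available). */
int cpu_has_user_msr(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* Query subleaf 1, not subleaf 0: that is the fix the message describes. */
    if (!__get_cpuid_count(0x7, 0x1, &eax, &ebx, &ecx, &edx))
        return 0;
    return user_msr_bit(edx);
#else
    return 0;
#endif
}

/* Small self-check of the bit extraction. */
void check_user_msr_bit(void)
{
    assert(user_msr_bit(1u << 15) == 1);
    assert(user_msr_bit(~(1u << 15)) == 0);
}
```

Querying subleaf 0 instead would read an unrelated EDX bit, which is the mistake the patch corrects.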

Re: [PATCH] Take register pressure into account for vec_construct when the components are not loaded from memory.

2023-11-29 Thread Hongtao Liu
On Wed, Nov 29, 2023 at 3:47 PM Richard Biener wrote: > > On Tue, Nov 28, 2023 at 8:54 AM liuhongt wrote: > > > > For vec_construct, the components must be live at the same time if > > they're not loaded from memory, when the number of those components > > exceeds available registers, spill happen

Re: [PATCH] Set AVOID_256FMA_CHAINS TO m_GENERIC as it's generally good to new platforms

2023-11-30 Thread Hongtao Liu
Any comments? On Wed, Nov 22, 2023 at 12:17 PM liuhongt wrote: > > From: "Zhang, Annita" > > Avoid_fma_chain was enabled in m_SAPPHIRERAPIDS, m_ALDERLAKE and > m_CORE_HYBRID. It can also be enabled in m_GENERIC to improve the > performance of -march=x86-64-v3/v4 with -mtune=generic set by > defa

Re: [PATCH 00/18] Support -mevex512 for AVX512

2023-09-21 Thread Hongtao Liu
On Thu, Sep 21, 2023 at 3:22 PM Hu, Lin1 wrote: > > Hi all, > > After previous discussion, instead of supporting option -mavx10.1, we > will first introduce option -m[no-]evex512, which will enable/disable > 512 bit register and 64 bit mask register. > > It will not change the current option behav

Re: [PATCH v2 00/13] Support Intel APX EGPR

2023-09-24 Thread Hongtao Liu
On Fri, Sep 22, 2023 at 6:56 PM Hongyu Wang wrote: > > Hi, > > This is a v2 patch for APX support which follows-up previous discussion in > https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628904.html > > As discussed in previous thread, the inverse approach to extend base/index reg > support

Re: [PATCH] [i386] restore recompute to override opts after change [PR113719]

2024-06-26 Thread Hongtao Liu
On Thu, Jun 13, 2024 at 3:32 PM Alexandre Oliva wrote: > > > The first patch for PR113719 regressed gcc.dg/ipa/iinline-attr.c on > toolchains configured to --enable-frame-pointer, because the > optimization node created within handle_optimize_attribute had > flag_omit_frame_pointer incorrectly set

Re: [PATCH] i386: Refactor vcvttps2qq/vcvtqq2ps patterns.

2024-06-27 Thread Hongtao Liu
On Thu, Jun 27, 2024 at 9:23 AM Hu, Lin1 wrote: > > Hi, all > > This patch aims to refactor vcvttps2qq/vcvtqq2ps patterns to remove the redundant > round_*_modev8sf_condition. > > Bootstrapped and regtested on x86-64-linux-gnu, OK for trunk? Ok. > > BRs, > Lin > > gcc/ChangeLog: > > * config/

Re: [x86 SSE PATCH] Some additional ternlog refinements.

2024-06-27 Thread Hongtao Liu
On Thu, Jun 27, 2024 at 4:29 PM Roger Sayle wrote: > > > This patch is another round of refinements to fine tune the new ternlog > infrastructure in i386's sse.md. This patch tweaks ix86_ternlog_idx > to allow multiple MEM/CONST_VECTOR/VEC_DUPLICATE operands prior to > splitting (before reload),

Re: [testsuite PATCH] Fix -m32 gcc.target/i386/pr102464-vrndscaleph.c on RedHat.

2024-06-30 Thread Hongtao Liu
On Sun, Jun 30, 2024 at 7:29 PM Roger Sayle wrote: > > > This patch fixes the 4 FAILs of gcc.target/i386/pr102464-vrndscaleph.c > with --target_board='unix{-m32}' on RedHat 7.x. The issue is that this > AVX512 test includes the system math.h, and on older systems this provides > inline versions o

Re: [x86 SSE PATCH] Remove legacy ternlog patterns from sse.md

2024-06-30 Thread Hongtao Liu
On Mon, Jul 1, 2024 at 6:14 AM Roger Sayle wrote: > > > As promised here's the final ternlog clean-up, that deletes the now > obsolete legacy patterns and mode iterators from sse.md. It also updates > the surviving ternlog patterns to consistently use decimal immediate > operands (instead of hexa

Re: [x86 SSE PATCH] Remove legacy ternlog patterns from sse.md

2024-06-30 Thread Hongtao Liu
> > > > gcc/testsuite/ChangeLog > > * gcc.target/i386/pr100711-6.c: Update to check for decimal > > immediate operand in ternlog, not hexadecimal. > I got an ICE when bootstrapped with --enable-checking=yes,rtl,extra > The ICE can be worked around with 2 separate define_predicates,

Re: [PATCH] i386: Support APX NF and NDD for imul/mul

2024-07-01 Thread Hongtao Liu
On Mon, Jul 1, 2024 at 4:51 PM kong lingling wrote: > > Add some missing APX NF and NDD support for imul and mul. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ok for trunk? Ok. > > > gcc/ChangeLog: > > * config/i386/i386.md (*imulhizu): Added APX > NF support.

Re: [PATCH] x86: Update branch hint for Redwood Cove.

2024-07-02 Thread Hongtao Liu
On Wed, Jul 3, 2024 at 2:10 AM Andi Kleen wrote: > > liuhongt writes: > > > From: "H.J. Lu" > > > > According to Intel® 64 and IA-32 Architectures Optimization Reference > > Manual[1], Branch Hint is updated for Redwood Cove. > > > > cut from [1]- > > Starting wit

Re: [PATCH][committed] Move runtime check into a separate function and guard it with target ("no-avx")

2024-07-03 Thread Hongtao Liu
On Thu, Jul 4, 2024 at 6:17 AM H.J. Lu wrote: > > > On Wed, Jul 3, 2024, 9:37 PM Richard Biener > wrote: >> >> On Wed, Jul 3, 2024 at 9:25 AM liuhongt wrote: >> > >> > The patch can avoid SIGILL on non-AVX512 machine due to kmovd is >> > generated in dynamic check. >> > >> > Committed as an obv

Re: [PATCH][committed] Move runtime check into a separate function and guard it with target ("no-avx")

2024-07-03 Thread Hongtao Liu
On Thu, Jul 4, 2024 at 9:41 AM H.J. Lu wrote: > > > On Thu, Jul 4, 2024, 9:12 AM Hongtao Liu wrote: >> >> On Thu, Jul 4, 2024 at 6:17 AM H.J. Lu wrote: >> > >> > >> > On Wed, Jul 3, 2024, 9:37 PM Richard Biener >> > wrote: >

Re: [PATCH] [APX PPX] Avoid generating unmatched pushp/popp in pro/epilogue

2024-07-03 Thread Hongtao Liu
On Tue, Jul 2, 2024 at 11:24 AM Hongyu Wang wrote: > > Hi, > > According to APX spec, the pushp/popp pairs should be matched, > otherwise the PPX hint cannot take effect and cause performance loss. > > In the ix86_expand_epilogue, there are several optimizations that may > cause the epilogue using

Re: [x86 SSE PATCH] PR target/115751: Avoid force_reg in ix86_expand_ternlog.

2024-07-04 Thread Hongtao Liu
On Fri, Jul 5, 2024 at 2:54 AM Roger Sayle wrote: > > > This patch fixes a problem with splitting of complex AVX512 ternlog > instructions on x86_64. A recent change allows the ternlog pattern > to have multiple mem-like operands prior to reload, by emitting any > "reloads" as necessary during sp

Re: [x86 SSE PATCH] PR target/115751: Avoid force_reg in ix86_expand_ternlog.

2024-07-04 Thread Hongtao Liu
On Fri, Jul 5, 2024 at 8:06 AM Hongtao Liu wrote: > > On Fri, Jul 5, 2024 at 2:54 AM Roger Sayle wrote: > > > > > > This patch fixes a problem with splitting of complex AVX512 ternlog > > instructions on x86_64. A recent change allows the ternlog pattern > >

Re: [x86 SSE PATCH] Some AVX512 ternlog expansion refinements.

2024-07-07 Thread Hongtao Liu
On Sun, Jul 7, 2024 at 5:00 PM Roger Sayle wrote: > > > Hi Hongtao, > This should address concerns about the remaining use of force_reg. > @@ -25793,15 +25792,20 @@ ix86_expand_ternlog_binop (enum rtx_code code, machine_mode mode, if (GET_MODE (op1) != mode) op1 = gen_lowpart (mod

Re: [PATCH V2] x86: Update branch hint for Redwood Cove.

2024-07-07 Thread Hongtao Liu
On Thu, Jul 4, 2024 at 9:30 AM liuhongt wrote: > > From: "H.J. Lu" > > >The above reads like it would be worth splitting branch_prediction_hints > >into branch_prediction_hints_taken and branch_prediction_hints_not_taken > >given not-taken is the default and thus will just increase code size? > >Ac

Re: Support bitwise and/andnot/abs/neg/copysign/xorsign op for V8BF/V16BF/V32BF

2024-07-07 Thread Hongtao Liu
On Thu, Jul 4, 2024 at 11:24 AM Levy Hsu wrote: > > This patch extends support for BF16 vector operations in GCC, including > bitwise AND, ANDNOT, ABS, NEG, COPYSIGN, and XORSIGN for V8BF, V16BF, and > V32BF modes. > Bootstrapped and tested on x86_64-linux-gnu. ok for trunk? > > gcc/ChangeLog: >

Re: [PATCH 05/10] i386: Fix dot_prod backend patterns for mmx and sse targets

2024-07-10 Thread Hongtao Liu
On Wed, Jul 10, 2024 at 10:10 PM Victor Do Nascimento wrote: > > Following the migration of the dot_prod optab from a direct to a > conversion-type optab, ensure all back-end patterns incorporate the > second machine mode into pattern names. The patch LGTM. BTW you can use existing instead of new

Re: [x86 SSE PATCH] Some AVX512 ternlog expansion refinements (take #2)

2024-07-11 Thread Hongtao Liu
strap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures. Ok for mainline? Ok. > > > 2024-07-11 Roger Sayle > Hongtao Liu > > gcc/ChangeLog > * config/i386/i386-expand.cc (ix86_broadcast_from_con

Re: [PATCH] AVX512BF16: Do not allow permutation with vcvtne2ps2bf16 [PR115889]

2024-07-14 Thread Hongtao Liu
On Sat, Jul 13, 2024 at 3:44 PM Hongyu Wang wrote: > > Hi, > > According to the instruction spec of AVX512BF16, the convert from float > to BF16 is not a simple truncation. It has special handling for > denormal/nan, even for normal float it will add an extra bias according > to the least signific
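
To illustrate why the conversion differs from a plain truncation, here is a rough scalar sketch. It assumes the "extra bias" mentioned above is the usual round-to-nearest-even bias keyed on the lowest bit that survives in the BF16 result; the preview is cut off, so that reading is an assumption. Only normal, finite inputs are modelled; the special handling of denormals and NaN that the message mentions is deliberately left out. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a float from its IEEE-754 binary32 bit pattern. */
float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Plain truncation: keep the upper 16 bits of the float. */
uint16_t bf16_truncate(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return (uint16_t)(u >> 16);
}

/* Round to nearest even for normal finite inputs (assumed behaviour):
   add 0x7FFF plus the lowest surviving bit, then drop the low half. */
uint16_t bf16_round_nearest_even(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    uint32_t bias = 0x7FFFu + ((u >> 16) & 1u);
    return (uint16_t)((u + bias) >> 16);
}

/* A tie case where the two conversions disagree. */
void check_bf16_rounding(void)
{
    float tie_odd = float_from_bits(0x3F818000u);
    assert(bf16_truncate(tie_odd) == 0x3F81);
    assert(bf16_round_nearest_even(tie_odd) == 0x3F82);
}
```

Because the two functions can return different bit patterns for the same input, a permutation that merely selects the upper halves of each float is not a substitute for the conversion instruction.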

Re: [PATCH] AVX512BF16: Do not allow permutation with vcvtne2ps2bf16 [PR115889]

2024-07-14 Thread Hongtao Liu
On Mon, Jul 15, 2024 at 10:21 AM Hongyu Wang wrote: > > > Could you just git revert 6d0b7b69d143025f271d0041cfa29cf26e6c343b? > > We can still deal with BFmode permutation the same way as HFmode, so > the change in ix86_vectorize_vec_perm_const can be preserved. > > Hongt

Re: [PATCH] [APX NF] Add a pass to convert legacy insn to NF insns

2024-07-14 Thread Hongtao Liu
On Wed, Jul 10, 2024 at 2:46 PM Hongyu Wang wrote: > > Hi, > > For APX ccmp, current infrastructure will always generate cstore for > the ccmp flag user, like > > cmpe%rcx, %r8 > ccmpnel %rax, %rbx > seta%dil > add %rcx, %r9 > add %r9, %rdx >

Re: [i386] adjust flag_omit_frame_pointer in a single function [PR113719] (was: Re: [PATCH] [i386] restore recompute to override opts after change [PR113719])

2024-07-14 Thread Hongtao Liu
On Thu, Jul 11, 2024 at 9:07 PM Alexandre Oliva wrote: > > On Jul 4, 2024, Alexandre Oliva wrote: > > > On Jul 3, 2024, Rainer Orth wrote: > > > Hmm, I wonder if leaf frame pointer has to do with that. > > It did, in a way. > > > > The first two patches for PR113719 have each regressed >

Re: [PATCH] i386: extend trunc{128}2{16,32,64}'s scope.

2024-07-14 Thread Hongtao Liu
On Mon, Jul 15, 2024 at 1:39 PM Hu, Lin1 wrote: > > Hi, all > > Based on actual usage, trunc{128}2{16,32,64} use some instructions from > sse/sse3, so extend their scope to extend the scope of optimization. > > Bootstrapped and regtested on x86-64-linux-gnu, OK for trunk? Ok. > > BRs, > Lin > > gcc/C

Re: [PATCH] i386, testsuite: Fix non-Unicode character

2024-07-16 Thread Hongtao Liu
On Mon, Jul 15, 2024 at 7:24 PM Paul-Antoine Arras wrote: > > This trivially fixes an incorrectly encoded character in the DejaGnu > scan pattern. > > OK for trunk? Ok. > -- > PA -- BR, Hongtao

Re: [x86 SSE] Improve handling of ternlog instructions in i386/sse.md

2024-05-14 Thread Hongtao Liu
On Mon, May 13, 2024 at 5:57 AM Roger Sayle wrote: > > > This patch improves the way that the x86 backend recognizes and > expands AVX512's bitwise ternary logic (vpternlog) instructions. I like the patch. 1 file changed, 25 insertions(+), 1 deletion(-) gcc/config/i386/i386-expand.cc | 26 +++

Re: [PATCH] Don't reduce estimated unrolled size for innermost loop.

2024-05-14 Thread Hongtao Liu
On Mon, May 13, 2024 at 3:40 PM Richard Biener wrote: > > On Mon, May 13, 2024 at 4:29 AM liuhongt wrote: > > > > As testcase in the PR, O3 cunrolli may prevent vectorization for the > > innermost loop and increase register pressure. > > The patch removes the 1/3 reduction of unr_insn for innermo

Re: [PATCH] Don't reduce estimated unrolled size for innermost loop.

2024-05-15 Thread Hongtao Liu
C -std=gnu++14 LP64 note (test for > > > > g++warnings, line 56) > > > > g++: g++.dg/warn/Warray-bounds-20.C -std=gnu++14 note (test for > > > > g++warnings, line 66) > > > > g++: g++.dg/warn/Warray-bounds-20.C -std=gnu++17 LP64 note (test for > > > > g++warnings, line 56) > > > > g++: g++.dg/wa

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Hongtao Liu
On Thu, May 16, 2024 at 10:40 PM Victor Do Nascimento wrote: > > From: Victor Do Nascimento > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct > optabs for dealing with vectorizable dot product code sequences. The > consequence of using a direct optab for this is that backen

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Hongtao Liu
> > > Sorry to chime in, for x86 backend, we defined usdot_prodv16hi, and > 2-way dot_prod operations can be generated > This is the link https://godbolt.org/z/hcWr64vx3, x86 define udot_prodv16qi/udot_prod8hi and both 2-way and 4-way dot_prod instructions are generated -- BR, Hongtao

Re: [PATCH] i386: Rename sat_plusminus expanders to standard names [PR112600]

2024-05-19 Thread Hongtao Liu
On Fri, May 17, 2024 at 3:55 PM Uros Bizjak wrote: > > Rename _3 expander to a standard ssadd, > usadd, sssub and ussub name to enable corresponding optab expansion. > > Also add named expander for MMX modes. LGTM. > > PR middle-end/112600 > > gcc/ChangeLog: > > * config/i386/mmx.md (3): N

Re: [PATCH 0/2] Align tight loops to solve cross cacheline issue

2024-05-19 Thread Hongtao Liu
On Wed, May 15, 2024 at 11:30 AM Jiang, Haochen wrote: > > Also cc Honza and Richard since we touched generic tune. > > Thx, > Haochen > > > -Original Message- > > From: Haochen Jiang > > Sent: Wednesday, May 15, 2024 11:04 AM > > To: gcc-patches@gcc.gnu.org > > Cc: Liu, Hongtao ; ubiz...

Re: [PATCH] Don't reduce estimated unrolled size for innermost loop.

2024-05-20 Thread Hongtao Liu
On Wed, May 15, 2024 at 5:24 PM Richard Biener wrote: > > On Wed, May 15, 2024 at 4:15 AM Hongtao Liu wrote: > > > > On Mon, May 13, 2024 at 3:40 PM Richard Biener > > wrote: > > > > > > On Mon, May 13, 2024 at 4:29 AM liuhongt wrote: > > >

Re: [PATCH] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

2024-05-20 Thread Hongtao Liu
On Tue, May 21, 2024 at 2:16 PM Haochen Jiang wrote: > > Hi all, > > Since vpermq is really slow, we should avoid using it when it is > the only instruction could be used for ix86_expand_vecop_qihi2. > > Bootstrapped and regtested on x86_64-pc-linux-gnu. Ok for trunk? Please add a testcase for it.

Re: [PATCH v2] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

2024-05-21 Thread Hongtao Liu
On Tue, May 21, 2024 at 3:14 PM Haochen Jiang wrote: > > Hi all, > > This is the v2 patch to fix PR115069. The new testcase has passed. > > Changes in v2: > - Added a testcase. > - Change the comment for the early exit. > > Thx, > Haochen > > Since vpermq is really slow, we should avoid using

Re: [V2 PATCH] Don't reduce estimated unrolled size for innermost loop at cunrolli.

2024-05-22 Thread Hongtao Liu
On Wed, May 22, 2024 at 1:07 PM liuhongt wrote: > > >> Hard to find a default value satisfying all testcases. > >> some require loop unroll with 7 insns increment, some don't want loop > >> unroll w/ 5 insn increment. > >> The original 2/3 reduction happened to meet all those testcases(or the > >>

Re: [PATCH] Don't simplify NAN/INF or out-of-range constant for FIX/UNSIGNED_FIX.

2024-05-22 Thread Hongtao Liu
On Wed, May 22, 2024 at 3:59 PM Jakub Jelinek wrote: > > On Wed, May 22, 2024 at 09:46:41AM +0200, Richard Biener wrote: > > On Wed, May 22, 2024 at 3:58 AM liuhongt wrote: > > > > > > According to IEEE standard, for conversions from floating point to > > > integer. When a NaN or infinite operand

Re: [PATCH 3/3] vect: support direct conversion under x86-64-v3.

2024-05-22 Thread Hongtao Liu
On Thu, May 23, 2024 at 2:38 PM Hu, Lin1 wrote: > > gcc/ChangeLog: > > PR 107432 > * config/i386/i386-expand.cc (ix86_expand_trunc_with_avx2_noavx512f): > New function for generate a series of suitable insn. > * config/i386/i386-protos.h (ix86_expand_trunc_with_avx2

Re: [PATCH 3/3] vect: support direct conversion under x86-64-v3.

2024-05-23 Thread Hongtao Liu
On Thu, May 23, 2024 at 3:17 PM Hu, Lin1 wrote: > > > -Original Message- > > From: Hongtao Liu > > Sent: Thursday, May 23, 2024 2:42 PM > > To: Hu, Lin1 > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; > > ubiz...@gmail.com; rguent...@suse.de >

Re: [PATCH 1/2] Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode.

2024-05-23 Thread Hongtao Liu
CC for review. On Tue, May 21, 2024 at 1:12 PM liuhongt wrote: > > When mask is ((1 << (prec - imm)) - 1) which is used to clear upper bits > of A, then it can be simplified to LSHIFTRT. > > i.e., simplify > (and:v8hi > (ashiftrt:v8hi A 8) > (const_vector 0xff x8)) > to > (lshiftrt:v8hi A 8) > > Bo
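
The identity behind this simplification can be checked on a single 16-bit lane. The sketch below is a scalar model only, not the RTL simplifier code: the function names are hypothetical, and it relies on `>>` of a negative signed value being an arithmetic shift, which is implementation-defined in C but is what GCC does.

```c
#include <assert.h>
#include <stdint.h>

/* (and (ashiftrt A imm) mask), with mask = (1 << (prec - imm)) - 1, prec = 16.
   Valid for imm in 0..15. */
uint16_t ashiftrt_then_mask(int16_t a, unsigned imm)
{
    uint16_t mask = (uint16_t)((1u << (16u - imm)) - 1u);
    /* Assumes arithmetic right shift for negative values (true for GCC). */
    return (uint16_t)((uint16_t)(a >> imm) & mask);
}

/* (lshiftrt A imm): shift the same bits as an unsigned value. */
uint16_t lshiftrt(int16_t a, unsigned imm)
{
    return (uint16_t)((uint16_t)a >> imm);
}

/* Count the 16-bit inputs for which the two forms differ. */
unsigned count_mismatches(unsigned imm)
{
    unsigned mismatches = 0;
    for (int32_t v = INT16_MIN; v <= INT16_MAX; v++)
        if (ashiftrt_then_mask((int16_t)v, imm) != lshiftrt((int16_t)v, imm))
            mismatches++;
    return mismatches;
}

/* The v8hi example from the message uses imm = 8 and mask 0xff. */
void check_shift_identity(void)
{
    assert(count_mismatches(8) == 0);
}
```

The mask removes exactly the sign-extension bits that the arithmetic shift introduced, so the masked result matches the logical shift for every input.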

[PATCH] x86: Fix Logical Shift Issue in expand_vec_perm_psrlw_psllw_por [PR115146]

2024-05-26 Thread Hongtao Liu
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652231.html Ok for this. -- BR, Hongtao

Re: [PATCH 0/2] Align tight loops to solve cross cacheline issue

2024-05-26 Thread Hongtao Liu
On Mon, May 20, 2024 at 11:15 AM Hongtao Liu wrote: > > On Wed, May 15, 2024 at 11:30 AM Jiang, Haochen > wrote: > > > > Also cc Honza and Richard since we touched generic tune. > > > > Thx, > > Haochen > > > > > -Original Message-

Re: [PATCH 2/3] vect: Support v4hi -> v4qi.

2024-05-26 Thread Hongtao Liu
On Thu, May 23, 2024 at 2:38 PM Hu, Lin1 wrote: > > gcc/ChangeLog: > > PR target/107432 > * config/i386/mmx.md (truncv4hiv4qi2): New define_insn. > > gcc/testsuite/ChangeLog: > > PR target/107432 > * gcc.target/i386/pr107432-6.c: Add test. > --- > gcc/config/i386/mmx.md

Re: [x86 SSE] Improve handling of ternlog instructions in i386/sse.md (v2)

2024-05-26 Thread Hongtao Liu
On Tue, May 21, 2024 at 5:46 AM Alexander Monakov wrote: > > > Hello! > > I looked at ternlog a bit last year, so I'd like to offer some drive-by > comments. If you want to tackle them in a follow-up patch, or leave for > someone else to handle, please let me know. > > On Fri, 17 May 2024, Roger S

Re: [x86 SSE] Improve handling of ternlog instructions in i386/sse.md (v2)

2024-05-26 Thread Hongtao Liu
On Sat, May 18, 2024 at 4:10 AM Roger Sayle wrote: > > > Hi Hongtao, > Many thanks for the review, bug fixes and suggestions for improvements. > This revised version of the patch, implements all of your corrections. In > theory > the "ternlog idx" should guarantee that some operands are non-null

Re: [x86 SSE] Improve handling of ternlog instructions in i386/sse.md (v2)

2024-05-27 Thread Hongtao Liu
On Mon, May 27, 2024 at 2:48 PM Hongtao Liu wrote: > > On Sat, May 18, 2024 at 4:10 AM Roger Sayle > wrote: > > > > > > Hi Hongtao, > > Many thanks for the review, bug fixes and suggestions for improvements. > > This revised version of the patch,

Re: [PATCH] i386: Fix ix86_option override after change [PR 113719]

2024-05-29 Thread Hongtao Liu
On Thu, May 16, 2024 at 5:15 PM Hongyu Wang wrote: > > Richard Biener 于2024年5月16日周四 15:05写道: > > > > > On Thu, May 16, 2024 at 8:25 AM Hongyu Wang wrote: > > > > > > Hi, > > > > > > In ix86_override_options_after_change, calls to ix86_default_align > > > and ix86_recompute_optlev_based_flags wil

Re: [PATCH 2/3 v2] vect: Support v4hi -> v4qi.

2024-05-29 Thread Hongtao Liu
On Wed, May 29, 2024 at 4:56 PM Hu, Lin1 wrote: > > Besides adding TARGET_MMX_WITH_SSE, I merged two patterns. Ok. > > BRs, > Lin > > gcc/ChangeLog: > > PR target/107432 > * config/i386/mmx.md > (VI2_32_64): New mode iterator. > (mmxhalfmode): New mode attr. > (mmxhalfmodelower):

Re: [PATCH 3/3 v2] vect: support direct conversion under x86-64-v3.

2024-05-29 Thread Hongtao Liu
On Wed, May 29, 2024 at 5:00 PM Hu, Lin1 wrote: > > According to hongtao's suggestion, I support some trunc in mmx.md under > x86-64-v3, and optimize ix86_expand_trunc_with_avx2_noavx512f. Ok. > > BRs, > Lin > > gcc/ChangeLog: > > PR 107432 > * config/i386/i386-expand.cc (ix86_expa

Re: [PATCH] i386: Optimize EQ/NE comparison between avx512 kmask and -1.

2024-05-30 Thread Hongtao Liu
On Tue, May 28, 2024 at 4:00 PM Hu, Lin1 wrote: > > Hi all, > > This patch aims to achieve EQ/NE comparison between avx512 kmask and -1 > by using kortest with checking CF. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,-m64}. Ok for trunk? Ok. > > BRs, > Lin > > gcc/ChangeLog: > >
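
A small scalar model of the flag behaviour this optimization appears to rely on, assuming the instruction in question is KORTEST: it ORs two mask registers, sets ZF when the result is all zeros and CF when the result is all ones. The struct and function names are hypothetical, and a 16-bit mask is used only as an example width.

```c
#include <assert.h>
#include <stdint.h>

/* Flags produced by the modelled instruction. */
struct kortest_flags {
    int zf; /* set when (k1 | k2) is all zeros */
    int cf; /* set when (k1 | k2) is all ones  */
};

/* Model of KORTESTW on two 16-bit mask values. */
struct kortest_flags kortestw_model(uint16_t k1, uint16_t k2)
{
    uint16_t r = (uint16_t)(k1 | k2);
    struct kortest_flags f;
    f.zf = (r == 0);
    f.cf = (r == 0xFFFFu);
    return f;
}

/* k == -1 (all bits set) can be read straight from CF of "kortest k, k". */
int mask_is_all_ones(uint16_t k)
{
    return kortestw_model(k, k).cf;
}

/* Self-check of the EQ case against -1. */
void check_mask_compare(void)
{
    assert(mask_is_all_ones(0xFFFFu) == 1);
    assert(mask_is_all_ones(0xFFFEu) == 0);
}
```

With this reading, an EQ or NE test against -1 needs no separate compare once the mask is in a k register: the branch or setcc consumes CF directly.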

Re: [PATCH 1/3] [APX CCMP] Support APX CCMP

2024-05-30 Thread Hongtao Liu
On Wed, May 15, 2024 at 4:24 PM Hongyu Wang wrote: > > APX CCMP feature implements conditional compare which executes compare > when EFLAGS matches certain condition. > > CCMP introduces default flags value (dfv), when conditional compare does > not execute, it will directly set the flags accordin

Re: [PATCH 3/3] [APX CCMP] Support ccmp for float compare

2024-05-30 Thread Hongtao Liu
On Wed, May 15, 2024 at 4:21 PM Hongyu Wang wrote: > > The ccmp insn itself doesn't support fp compare, but x86 has fp comi > insn that changes EFLAGS which can be the scc input to ccmp. Allow > scalar fp compare in ix86_gen_ccmp_first except ORDERED/UNORDERED > compare which can not be identified i

Re: [PATCH v3 1/8] [APX NF]: Support APX NF add

2024-06-02 Thread Hongtao Liu
On Wed, May 29, 2024 at 1:11 PM Kong, Lingling wrote: > > Hi, compared with v2, these patches restored the original lea pattern position > and addressed hongtao's comment. > > The APX NF (no flags) feature suppresses the update of status flags > for arithmetic operations. Ok for the patch an

Re: [PATCH] Add AVX10.1 target_clones support

2024-06-02 Thread Hongtao Liu
On Wed, May 29, 2024 at 11:05 AM Haochen Jiang wrote: > > Hi all, > > Since AVX10 is the first major ISA introduced after AVX-512, we propose > to add target_clones support for it. > > Although AVX10.1-256 won't cover 512-bit part of AVX512F, but since > it is only for priority but not for implica

Re: PING: [PATCH] x86: Update BB_HEAD when aligning BB_HEAD

2024-08-11 Thread Hongtao Liu
On Mon, Aug 12, 2024 at 6:59 AM H.J. Lu wrote: > > On Thu, Aug 8, 2024 at 6:53 PM H.J. Lu wrote: > > > > When we emit .p2align to align BB_HEAD, we must update BB_HEAD. Otherwise > > ENDBR will be inserted at the wrong place. > > > > gcc/ > > > > PR target/116174 > > * config/i38

Re: [PATCH 0/1] Initial support for AVX10.2

2024-08-12 Thread Hongtao Liu
On Thu, Aug 1, 2024 at 3:50 PM Haochen Jiang wrote: > > Hi all, > > AVX10.2 tech details has been just published on July 31st in the > following link: > > https://cdrdv2.intel.com/v1/dl/getContent/828965 > > For new features and instructions, we could divide them into two parts. > One is ymm round

Re: [PATCH] Move ix86_align_loops into a separate pass and insert the pass after pass_endbr_and_patchable_area.

2024-08-13 Thread Hongtao Liu
On Mon, Aug 12, 2024 at 10:10 PM liuhongt wrote: > > > Are there any assumptions that BB_HEAD must be a note or label? > > Maybe we should move ix86_align_loops into a separate pass and insert > > the pass just before pass_final. > The patch inserts .p2align after endbr pass, it can also fix the i

Re: [PATCH 1/4] i386: Optimization for APX NDD is always zero-uppered for ADD

2024-08-13 Thread Hongtao Liu
On Mon, Aug 12, 2024 at 3:10 PM kong lingling wrote: > > For an APX instruction with an NDD, the destination GPR will get the > instruction’s result in bits [OSIZE-1:0] and, if OSIZE < 64b, have its upper > bits [63:OSIZE] zeroed. Now supporting other NDD instructions. > > > Bootstrapped and regtes
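
The zero-upper rule quoted above can be illustrated with a small C model. It is a sketch under stated assumptions, not code from the patch: `ndd_add_model` is a hypothetical helper, and `osize_bits` is assumed to be one of 8, 16, 32 or 64.

```c
#include <stdint.h>

/* Hypothetical model of an NDD add: the result occupies bits
   [osize_bits-1:0] of the 64-bit destination, and bits [63:osize_bits]
   are zeroed when the operand size is below 64 bits. */
static uint64_t ndd_add_model(uint64_t a, uint64_t b, unsigned osize_bits)
{
    uint64_t sum = a + b;  /* wraps modulo 2^64 */
    if (osize_bits >= 64)
        return sum;        /* full-width result, nothing to zero */
    uint64_t mask = (UINT64_C(1) << osize_bits) - 1;
    return sum & mask;     /* keep the low bits, zero the upper bits */
}
```

Because the upper bits are already zero in this model, a following zero-extension of the narrow result would be redundant, which appears to be the property the *_zext patterns in this series rely on.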

Re: [PATCH 2/4] i386: Optimization for APX NDD is always zero-uppered for sub/adc/sbb

2024-08-13 Thread Hongtao Liu
On Mon, Aug 12, 2024 at 3:12 PM kong lingling wrote: > > gcc/ChangeLog: > > > >PR target/113729 > >* config/i386/i386.md (*subqi_1_zext): New > >define_insn. > >(*subhi_1_zext): Ditto. > >(*addqi3_carry_zext): Ditto. >

Re: [PATCH 3/4] i386: Optimization for APX NDD is always zero-uppered for logic

2024-08-13 Thread Hongtao Liu
On Mon, Aug 12, 2024 at 3:12 PM kong lingling wrote: > > gcc/ChangeLog: > > >PR target/113729 > >* config/i386/i386.md (*andqi_1_zext): > >New define_insn. > >(*andhi_1_zext): Ditto. > >(*qi_1_zext): Ditto. > >

Re: [PATCH 4/4] i386: Optimization for APX NDD is always zero-uppered for shift

2024-08-13 Thread Hongtao Liu
On Mon, Aug 12, 2024 at 3:12 PM kong lingling wrote: > > gcc/ChangeLog: > > > PR target/113729 > >* config/i386/i386.md (*ashlqi3_1_zext): > >New define_insn. > >(*ashlhi3_1_zext): Ditto. > >(*qi3_1_zext): Ditto. > >

Re: [PATCH v2] i386: Fix some vex insns that prohibit egpr

2024-08-14 Thread Hongtao Liu
On Wed, Aug 14, 2024 at 4:23 PM Kong, Lingling wrote: > > > > -Original Message- > From: Kong, Lingling > Sent: Wednesday, August 14, 2024 4:20 PM > To: Kong, Lingling > Subject: [PATCH v2] i386: Fix some vex insns that prohibit egpr > > Although these vex insns have evex counterparts, but

Re: [PATCH v2] [x86] Movement between GENERAL_REGS and SSE_REGS for TImode doesn't need secondary reload.

2024-08-15 Thread Hongtao Liu
On Thu, Aug 15, 2024 at 3:27 PM liuhongt wrote: > > It results in 2 failures for x86_64-pc-linux-gnu{\ > -march=cascadelake}; > > gcc: gcc.target/i386/extendditi3-1.c scan-assembler cqt?o > gcc: gcc.target/i386/pr113560.c scan-assembler-times \tmulq 1 > > For pr113560.c, now GCC generates mulx ins

Re: [PATCH 00/22] Support AVX10.2 ymm rounding

2024-08-18 Thread Hongtao Liu
On Wed, Aug 14, 2024 at 5:07 PM Haochen Jiang wrote: > > Hi all, > > The initial patch for AVX10.2 has been merged this week. > > For the upcoming patches, we will first upstream ymm rounding control part. > > In ymm rounding part, ALL the instructions in AVX512 with 512-bit rounding > control wil

Re: [PATCHv4, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-08-19 Thread Hongtao Liu
On Tue, Aug 20, 2024 at 2:12 PM HAO CHEN GUI wrote: > > Hi, > Add Hongtao Liu as the patch affects x86. > > On 2024/8/20 6:32, Richard Sandiford wrote: > > HAO CHEN GUI writes: > >> Hi, > >> This patch adds const0 move checking for CLEAR_BY_PIECES.

Re: [PATCH] Align predicates for operands[1] between mov and *mov_internal.

2024-08-20 Thread Hongtao Liu
On Tue, Aug 20, 2024 at 6:25 PM liuhongt wrote: > > From [1] [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660575.html > > > It's not obvious to me why movv16qi requires a nonimmediate_operand > > > source, especially since ix86_expand_vector_mode does have code to > > > cope with con

Re: [PATCHv4, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-08-20 Thread Hongtao Liu
On Tue, Aug 20, 2024 at 2:50 PM Hongtao Liu wrote: > > On Tue, Aug 20, 2024 at 2:12 PM HAO CHEN GUI wrote: > > > > Hi, > > Add Hongtao Liu as the patch affects x86. > > > > On 2024/8/20 6:32, Richard Sandiford wrote: > > > HAO CHEN GUI writes: >

Re: [PATCH] Align ix86_{move_max,store_max} with vectorizer.

2024-08-21 Thread Hongtao Liu
On Wed, Aug 21, 2024 at 4:49 PM Richard Biener wrote: > > On Wed, Aug 21, 2024 at 7:40 AM liuhongt wrote: > > > > When none of mprefer-vector-width, avx256_optimal/avx128_optimal, > > avx256_store_by_pieces/avx512_store_by_pieces is specified, GCC will > > set ix86_{move_max,store_max} as max ava

Re: [PATCHv4, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-08-22 Thread Hongtao Liu
On Thu, Aug 22, 2024 at 4:06 PM HAO CHEN GUI wrote: > > Hi Hongtao, > > On 2024/8/21 11:21, Hongtao Liu wrote: > > r15-3058-gbb42c551905024 supports const0 operand for movv16qi, please > > rebase your patch and see if there are still regressions. > > There

Re: [PATCHv4, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-08-22 Thread Hongtao Liu
On Fri, Aug 23, 2024 at 11:03 AM HAO CHEN GUI wrote: > > Hi Hongtao, > > On 2024/8/23 9:47, Hongtao Liu wrote: > > On Thu, Aug 22, 2024 at 4:06 PM HAO CHEN GUI wrote: > >> > >> Hi Hongtao, > >> > >> On 2024/8/21 11:21, Hongtao Liu wrote: > >>

Re: [PATCH 00/12] AVX10.2: Support new instructions

2024-08-25 Thread Hongtao Liu
On Mon, Aug 19, 2024 at 4:57 PM Haochen Jiang wrote: > > Hi all, > > The AVX10.2 ymm rounding patches have been merged to trunk around > 6 hours ago. As mentioned before, next step will be AVX10.2 new > instruction support. > > This patch series could be divided into three parts. > > The first patch

Re: [PATCHv4, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-08-25 Thread Hongtao Liu
On Fri, Aug 23, 2024 at 5:46 PM HAO CHEN GUI wrote: > > Hi Hongtao, > > On 2024/8/23 11:47, Hongtao Liu wrote: > > On Fri, Aug 23, 2024 at 11:03 AM HAO CHEN GUI wrote: > >> > >> Hi Hongtao, > >> > >> On 2024/8/23 9:47, Hongtao Liu wrote: > >>
