Re: [PATCH] Add support for missing AVX512* ISAs (PR target/89929).

2019-04-17 Thread Hongtao Liu
On Wed, Apr 17, 2019 at 4:48 PM Martin Liška wrote: > > On 4/17/19 10:14 AM, Hongtao Liu wrote: > > Any other comments, I'll merge this to trunk? > > Hi. > > I don't understand you. The patch in its original version will not be > installed to trunk >

Re: Enable BF16 support (Please ignore my former email)

2019-04-17 Thread Hongtao Liu
On Fri, Apr 12, 2019 at 11:18 PM H.J. Lu wrote: > > On Fri, Apr 12, 2019 at 3:19 AM Uros Bizjak wrote: > > > > On Fri, Apr 12, 2019 at 11:03 AM Hongtao Liu wrote: > > > > > > On Fri, Apr 12, 2019 at 3:30 PM Uros Bizjak wrote: > > > > >

Re: Enable BF16 support (Please ignore my former email)

2019-05-06 Thread Hongtao Liu
Since GCC 9.1 has been released [2019-05-03], I'll merge this to trunk? On Wed, Apr 17, 2019 at 7:14 PM Uros Bizjak wrote: > > On Wed, Apr 17, 2019 at 1:03 PM Uros Bizjak wrote: > > > > On Wed, Apr 17, 2019 at 12:29 PM Hongtao Liu wrote: > > > > > > On Fri

[Patch] Fix ix86_expand_sse_comi_round (PR Target/89750, PR Target/86444)

2019-05-06 Thread Hongtao Liu
- gcc/ChangeLog (revision 270933) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,11 @@ +2019-05-06 H.J. Lu + Hongtao Liu + + PR Target/89750 + PR Target/86444 + * config/i386/i386-expand.c (ix86_expand_sse_comi_round): + Modified, original implementation isn't correct. + 2019-05-

Re: Enable BF16 support (Please ignore my former email)

2019-05-06 Thread Hongtao Liu
On Wed, Apr 17, 2019 at 7:14 PM Uros Bizjak wrote: > > On Wed, Apr 17, 2019 at 1:03 PM Uros Bizjak wrote: > > > > On Wed, Apr 17, 2019 at 12:29 PM Hongtao Liu wrote: > > > > > > On Fri, Apr 12, 2019 at 11:18 PM H.J. Lu wrote: > > > > > >

Re: [Patch] Fix ix86_expand_sse_comi_round (PR Target/89750, PR Target/86444)

2019-05-07 Thread Hongtao Liu
On Tue, May 7, 2019 at 3:03 PM Jakub Jelinek wrote: > > On Tue, May 07, 2019 at 01:38:49PM +0800, Hongtao Liu wrote: > > +2019-05-06 H.J. Lu > > + Hongtao Liu > > + > > + PR Target/89750 > > + PR Target/86444 > > target, not Targe

Re: Enable BF16 support (Please ignore my former email)

2019-05-07 Thread Hongtao Liu
On Wed, May 8, 2019 at 2:33 AM Uros Bizjak wrote: > > On Tue, May 7, 2019 at 8:49 AM Hongtao Liu wrote: > > > > > > > > > > > This patch is about to enable support for bfloat16 > > > > > > > > > > which will

Re: [Patch] Fix ix86_expand_sse_comi_round (PR Target/89750, PR Target/86444)

2019-05-07 Thread Hongtao Liu
Any other comments, I'll merge to trunk? On Tue, May 7, 2019 at 3:31 PM Hongtao Liu wrote: > > On Tue, May 7, 2019 at 3:03 PM Jakub Jelinek wrote: > > > > On Tue, May 07, 2019 at 01:38:49PM +0800, Hongtao Liu wrote: > > > +2019-05-06 H.J.

Re: Enable BF16 support (Please ignore my former email)

2019-05-08 Thread Hongtao Liu
Sorry for the indentation issue, and thanks for your reminder. On Wed, May 8, 2019 at 3:39 PM Uros Bizjak wrote: > > On Wed, May 8, 2019 at 5:06 AM Hongtao Liu wrote: > > > > On Wed, May 8, 2019 at 2:33 AM Uros Bizjak wrote: > > > > > > On Tue, May 7

Re: [Patch] Fix ix86_expand_sse_comi_round (PR Target/89750, PR Target/86444)

2019-05-09 Thread Hongtao Liu
On Fri, May 10, 2019 at 3:55 AM Jeff Law wrote: > > On 5/6/19 11:38 PM, Hongtao Liu wrote: > > Hi Uros and GCC: > > This patch is to fix ix86_expand_sse_comi_round whose implementation > > was not correct. > > New implementation aligns with _mm_cmp_round_s[s

Re: [Patch] Fix ix86_expand_sse_comi_round (PR Target/89750, PR Target/86444)

2019-05-09 Thread Hongtao Liu
On Fri, May 10, 2019 at 12:54 PM Hongtao Liu wrote: > > On Fri, May 10, 2019 at 3:55 AM Jeff Law wrote: > > > > On 5/6/19 11:38 PM, Hongtao Liu wrote: > > > Hi Uros and GCC: > > > This patch is to fix ix86_expand_sse_comi_round whose implementatio

[PATCH]i386: Add BDESC2 for ix86_isa_flags2 and refine all related.

2019-01-21 Thread Hongtao Liu
2, builtins with both flags can't be handled easily. This patch intends to handle this issue. Tested with bootstrap and regression test on x86, no problem found. Is it ok for trunk? Thanks, Hongtao --- gcc/ 2019-01-21 Hongtao Liu H.J. Lu PR target/88909 * config/i386/i386-builtin.def:

Re: [PATCH v2] x86/{,V}AES: adjust when to force EVEX encoding

2024-10-08 Thread Hongtao Liu
On Tue, Oct 8, 2024 at 3:00 PM Jan Beulich wrote: > > On 08.10.2024 08:54, Hongtao Liu wrote: > > On Mon, Sep 30, 2024 at 3:33 PM Jan Beulich wrote: > >> > >> Commit a79d13a01f8c ("i386: Fix aes/vaes patterns [PR114576]") correctly > >> sa

Re: [PATCH] [RFC] target/117072 - more RTL FMA canonicalization

2024-10-13 Thread Hongtao Liu
On Fri, Oct 11, 2024 at 8:33 PM Hongtao Liu wrote: > > On Fri, Oct 11, 2024 at 8:22 PM Richard Biener wrote: > > > > The following helps the x86 backend by canonicalizing FMAs to have > > any negation done to one of the commutative multiplication operands > > be

Re: [PATCH v2 2/2] Adjust testcase after relax O2 vectorization.

2024-10-08 Thread Hongtao Liu
On Tue, Oct 8, 2024 at 4:56 PM Richard Biener wrote: > > On Tue, Oct 8, 2024 at 10:36 AM liuhongt wrote: > > > > gcc/testsuite/ChangeLog: > > > > * gcc.dg/fstack-protector-strong.c: Adjust > > scan-assembler-times. > > * gcc.dg/graphite/scop-6.c: Add > > -Wno-aggre

Re: [PATCH] target: Fix asm codegen for vfpclasss* and vcvtph2* instructions

2024-10-20 Thread Hongtao Liu
On Sat, Oct 19, 2024 at 2:06 AM Antoni Boucher wrote: > > Thanks for the review. > Here's the updated patch. > > Le 2024-10-17 à 21 h 50, Hongtao Liu a écrit : > > On Fri, Oct 18, 2024 at 9:08 AM Antoni Boucher wrote: > >> > >> Hi. > >> This i

Re: [PATCH] testsuite: Fix typos for AVX10.2 convert testcases

2024-10-17 Thread Hongtao Liu
On Thu, Oct 17, 2024 at 3:17 PM Haochen Jiang wrote: > > From: Victor Rodriguez > > Hi all, > > There are some typos in AVX10.2 vcvtne[,2]ph[b,h]f8[,s] testcases. > They will lead to type mismatch. > > Previously they were not found because the binutils changes had not been checked in. > > Ok for trunk? Ok. > > Th

Re: [PATCH] target: Fix asm codegen for vfpclasss* and vcvtph2* instructions

2024-10-17 Thread Hongtao Liu
On Fri, Oct 18, 2024 at 9:08 AM Antoni Boucher wrote: > > Hi. > This is a patch for the bug 116725. > I'm not sure if it is a good fix, but it seems to do the job. > If you have suggestions for better comments than what I wrote that would > explain what's happening, I'm open to suggestions. >@@ -

Re: [PATCH] x86: Implement Fast-Math Float Truncation to BF16 via PSRLD Instruction

2024-10-09 Thread Hongtao Liu
On Tue, Oct 8, 2024 at 3:24 PM Levy Hsu wrote: > > Bootstrapped and tested on x86_64-linux-gnu, OK for trunk? Ok. > > gcc/ChangeLog: > > * config/i386/i386.md: Rewrite insn truncsfbf2. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/truncsfbf-1.c: New test. > * gcc.targe

Re: [PATCH v2] x86/{,V}AES: adjust when to force EVEX encoding

2024-10-07 Thread Hongtao Liu
On Mon, Sep 30, 2024 at 3:33 PM Jan Beulich wrote: > > Commit a79d13a01f8c ("i386: Fix aes/vaes patterns [PR114576]") correctly > said "..., but we need to emit {evex} prefix in the assembly if AES ISA > is not enabled". Yet it did so only for the TARGET_AES insns. Going from > the alternative cho

Re: [PATCH] [RFC] target/117072 - more RTL FMA canonicalization

2024-10-11 Thread Hongtao Liu
On Fri, Oct 11, 2024 at 8:22 PM Richard Biener wrote: > > The following helps the x86 backend by canonicalizing FMAs to have > any negation done to one of the commutative multiplication operands > be done to a register (and not a memory operand). Likewise to > put a register operand first and a m

Re: [PATCH] [RFC] target/117072 - more RTL FMA canonicalization

2024-10-14 Thread Hongtao Liu
On Mon, Oct 14, 2024 at 1:50 PM Richard Biener wrote: > > On Mon, 14 Oct 2024, Hongtao Liu wrote: > > > On Sun, Oct 13, 2024 at 8:02 PM Richard Biener wrote: > > > > > > On Sun, 13 Oct 2024, Hongtao Liu wrote: > > > > > >

Re: [PATCH] [RFC] target/117072 - more RTL FMA canonicalization

2024-10-13 Thread Hongtao Liu
On Sun, Oct 13, 2024 at 8:02 PM Richard Biener wrote: > > On Sun, 13 Oct 2024, Hongtao Liu wrote: > > > On Fri, Oct 11, 2024 at 8:33 PM Hongtao Liu wrote: > > > > > > On Fri, Oct 11, 2024 at 8:22 PM Richard Biener wrote: > > > > > > > > T

Re: [PATCH] target: Fix asm codegen for vfpclasss* and vcvtph2* instructions

2024-10-24 Thread Hongtao Liu
On Fri, Oct 25, 2024 at 12:19 AM Antoni Boucher wrote: > > Thanks. > Did you review the new patch? > Can I push it to master? Ok. > > Le 2024-10-20 à 22 h 01, Hongtao Liu a écrit : > > On Sat, Oct 19, 2024 at 2:06 AM Antoni Boucher wrote: > >> > >> Than

Re: [PATCH v2 7/8] i386: Add else operand to masked loads.

2024-10-29 Thread Hongtao Liu
On Fri, Oct 18, 2024 at 10:23 PM Robin Dapp wrote: > > This patch adds a zero else operand to masked loads, in particular the > masked gather load builtins that are used for gather vectorization. > > gcc/ChangeLog: > > * config/i386/i386-expand.cc (ix86_expand_special_args_builtin): >

Re: [PATCH] testsuite: Adjust AVX10.2 check_effective_target

2024-10-29 Thread Hongtao Liu
On Tue, Oct 29, 2024 at 5:04 PM Haochen Jiang wrote: > > Hi all, > > Since Binutils haven't fully merged all AVX10.2 insts, only testing > one inst/intrin in AVX10.2 is never sufficient for check_effective_target. > Like APX_F, use inline asm to do the target check. > > Tested w/ and w/o Binutils

Re: [PATCH 0/7] Support Intel Diamond Rapid new features

2024-10-28 Thread Hongtao Liu
On Tue, Oct 22, 2024 at 2:31 PM Haochen Jiang wrote: > > Hi all, > > ISE054 has just been released and you can find doc from here: > > https://cdrdv2.intel.com/v1/dl/getContent/671368 > > Diamond Rapids features are added in this ISE, including AMX > related instructions, SM4 EVEX extension and MO

Re: [PATCH v2] i386: Handling exception input of __builtin_ia32_prefetch. [PR117416]

2024-11-04 Thread Hongtao Liu
On Tue, Nov 5, 2024 at 2:41 PM Hu, Lin1 wrote: > > > -Original Message- > > From: Hu, Lin1 > > Sent: Tuesday, November 5, 2024 1:34 PM > > To: gcc-patches@gcc.gnu.org > > Cc: Liu, Hongtao ; ubiz...@gmail.com > > Subject: [PATCH v2] i386: Handling exception input of > > __builtin_ia32_pref

Re: [PATCH] i386: Utilize VCOMSBF16 for BF16 Comparisons with AVX10.2

2024-11-03 Thread Hongtao Liu
On Fri, Nov 1, 2024 at 8:33 AM Hongyu Wang wrote: > > From: Levy Hsu > > This patch enables the use of the VCOMSBF16 instruction from AVX10.2 for > efficient BF16 comparisons. > > Bootstrapped & regtested on x86-64-pc-linux-gnu. > Ok for trunk? Ok. > > gcc/ChangeLog: > > * config/i386/i38

Re: [PATCH v3 7/8] i386: Add else operand to masked loads.

2024-11-03 Thread Hongtao Liu
On Sat, Nov 2, 2024 at 8:58 PM Robin Dapp wrote: > > From: Robin Dapp > > This patch adds a zero else operand to masked loads, in particular the > masked gather load builtins that are used for gather vectorization. > > gcc/ChangeLog: > > * config/i386/i386-expand.cc (ix86_expand_special_a

Re: [PATCH 0/2] Add arch support for Intel CPUs

2024-11-04 Thread Hongtao Liu
On Fri, Nov 1, 2024 at 11:24 AM Haochen Jiang wrote: > > Hi all, > > I have just landed new ISA patches on trunk. The next step will > be the arch support for ISE055 mentioned CPUs. > > There are two changes in ISE055 on CPUs: > > - A new model number is added for Arrow Lake. > - Diamond Rapid

Re: [PATCH] i386: Handling exception input of __builtin_ia32_prefetch. [PR117416]

2024-11-04 Thread Hongtao Liu
On Tue, Nov 5, 2024 at 10:52 AM Hu, Lin1 wrote: > > Hi, all > > __builtin_ia32_prefetch's op1 should be between 0 and 2. So add an error > handler. > > Bootstrapped and regtested on x86_64-pc-linux-gnu, there is an unrelated FAIL > whose root cause has yet to be found, just sending the patch for review. >

Re: [PATCH 1/2] [x86] Support vector float_truncate for SF to BF.

2024-11-05 Thread Hongtao Liu
On Tue, Nov 5, 2024 at 4:46 PM Jakub Jelinek wrote: > > On Tue, Oct 29, 2024 at 07:19:38PM -0700, liuhongt wrote: > > Generate native instruction whenever possible, otherwise use vector > > permutation with odd indices. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ready pu

Re: [PATCH] [x86_64] Add flag to control tight loops alignment opt

2024-11-04 Thread Hongtao Liu
On Tue, Nov 5, 2024 at 2:34 PM Liu, Hongtao wrote: > > > > > -Original Message- > > From: MayShao-oc > > Sent: Tuesday, November 5, 2024 11:20 AM > > To: gcc-patches@gcc.gnu.org; hubi...@ucw.cz; Liu, Hongtao > > ; ubiz...@gmail.com > > Cc: ti...@zhaoxin.com; silviaz...@zhaoxin.com; loui..

Re: [PATCH] [x86_64] Add flag to control tight loops alignment opt

2024-11-05 Thread Hongtao Liu
On Tue, Nov 5, 2024 at 5:33 PM Richard Biener wrote: > > On Tue, Nov 5, 2024 at 8:12 AM Hongtao Liu wrote: > > > > On Tue, Nov 5, 2024 at 2:34 PM Liu, Hongtao wrote: > > > > > > > > > > > > > -Original Message- > > >

Re: [PATCH] Intel MOVRS tests: Also scan (%e.x)

2024-11-05 Thread Hongtao Liu
On Wed, Nov 6, 2024 at 8:21 AM H.J. Lu wrote: > > Since x32 uses (%reg32), instead of (%r.x), also scan (%e.x). > > * gcc.target/i386/avx10_2-512-movrs-1.c: Also scan (%e.x). > * gcc.target/i386/avx10_2-movrs-1.c: Likewise. > * gcc.target/i386/movrs-1.c: Likewise. Ok. > > -- > H.J. -- BR, Hong

Re: [PATCH] gcc.target/i386/apx-ndd.c: Also scan (%edi)

2024-11-05 Thread Hongtao Liu
On Wed, Nov 6, 2024 at 8:19 AM H.J. Lu wrote: > > Since x32 uses (%edi), instead of (%rdi), also scan (%edi). > > * gcc.target/i386/apx-ndd.c: Also scan (%edi). Ok. > > -- > H.J. -- BR, Hongtao

Re: [PATCH] [x86_64] Add flag to control tight loops alignment opt

2024-11-05 Thread Hongtao Liu
On Tue, Nov 5, 2024 at 5:50 PM Mayshao-oc wrote: > > > > > > > > On Tue, Nov 5, 2024 at 2:34 PM Liu, Hongtao wrote: > > > > > > > > > > > > > -Original Message- > > > > From: MayShao-oc > > > > Sent: Tuesday, November 5, 2024 11:20 AM > > > > To: gcc-patches@gcc.gnu.org; hubi...@ucw.cz;

Re: [PATCH] i386: Add OPTION_MASK_ISA2_EVEX512 for some AVX512 instructions.

2024-11-05 Thread Hongtao Liu
On Wed, Nov 6, 2024 at 10:35 AM Hu, Lin1 wrote: > > Hi, all > > This patch aims to add OPTION_MASK_ISA2_EVEX512 for all avx512 512-bits > builtin functions, raise error when these builtin functions are used with > -mno-evex512. > > Bootstrapped and Regtested on x86-64-pc-linux-gnu, OK for trunk an

Re: [PATCH] [APX PPX] Avoid generating unmatched pushp/popp in pro/epilogue

2024-10-30 Thread Hongtao Liu
On Thu, Jul 4, 2024 at 11:00 AM Hongtao Liu wrote: > > On Tue, Jul 2, 2024 at 11:24 AM Hongyu Wang wrote: > > > > Hi, > > > > According to APX spec, the pushp/popp pairs should be matched, > > otherwise the PPX hint cannot take effect and ca

Re: [PATCH 2/2] Add X86_TUNE_AVX512_TWO_EPILOGUES, enable for Zen4 and Zen5

2024-11-11 Thread Hongtao Liu
On Mon, Nov 11, 2024 at 8:20 PM Richard Biener wrote: > > The following adds X86_TUNE_AVX512_TWO_EPILOGUES tuning and directs the > vectorizer to produce both a vector AVX2 and SSE epilogue for AVX512 > vectorized loops when set. The tuning is enabled by default for Zen4 > and Zen5 where I benchm

Re: [PATCH] Guard truncate from vector float to vector __bf16 with !flag_rounding_math && HONOR_NANS (BFmode).

2024-11-10 Thread Hongtao Liu
On Fri, Nov 8, 2024 at 10:33 AM liuhongt wrote: > > hw instruction doesn't raise exceptions, turns sNAN into qNAN quietly, > and always round to nearest (even). Output denormals are always > flushed to zero and input denormals are always treated as zero. MXCSR > is not consulted nor updated. > W/o
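The behaviour described in the snippet above (round to nearest even, sNaN quietly turned into qNaN, denormal inputs treated as zero, no MXCSR involvement) can be modelled in plain C. This is only a rough scalar sketch under those stated assumptions; `float_to_bf16_rne` is a hypothetical helper name, not something from the patch or from GCC, and it says nothing about how the backend actually expands the conversion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model: convert an IEEE binary32 value to bfloat16
   bits using round-to-nearest-even.  NaNs are quieted, zero/denormal
   inputs become a signed zero.  No floating-point flags are touched
   because only integer operations are used.  */
static uint16_t float_to_bf16_rne(float f)
{
    uint32_t bits;

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &f, sizeof bits);

    uint32_t exp = bits & 0x7f800000u;
    uint32_t mant = bits & 0x007fffffu;

    if (exp == 0x7f800000u && mant != 0)
        /* NaN: keep sign and top payload bits, force the quiet bit.  */
        return (uint16_t)((bits >> 16) | 0x0040u);

    if (exp == 0)
        /* Zero or denormal input: treated as (signed) zero.  */
        return (uint16_t)((bits >> 16) & 0x8000u);

    /* Round to nearest, ties to even: add 0x7fff plus the lowest kept bit,
       then drop the low 16 bits.  A carry into the exponent rounds up to
       the next binade or to infinity, which is the intended result.  */
    bits += 0x7fffu + ((bits >> 16) & 1u);
    return (uint16_t)(bits >> 16);
}
```

For example, `float_to_bf16_rne(1.0f)` gives `0x3f80`, and the tie case `1.00390625f` (bits `0x3f808000`) rounds down to the even value `0x3f80`.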

Re: [PATCH v2 ] i386: Add ix86_expand_integer_cst_argument

2024-11-12 Thread Hongtao Liu
On Wed, Nov 13, 2024 at 8:29 AM H.J. Lu wrote: > > On Wed, Nov 13, 2024 at 5:57 AM H.J. Lu wrote: > > > > On Tue, Nov 12, 2024 at 9:30 PM Richard Biener > > wrote: > > > > > > On Tue, Nov 12, 2024 at 1:49 PM H.J. Lu wrote: > > > > > > > > When passing 0xff as an unsigned char function argument,

Re: [PATCH] [x86] Define VECTOR_STORE_FLAG_VALUE

2024-09-24 Thread Hongtao Liu
On Tue, Sep 24, 2024 at 5:46 PM Uros Bizjak wrote: > > On Tue, Sep 24, 2024 at 11:23 AM liuhongt wrote: > > > > Return constm1_rtx when GET_MODE_CLASS (MODE) == MODE_VECTOR_INT. > > Otherwise NULL_RTX. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ready push to trunk. > >

Re: [PATCH] i386: Add GENERIC and GIMPLE folders of __builtin_ia32_{min,max}* [PR116738]

2024-09-24 Thread Hongtao Liu
On Wed, Sep 25, 2024 at 1:07 AM Jakub Jelinek wrote: > > Hi! > > The following patch adds GENERIC and GIMPLE folders for various > x86 min/max builtins. > As discussed, these builtins have effectively x < y ? x : y > (or x > y ? x : y) behavior. > The GENERIC folding is done if all the (relevant)
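The "x < y ? x : y" behaviour mentioned in the snippet above differs from a symmetric minimum when NaNs are involved, which is presumably why the folders need care. A small scalar sketch, assuming only the ternary semantics quoted there (`model_min`, `model_max` and the self-check are hypothetical names for illustration, not the builtins or the folder code):

```c
#include <assert.h>
#include <math.h>   /* NAN */

/* Hypothetical scalar models of the quoted behaviour.  */
static float model_min(float x, float y) { return x < y ? x : y; }
static float model_max(float x, float y) { return x > y ? x : y; }

/* An unordered comparison is false, so the second operand is returned
   whenever either input is NaN; the operation is not commutative.  */
static void model_minmax_selfcheck(void)
{
    assert(model_min(1.0f, 2.0f) == 1.0f);
    assert(model_max(1.0f, 2.0f) == 2.0f);

    /* x is NaN: comparison is false, y is returned.  */
    assert(model_min(NAN, 1.0f) == 1.0f);

    /* y is NaN: comparison is false, y (NaN) is returned.  */
    float r = model_min(1.0f, NAN);
    assert(r != r);
}
```

Calling `model_minmax_selfcheck()` exercises both orderings of a NaN operand.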

Re: [RFC PATCH] Enable vectorization for unknown tripcount in very cheap cost model but disable epilog vectorization.

2024-09-23 Thread Hongtao Liu
On Thu, Sep 19, 2024 at 2:08 PM Richard Biener wrote: > > On Wed, Sep 18, 2024 at 7:55 PM Richard Sandiford > wrote: > > > > Richard Biener writes: > > > On Thu, Sep 12, 2024 at 4:50 PM Hongtao Liu wrote: > > >> > > >> On Wed, Sep 11, 20

Re: [PATCH] x86: Extend AVX512 Vectorization for Popcount in Various Modes

2024-09-25 Thread Hongtao Liu
On Tue, Sep 24, 2024 at 10:16 AM Levy Hsu wrote: > > This patch enables vectorization of the popcount operation for V2QI, V4QI, > V8QI, V2HI, V4HI, and V2SI modes. Ok. > > gcc/ChangeLog: > > * config/i386/mmx.md: > (VQI_16_32_64): New mode iterator for 8-byte, 4-byte, and 2-byte >

Re: [PATCH] x86/{,V}AES: adjust when to force EVEX encoding

2024-09-25 Thread Hongtao Liu
On Wed, Sep 25, 2024 at 2:56 PM Jan Beulich wrote: > > Commit a79d13a01f8c ("i386: Fix aes/vaes patterns [PR114576]") correctly > said "..., but we need to emit {evex} prefix in the assembly if AES ISA > is not enabled". Yet it did so only for the TARGET_AES insns. Going from > the alternative cho

Re: [PATCH] i386, v2: Add GENERIC and GIMPLE folders of __builtin_ia32_{min,max}* [PR116738]

2024-09-25 Thread Hongtao Liu
On Wed, Sep 25, 2024 at 4:42 PM Jakub Jelinek wrote: > > On Wed, Sep 25, 2024 at 10:17:50AM +0800, Hongtao Liu wrote: > > > + for (int i = 0; i < 2; ++i) > > > + { > > > + unsigned count = vector_cst_encoded_nelts (args[i]),

Re: [PATCH] x86/{,V}AES: adjust when to force EVEX encoding

2024-09-25 Thread Hongtao Liu
On Wed, Sep 25, 2024 at 3:55 PM Jan Beulich wrote: > > On 25.09.2024 09:38, Hongtao Liu wrote: > > On Wed, Sep 25, 2024 at 2:56 PM Jan Beulich wrote: > >> > >> Commit a79d13a01f8c ("i386: Fix aes/vaes patterns [PR114576]") correctly > >> sa

Re: [PATCH] i386: Enhance AVX10.2 convert tests

2024-09-18 Thread Hongtao Liu
On Wed, Sep 18, 2024 at 1:42 PM Haochen Jiang wrote: > > Hi all, > > For AVX10.2 convert tests, all of them are missing mask tests > previously, this patch will add them in the tests. > > Tested on sde with assembler with these insts. Ok for trunk? Ok. > > Thx, > Haochen > > gcc/testsuite/ChangeLo

Re: [PATCH] i386: Add ssemov2, sseicvt2 for some load instructions that use memory on operand2

2024-09-18 Thread Hongtao Liu
On Thu, Sep 19, 2024 at 9:34 AM Hu, Lin1 wrote: > > Hi, all > > The memory attr of some instructions should be 'load', but it is 'none' > currently. > > This patch adds two new types ssemov2, sseicvt2 for some load instructions that > use memory on operands. So their memory attr will be 'load'.

Re: [PATCH] i386: Add missing avx512f-mask-type.h include

2024-09-18 Thread Hongtao Liu
On Wed, Sep 18, 2024 at 1:40 PM Haochen Jiang wrote: > > Hi all, > > Since commit r15-3594, we fixed the bugs in MASK_TYPE for AVX10.2 > testcases, but we missed the following four. > > The tests are not FAIL since the binutils part haven't been merged > yet, which leads to UNSUPPORTED test. But t

Re: [PATCH] doc: Add more alias option and reorder Intel CPU -march documentation

2024-09-18 Thread Hongtao Liu
On Wed, Sep 18, 2024 at 1:35 PM Haochen Jiang wrote: > > Hi all, > > Since r15-3539, there are requests coming in to add other alias option > documentation. This patch will add all of them, including corei7, corei7-avx, > core-avx-i, core-avx2, atom, slm, gracemont and emeraldrapids. > > Also in

Re: [PATCH v4 7/8] i386: Add zero maskload else operand.

2024-11-07 Thread Hongtao Liu
On Fri, Nov 8, 2024 at 1:58 AM Robin Dapp wrote: > > From: Robin Dapp > > gcc/ChangeLog: > > * config/i386/sse.md (maskload): > Call maskload..._1. > (maskload_1): Rename. Ok for x86 part. > --- > gcc/config/i386/sse.md | 21 ++--- > 1 file changed, 18 ins

Re: [PATCH] testsuite: Fix up pr116725.c test [PR116725]

2024-11-06 Thread Hongtao Liu
On Wed, Nov 6, 2024 at 4:59 PM Jakub Jelinek wrote: > > On Fri, Oct 18, 2024 at 02:05:59PM -0400, Antoni Boucher wrote: > > PR target/116725 > > * gcc.target/i386/pr116725.c: Add test using those AVX builtins. > > This test FAILs for me, as I don't have the latest gas aroun

Re: [PATCH 1/2] [x86] Support vector float_truncate for SF to BF.

2024-11-06 Thread Hongtao Liu
On Tue, Nov 5, 2024 at 5:19 PM Jakub Jelinek wrote: > > On Tue, Nov 05, 2024 at 05:12:56PM +0800, Hongtao Liu wrote: > > Yes, there's a mismatch between scalar and vector code, I assume users > > may not care much about precision/NAN/INF/denormal behaviors for > >

Re: [PATCH] i386: Add -mavx512vl for pr117304-1.c

2024-11-06 Thread Hongtao Liu
On Thu, Nov 7, 2024 at 2:04 PM Hu, Lin1 wrote: > > > -Original Message- > > From: Liu, Hongtao > > Sent: Thursday, November 7, 2024 11:41 AM > > To: Hu, Lin1 ; gcc-patches@gcc.gnu.org > > Cc: ubiz...@gmail.com > > Subject: RE: [PATCH] i386: Add -mavx512vl for pr117304-1.c > > > > > > > >

Re: [PATCH 1/2] [x86] Support vector float_truncate for SF to BF.

2024-11-07 Thread Hongtao Liu
On Thu, Nov 7, 2024 at 3:52 PM Jakub Jelinek wrote: > > On Thu, Nov 07, 2024 at 01:57:21PM +0800, Hongtao Liu wrote: > > > Does it turn the sNaNs into infinities or qNaNs silently? > > Yes. > > Into infinities? Into qNaNs(Sorry, I didn't see it clea

Re: [PATCH] [x86_64] Add microarchtecture tunable for pass_align_tight_loops

2024-11-06 Thread Hongtao Liu
On Thu, Nov 7, 2024 at 10:29 AM MayShao-oc wrote: > > Hi all: >For zhaoxin, I find no improvement when enabling pass_align_tight_loops, > and see a performance drop in some cases. >This patch adds a new tunable to bypass pass_align_tight_loops in zhaoxin. > >Bootstrapped X86_64. >Ok fo

Re: [PATCH] [x86_64] Add microarchtecture tunable for pass_align_tight_loops

2024-11-07 Thread Hongtao Liu
On Fri, Nov 8, 2024 at 10:21 AM Mayshao-oc wrote: > > > > -Original Message- > > > From: Xi Ruoyao > > > Sent: Thursday, November 7, 2024 1:12 PM > > > To: Liu, Hongtao ; Mayshao-oc > > o...@zhaoxin.com>; Hongtao Liu > > > Cc: g

Re: [PATCH] i386: Disallow long address mode in the x32 mode. [PR 117418]

2024-11-07 Thread Hongtao Liu
On Fri, Nov 8, 2024 at 1:21 PM Hongtao Liu wrote: > > On Fri, Nov 8, 2024 at 12:18 PM H.J. Lu wrote: > > > > On Fri, Nov 8, 2024 at 10:41 AM Hu, Lin1 wrote: > > > > > > Hi, all > > > > > > -maddress-mode=long will let Pmode = DI_mode, but -

Re: [PATCH] i386: Disallow long address mode in the x32 mode. [PR 117418]

2024-11-07 Thread Hongtao Liu
On Fri, Nov 8, 2024 at 12:18 PM H.J. Lu wrote: > > On Fri, Nov 8, 2024 at 10:41 AM Hu, Lin1 wrote: > > > > Hi, all > > > > -maddress-mode=long will let Pmode = DI_mode, but -mx32 request x32 ABI. > > So raise an error to avoid ICE. > > > > Bootstrapped and regtested, OK for trunk? > > > > BRs, >

Re: [PATCH] i386: Disallow long address mode in the x32 mode. [PR 117418]

2024-11-08 Thread Hongtao Liu
On Fri, Nov 8, 2024 at 3:18 PM Uros Bizjak wrote: > > On Fri, Nov 8, 2024 at 6:52 AM Hongtao Liu wrote: > > > > > > PR target/117418 > > > > > * config/i386/i386-options.cc > > > > > (ix86_option_override_internal): raise

Re: [PATCH] Optimize 128-bit vector permutation with pand, pandn and por.

2024-11-24 Thread Hongtao Liu
On Wed, Nov 20, 2024 at 8:03 PM Cui, Lili wrote: > > Hi, all > > This patch aims to handle certain vector shuffle operations using pand, pandn > and por more efficiently. > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? Although it's stage 3, I think this one is low risk, so O

Re: [PATCH] [RFC] Add extra 64bit SSE vector epilogue in some cases

2024-11-24 Thread Hongtao Liu
On Sun, Nov 24, 2024 at 8:05 PM Richard Biener wrote: > > > > > Am 24.11.2024 um 09:17 schrieb Hongtao Liu : > > > > On Fri, Nov 22, 2024 at 9:33 PM Richard Biener wrote: > >> > >> Similar to the X86_TUNE_AVX512_TWO_EPILOGUES tuning which enables

Re: [PATCH] i386/testsuite: Correct AVX10.2 FP8 test mask usage

2024-11-24 Thread Hongtao Liu
On Fri, Nov 22, 2024 at 4:08 PM Haochen Jiang wrote: > > Hi all, > > Under FP8, we should not use AVX512F_LEN_HALF to get the mask size since > it will get 16 instead of 8 and drop into wrong if condition. Correct > the usage for vcvtneph2[b,h]f8[,s] runtime test. > > Tested under sde. Ok for trun

Re: [PATCH] [x86] Fix uninitialized operands[2] in vec_unpacks_hi_v4sf.

2024-11-24 Thread Hongtao Liu
On Fri, Nov 22, 2024 at 9:16 PM Richard Biener wrote: > > On Fri, 22 Nov 2024, liuhongt wrote: > > > It could cause weird spill in RA when register pressure is high. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ok for trunk? > > > > BTW, It's difficult to get a decent tes

Re: [PATCH] i386/testsuite: Do not append AVX10.2 option for check_effective_target

2024-11-21 Thread Hongtao Liu
On Fri, Nov 22, 2024 at 2:40 PM Haochen Jiang wrote: > > Hi all, > > When -avx10.2 meet -march with AVX512 enabled, it will report warning > for vector size conflict. The warning will prevent the test to run on > GCC with arch native build on those platforms when > check_effective_target. > > Remo

Re: [PATCH] __builtin_prefetch fixes [PR117608]

2024-11-27 Thread Hongtao Liu
On Wed, Nov 27, 2024 at 8:50 PM Richard Biener wrote: > > On Wed, 27 Nov 2024, Jakub Jelinek wrote: > > > Hi! > > > > The r15-4833-ge9ab41b79933 patch had among tons of config/i386 > > specific changes also important change to the generic code, allowing > > also 2 as valid value of the second argu

Re: [PATCH] [x86] [RFC] Prevent loop vectorization if it's in a deeply nested big loop.

2024-11-28 Thread Hongtao Liu
On Thu, Nov 28, 2024 at 4:57 PM Richard Biener wrote: > > On Thu, Nov 28, 2024 at 3:04 AM Hongtao Liu wrote: > > > > On Wed, Nov 27, 2024 at 9:43 PM Richard Biener > > wrote: > > > > > > On Wed, Nov 27, 2024 at 4:26 AM liuhongt wrote: > > >

Re: [PATCH] [x86] [RFC] Prevent loop vectorization if it's in a deeply nested big loop.

2024-11-27 Thread Hongtao Liu
On Wed, Nov 27, 2024 at 9:43 PM Richard Biener wrote: > > On Wed, Nov 27, 2024 at 4:26 AM liuhongt wrote: > > > > When loop requires any kind of versioning which could increase register > > pressure too much, and it's in a deeply nested big loop, don't do > > vectorization. > > > > I tested the pat

Re: [PATCH] i386: Fix cstorebf4 fp comparison operand [PR117495]

2024-11-13 Thread Hongtao Liu
On Wed, Nov 13, 2024 at 10:00 AM Hongyu Wang wrote: > > Hi, > > For cstorebf4 it uses comparison_operator for BFmode compare, which is > incorrect when directly uses ix86_expand_setcc as it does not canonicalize > the input comparison to correct the compare code by swapping operands. > Since the o

Re: [PATCH] i386/testsuite: Enhance AVX10.2 vmovd/w testcases

2024-11-20 Thread Hongtao Liu
On Thu, Nov 21, 2024 at 2:40 PM Haochen Jiang wrote: > > Hi all, > > Under -fno-omit-frame-pointer, %ebp will be used, which is the > Solaris/x86 default. Both check %ebp and %esp to avoid error on that. > > Tested under -m32 w/ and w/o -fno-omit-frame-pointer. Ok for trunk? Ok. > > Thx, > Haochen

Re: [PATCH] [RFC] Add extra 64bit SSE vector epilogue in some cases

2024-11-24 Thread Hongtao Liu
On Fri, Nov 22, 2024 at 9:33 PM Richard Biener wrote: > > Similar to the X86_TUNE_AVX512_TWO_EPILOGUES tuning which enables > an extra 128bit SSE vector epilogue when doing 512bit AVX512 > vectorization in the main loop the following allows a 64bit SSE > vector epilogue to be generated when the pr

Re: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2024-12-01 Thread Hongtao Liu
On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote: > > For all different modes of all 0s/1s vectors, we can use the single widest > all 0s/1s vector register for all 0s/1s vector uses in the whole function. > Add a pass to generate a single widest all 0s/1s vector set instruction at > entry of the near

Re: Patch ping - [PATCH] [APX EGPR] Fix indirect call prefix

2024-11-24 Thread Hongtao Liu
On Mon, Nov 25, 2024 at 2:32 PM Kong, Lingling wrote: > > Hi, > > LGTM. > Now Hongyu and Hongtao are working on APX. Ok. > > Thanks, > Lingling > > > -Original Message- > > From: Gregory Kanter > > Sent: Saturday, November 23, 2024 8:16 AM > > To: gcc-patches@gcc.gnu.org > > Cc: Kong, Lin

Re: [RFA for x86] Don't include subst attributes in "@" md helpers

2024-12-23 Thread Hongtao Liu
On Thu, Dec 19, 2024 at 12:01 AM Richard Sandiford wrote: > > In a later patch, I need to add "@" to a pattern that uses subst > iterators. This combination is problematic for two reasons: > > (1) define_substs are applied and filtered at a later stage than the > handling of "@" patterns, so

Re: [PATCH] x86: Verify that PUSH/POP can be skipped

2025-02-07 Thread Hongtao Liu
On Fri, Feb 7, 2025 at 1:57 PM H.J. Lu wrote: > > For > > --- > int f(int); > > int advance(int dz) > { > if (dz > 0) > return (dz + dz) * dz; > else > return dz * f(dz); > } > --- > > Before r15-1619-g3b9b8d6cfdf593 > > advance(int): > pushrbx > mov

Re: [PATCH 0/3] GCC13/GCC12 backport [PR108707][PR109610]

2025-02-09 Thread Hongtao Liu
On Mon, Feb 10, 2025 at 1:43 PM liuhongt wrote: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108707#c9 > > >Pranav Gorantla 2025-02-06 04:30:05 UTC > >Facing similar issue in gcc-13. Is it possible to backport the fix of this > >Bug 108707 and Bug 109610 to gcc-13, gcc-12 as well. > > This se

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-11 Thread Hongtao Liu
> PR117081 is about regression in povray. The reduced testcase: Just for clarification. PR117081 is not about regression in povray. It's related to FAIL: gcc.target/i386/pr91384.c scan-assembler-not testl The pr91384.c is added by r12-7417 which is a peephole optimization expecting some specific ins

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-11 Thread Hongtao Liu
On Tue, Feb 11, 2025 at 4:27 PM H.J. Lu wrote: > > On Tue, Feb 11, 2025 at 4:13 PM Hongtao Liu wrote: > > > > > PR117081 is about regression in povray. The reduced testcase: > > Just for clarification. PR117081 is not about regression in povray. > > it's re

Re: [PATCH] i386: Append -march=x86-64-v3 to AVX10.2/512 VNNI testcases

2025-01-22 Thread Hongtao Liu
On Wed, Jan 22, 2025 at 11:13 AM Haochen Jiang wrote: > > Hi all, > > These two testcases are misses on previous addition for > -march=x86-64-v3 to silence warning for -march=native tests. > > Ok for trunk? Ok. > > Thx, > Haochen > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/vnniint16

Re: [PATCH 00/13] Realign x86 GCC after Binutils change [PR118270]

2025-01-21 Thread Hongtao Liu
On Tue, Jan 21, 2025 at 4:42 PM Haochen Jiang wrote: > > Hi all, > > Recently, DMR ISAs got lots of changes in mnemonics. The detailed changes > are: > > - NE would be removed for all AVX10.2 new insns > - VCOMSBF16 -> VCOMISBF16 > - P for packed omitted for AI data types (BF16, TF32, FP8) >

Re: [PATCH 0/2] i386: Adjust AVX10 related options

2025-02-16 Thread Hongtao Liu
On Thu, Feb 13, 2025 at 4:08 PM Haochen Jiang wrote: > > Hi all, > > According to the previous feedback on our RFC for AVX10 option adjustment > and discussion with LLVM, we finalized how we are going to handle that. > > The overall direction is to re-alias avx10.x alias to 512 bit and only > usin

Re: [PATCH] i386: Do not check vector size conflict when AVX512 is not explicitly set [PR 118815]

2025-02-16 Thread Hongtao Liu
On Fri, Feb 14, 2025 at 9:56 AM Haochen Jiang wrote: > > Hi all, > > When AVX512 is not explicitly set, we should not take EVEX512 bit into > consideration when checking vector size. It will solve the intrin header > file reporting warnings when compiling with -Wsystem-headers. > > However, there

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-19 Thread Hongtao Liu
On Wed, Feb 19, 2025 at 9:06 PM Jan Hubicka wrote: > > Hi, > this is a variant of a hook I benchmarked on cpu2016 with -Ofast -flto > and -O2 -flto. For non -Os and no Windows ABI should be practically the > same as your variant that was simply returning mem_cost - 2. > I've tested O2/(Ofast march

Re: [PATCH 0/2] i386: Adjust AVX10 related options

2025-02-27 Thread Hongtao Liu
On Mon, Feb 17, 2025 at 9:51 AM Hongtao Liu wrote: > > On Thu, Feb 13, 2025 at 4:08 PM Haochen Jiang wrote: > > > > Hi all, > > > > According to the previous feedback on our RFC for AVX10 option adjustment > > and discussion with LLVM, we finalized how we a

Re: [PATCH] i386: Correct mask width for bf8->fp16 intrin on 256/512 bit

2025-03-05 Thread Hongtao Liu
On Wed, Mar 5, 2025 at 3:23 PM Haochen Jiang wrote: > > Hi all, > > For bf8 -> fp16 convert, when dst is 256 bit, the mask should be > 16 bit since 16*16=256, not the 8 bit in the current intrin. In > 512 bit intrin, the mask bit is also halved. This patch will fix > both of them. > > Ok for trunk
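
The mask-width arithmetic quoted in this message (16 elements of 16 bits fill a 256-bit destination, so the mask needs 16 bits) can be sketched as follows. This is only an illustration of the arithmetic, not code from the patch; the helper name mask_bits_for is hypothetical and does not exist in GCC or its intrinsic headers.

```c
#include <assert.h>

/* Hypothetical helper (not part of GCC): a write mask needs one bit per
   destination element, so its width is the destination vector width
   divided by the element width.  */
static unsigned
mask_bits_for (unsigned vector_bits, unsigned element_bits)
{
  /* Assumed precondition: the element width divides the vector width.  */
  assert (element_bits != 0 && vector_bits % element_bits == 0);
  return vector_bits / element_bits;
}
```

Under these assumptions, mask_bits_for (256, 16) gives 16 and mask_bits_for (512, 16) gives 32, which matches the message's point that an 8-bit mask is too narrow for a 256-bit fp16 destination.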

Re: [PATCH] x86: Move TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P to i386.cc

2025-02-27 Thread Hongtao Liu
On Wed, Feb 26, 2025 at 6:01 AM H.J. Lu wrote: > > Move the TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P target hook from > i386.h to i386.cc. Ok for the patch, looks obvious. > > * config/i386/i386.h (TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P): > Moved to ... > * config/i386/i386.cc (TARGET_SMALL_REGI

Re: [RFA] ira: Add new hooks for callee-save vs spills [PR117477]

2025-03-04 Thread Hongtao Liu
On Tue, Mar 4, 2025 at 6:31 PM Richard Biener wrote: > > On Tue, Mar 4, 2025 at 11:18 AM Richard Sandiford > wrote: > > > > Richard Sandiford writes: > > > Jan Hubicka writes: > > >>> > > >>> Thanks for running these. I saw poor results for perlbench with my > > >>> initial aarch64 hooks becau

Re: [PATCH] i386: Fix AVX10.2 SAT CVT testcases.

2025-03-20 Thread Hongtao Liu
On Thu, Mar 20, 2025 at 3:14 PM Hu, Lin1 wrote: > > Hi, > > res_ref will be modified after MASK_ZERO, init res_ref2 for rounding > control intrinsics. > > Bootstrapped and regtested on x86-64-pc-linux-gnu{-m32,-m64}, OK for trunk? Ok. > > BRs, > Lin > > gcc/testsuite/ChangeLog: > > * gcc.t

Re: [PATCH] i386: Remove XFAIL for pr103750 testcases

2025-03-18 Thread Hongtao Liu
On Tue, Mar 11, 2025 at 2:29 PM Haochen Jiang wrote: > > Hi all, > > After commit r15-4510, the following testcases also do not need XFAIL. > > Ok for trunk? Ok. > > Thx, > Haochen > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/avx512f-pr103750-1.c: Remove XFAIL. > * gcc.target

Re: [PATCH] APX: add nf counterparts for rotl split pattern [PR 119539]

2025-04-01 Thread Hongtao Liu
On Tue, Apr 1, 2025 at 4:40 PM Hongyu Wang wrote: > > Hi, > > For splitter after 3_mask it now splits the pattern > to *3_mask, so the splitter doesn't generate > nf variant. Add corresponding nf counterpart for define_insn_and_split > to make the splitter also work for nf insn. > > Bootstra

Re: [PATCH] target/119549 - fixup handling of -mno-sse4

2025-04-01 Thread Hongtao Liu
On Tue, Apr 1, 2025 at 3:56 PM Jakub Jelinek wrote: > > On Tue, Apr 01, 2025 at 01:36:23PM +0800, Hongtao Liu wrote: > > >Changing ix86_valid_target_attribute_inner_p might be even better because > > >OPT_msse4 is RejectNegative option, so !value for it looks weird.

Re: [PATCH] i386: Add attr_isa for vaes patterns to sync with attr gpr16. [pr119473]

2025-03-30 Thread Hongtao Liu
On Fri, Mar 28, 2025 at 1:55 PM Hu, Lin1 wrote: > > For vaes patterns with jm constraint and gpr16 attr, it requires "isa" > attr to distinguish avx/avx512 alternatives in ix86_memory_address_reg_class. > Also adds missing type and mode attributes for those vaes patterns. Ok. > > gcc/ChangeLog: > >

Re: [PATCH] i386: Add PTA_AVX10_1_256 to PTA_DIAMONDRAPIDS

2025-03-30 Thread Hongtao Liu
On Fri, Mar 28, 2025 at 4:22 PM Haochen Jiang wrote: > > Hi all, > > For -march= handling, PTA_AVX10_1 will not imply PTA_AVX10_1_256, > resulting in TARGET_AVX10_1 becoming true while TARGET_AVX10_1_256 > false. Since we will check TARGET_AVX10_1_256 in GCC 15 for AVX512 > feature enabling for AV

Re: [PATCH] APX: add nf counterparts for rotl split pattern [PR 119539]

2025-04-02 Thread Hongtao Liu
ngtao wrote on Wed, Apr 2, 2025 at 08:57: > > > > > > > > > -Original Message- > > > From: Uros Bizjak > > > Sent: Tuesday, April 1, 2025 5:24 PM > > > To: Hongtao Liu > > > Cc: Wang, Hongyu ; gcc-patches@gcc.gnu.org; Liu, > > > Hongtao > > >

Re: [PATCH] i386: Set attr "addr" as "gpr16" for constraint "jm". [PR 119425]

2025-03-26 Thread Hongtao Liu
On Wed, Mar 26, 2025 at 9:50 AM Hu, Lin1 wrote: > > Hi, all > > This patch aims to ensure each alternative with constraint "jm" > sets addr "gpr16", otherwise it may raise an ICE in reload pass. > > Bootstrapped and Regtested for x86_64-pc-linux-gnu{-m32,-m64}, ok for trunk? Ok. > > BRs, > Lin >

Re: [PATCH] target/119549 - fixup handling of -mno-sse4

2025-04-04 Thread Hongtao Liu
On Mon, Mar 31, 2025 at 9:52 PM Richard Biener wrote: > > On Mon, 31 Mar 2025, Jakub Jelinek wrote: > > > On Mon, Mar 31, 2025 at 03:33:34PM +0200, Richard Biener wrote: > > > On Mon, 31 Mar 2025, Jakub Jelinek wrote: > > > > > > > On Mon, Mar 31, 2025 at 03:12:56PM +0200, Richard Biener wrote: >
