Re: [PATCH] i386: Add more peephole2 for APX NDD

2025-06-03 Thread Hongtao Liu
On Thu, May 29, 2025 at 4:56 PM Hu, Lin1 wrote: > > Hi, > > The patch aims to optimize > movb (%rdi), %al > movq %rdi, %rbx > xorl %esi, %eax, %edx > movb %dl, (%rdi) > cmpb %sil, %al > jne > to > xorb %sil, (%rdi) >

Re: [PATCH] i386: Add more forms peephole2 for adc/sbb

2025-06-03 Thread Hongtao Liu
On Mon, May 26, 2025 at 4:55 PM Hu, Lin1 wrote: > > Hi, all > > Enabling -mapxf changes some adc/sbb patterns. > > Hence GCC emits an extra mov like > movq 8(%rdi), %rax > adcq %rax, 8(%rsi), %rax > movq %rax, 8(%rdi) > rather than > movq

Re: [PATCH v2 0/7] Remove -mavx10.1-256/512 and -mno-evex512

2025-05-18 Thread Hongtao Liu
On Wed, May 14, 2025 at 3:29 PM Haochen Jiang wrote: > > Hi all, > > This is the v2 patch to remove -mavx10.1-256/512 and -mno-evex512. I suppose > this time all the patches will not be held due to size. > > As mentioned in GCC 15, we will remove -mavx10.1-256/512 and -mno-evex512 > options in GCC

Re: [PATCH] For datarefs with big gap, split them into different groups.

2025-05-15 Thread Hongtao Liu
It's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119181 On Fri, May 16, 2025 at 10:02 AM liuhongt wrote: > > The patch tries to solve missed vectorization for the case below. > > void > foo (int* a, int* restrict b) > { > b[0] = a[0] * a[64]; > b[1] = a[65] * a[1]; > b[2] = a[2] * a[66]; >

Re: [PATCH] x86: Add preserve_none and update no_caller_saved_registers attributes

2025-05-13 Thread Hongtao Liu
On Fri, Apr 18, 2025 at 7:10 PM H.J. Lu wrote: > > Add preserve_none attribute which is similar to no_callee_saved_registers > attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are Could you split preserve_none into a separate patch? It looks like it's different from clang's p

Re: [PATCH] Update libbid according to the latest Intel Decimal Floating-Point Math Library.

2025-05-13 Thread Hongtao Liu
On Wed, May 14, 2025 at 9:22 AM liuhongt wrote: > > The Intel Decimal Floating-Point Math Library is available as open-source on > Netlib[1]. > > [1] https://www.netlib.org/misc/intel/ > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ready push to trunk. > > libgcc/config/libbid/Ch

Re: [PATCH v3] Consider frequency in cost estimation when converting scalar to vector.

2025-05-11 Thread Hongtao Liu
On Thu, May 8, 2025 at 2:40 PM liuhongt wrote: > > The only part I changed is related to size_cost of sse_to_integer, as below > > 114+ /* Under TARGET_SSE4_1, it's vmovd + vpextrd/vpinsrd. > 115+ W/o it, it's movd + psrlq/unpckldq + movd. */ > 116+ else if (!TARGET_64BIT && smode != SImod

Re: [PATCH v2] x86: Insert extra move for mode size smaller than natural size

2025-05-06 Thread Hongtao Liu
On Wed, May 7, 2025 at 9:06 AM H.J. Lu wrote: > > On Tue, May 6, 2025 at 3:35 PM Hongtao Liu wrote: > > > > On Tue, May 6, 2025 at 3:06 PM H.J. Lu wrote: > > > > > > On Tue, May 6, 2025 at 2:30 PM Liu, Hongtao wrote: > > > > > > > >

Re: [PATCH] x86: Skip if the mode size is smaller than its natural size

2025-05-06 Thread Hongtao Liu
On Tue, May 6, 2025 at 3:06 PM H.J. Lu wrote: > > On Tue, May 6, 2025 at 2:30 PM Liu, Hongtao wrote: > > > > > > > > > -Original Message- > > > From: H.J. Lu > > > Sent: Tuesday, May 6, 2025 2:16 PM > > > To: Liu, Hongtao > > > Cc: GCC Patches ; Uros Bizjak > > > > > > Subject: Re: [PA

Re: [PATCH] i386: Add ix86_expand_unsigned_small_int_cst_argument

2025-04-28 Thread Hongtao Liu
On Sun, Apr 27, 2025 at 10:58 AM H.J. Lu wrote: > > When passing 0xff as an unsigned char function argument with the C frontend > promotion, expand_normal used to get > > constant > 255> > > and returned the rtx value using the sign-extended representation: > > (const_int 255 [0xff]) > > But aft

Re: [PATCH v2] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-28 Thread Hongtao Liu
On Mon, Apr 28, 2025 at 5:07 PM H.J. Lu wrote: > > On Mon, Apr 28, 2025 at 4:26 PM H.J. Lu wrote: > > > > > > > This is what my patch does: > > > But it iterates through vector_insns, using a def-ref chain to find > > > those insns. I think we can just record those single_set with src as > > > co

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-25 Thread Hongtao Liu
> > I am not so sure about this when it comes to relatively common > instructions. Hiding things in unspec prevents combine and other RTL > passes from doing their job. I would say that it only makes sense for > situations where the RTL equivalent is very inconvenient. > In the direction of using gener

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-24 Thread Hongtao Liu
On Fri, Apr 25, 2025 at 1:26 PM Jan Hubicka wrote: > > > On Thu, Apr 24, 2025 at 6:27 PM Jan Hubicka wrote: > > > > > > > Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand > > > > or vpandn. > > > > Current register_operand/vector_operand could lose some optimization > > >

Re: [PATCH] [x86] Generate 2 FMA instructions in ix86_expand_swdivsf.

2025-04-23 Thread Hongtao Liu
On Thu, Apr 24, 2025 at 12:54 AM Jan Hubicka wrote: > > > From: "hongtao.liu" > > > > When FMA is available, N-R step can be rewritten with > > > > a / b = (a - (rcp(b) * a * b)) * rcp(b) + rcp(b) * a > > > > which have 2 fma generated.[1] > > > > [1] https://bugs.llvm.org/show_bug.cgi?id=21385 >

Re: [PATCH] Consider frequency in cost estimation when converting scalar to vector.

2025-04-23 Thread Hongtao Liu
On Thu, Apr 24, 2025 at 12:50 AM Jan Hubicka wrote: > > > In some benchmark, I notice stv failed due to cost unprofitable, but the > > igain > > is inside the loop, but sse<->integer conversion is outside the loop, > > current cost > > model doesn't consider the frequency of those gain/cost. > >

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-22 Thread Hongtao Liu
On Mon, Apr 21, 2025 at 2:52 PM liuhongt wrote: > > Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand > or vpandn. > Current register_operand/vector_operand could lose some optimization > opportunity. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for tru

Re: Improve vectorizer costs of min, max, abs, absu and const_expr on x86

2025-04-21 Thread Hongtao Liu
On Tue, Apr 22, 2025 at 10:30 AM Hongtao Liu wrote: > > On Tue, Apr 22, 2025 at 12:46 AM Jan Hubicka wrote: > > > > Hi, > > this patch adds special cases for vectorizer costs in COND_EXPR, MIN_EXPR, > > MAX_EXPR, ABS_EXPR and ABSU_EXPR. We previously costed ABS_E

Re: Improve vectorizer costs of min, max, abs, absu and const_expr on x86

2025-04-21 Thread Hongtao Liu
On Tue, Apr 22, 2025 at 12:46 AM Jan Hubicka wrote: > > Hi, > this patch adds special cases for vectorizer costs in COND_EXPR, MIN_EXPR, > MAX_EXPR, ABS_EXPR and ABSU_EXPR. We previously costed ABS_EXPR and > ABSU_EXPR > but it was only correct for the FP variant (where it corresponds to andss clea

Re: PING: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-21 Thread Hongtao Liu
On Mon, Apr 21, 2025 at 4:30 PM H.J. Lu wrote: > > On Mon, Apr 21, 2025 at 11:29 AM Hongtao Liu wrote: > > > > On Sat, Apr 19, 2025 at 1:25 PM H.J. Lu wrote: > > > > > > On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote: > > > > > > > >

Re: PING: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-20 Thread Hongtao Liu
On Sat, Apr 19, 2025 at 1:25 PM H.J. Lu wrote: > > On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote: > > > > For all different modes of all 0s/1s vectors, we can use the single widest > > all 0s/1s vector register for all 0s/1s vector uses in the whole function. > > Add a pass to generate a single wi

Re: [PATCH v2] x86: Update memcpy/memset inline strategies for -mtune=generic

2025-04-17 Thread Hongtao Liu
On Tue, Apr 8, 2025 at 3:52 AM H.J. Lu wrote: > > Simplify memcpy and memset inline strategies to avoid branches for > -mtune=generic: > > 1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector >load and store for up to 16 * 16 (256) bytes when the data size is >fixed and kn

Re: [PATCH] APX: Don't use red-zone with APX and no caller-saved registers

2025-04-14 Thread Hongtao Liu
On Mon, Apr 14, 2025 at 8:56 PM H.J. Lu wrote: > > On Mon, Apr 14, 2025 at 2:39 AM Uros Bizjak wrote: > > > > On Mon, Apr 14, 2025 at 8:54 AM Hongtao Liu wrote: > > > > > > On Mon, Apr 14, 2025 at 7:36 AM H.J. Lu wrote: > > > > > >

Re: [PATCH] APX: Don't use red-zone with APX and no caller-saved registers

2025-04-13 Thread Hongtao Liu
On Mon, Apr 14, 2025 at 7:36 AM H.J. Lu wrote: > > Don't use red-zone when there are no caller-saved registers and APX is > enabled since 128-byte red-zone is too small for 31 GPRs. > > gcc/ > > PR target/119784 > * config/i386/i386.cc (ix86_using_red_zone): Don't use red-zone >
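[Editor's sketch, not part of the thread.] The size argument in the quoted patch description is simple arithmetic, shown here as a small C sketch. The assumptions are mine: 8 bytes per saved 64-bit GPR, and the helper names `save_area_bytes` and `fits_in_red_zone` are hypothetical and do not appear in i386.cc.

```c
#include <assert.h>
#include <stdbool.h>

enum { RED_ZONE_BYTES = 128, GPR_BYTES = 8, MAX_SAVED_GPRS = 31 };

/* Bytes needed to spill n_gprs 64-bit general-purpose registers.  */
static unsigned save_area_bytes(unsigned n_gprs)
{
    assert(n_gprs <= MAX_SAVED_GPRS);
    return n_gprs * GPR_BYTES;
}

/* True if that save area would fit in the 128-byte red zone.  */
static bool fits_in_red_zone(unsigned n_gprs)
{
    return save_area_bytes(n_gprs) <= RED_ZONE_BYTES;
}
```

Under these assumptions, 31 GPRs need 248 bytes, which is nearly twice the 128-byte red zone.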

Re: [PATCH] target/119549 - fixup handling of -mno-sse4

2025-04-04 Thread Hongtao Liu
On Mon, Mar 31, 2025 at 9:52 PM Richard Biener wrote: > > On Mon, 31 Mar 2025, Jakub Jelinek wrote: > > > On Mon, Mar 31, 2025 at 03:33:34PM +0200, Richard Biener wrote: > > > On Mon, 31 Mar 2025, Jakub Jelinek wrote: > > > > > > > On Mon, Mar 31, 2025 at 03:12:56PM +0200, Richard Biener wrote: >

Re: [PATCH] APX: add nf counterparts for rotl split pattern [PR 119539]

2025-04-02 Thread Hongtao Liu
ngtao wrote on Wed, Apr 2, 2025 at 08:57: > > > > > > > > > -Original Message- > > > From: Uros Bizjak > > > Sent: Tuesday, April 1, 2025 5:24 PM > > > To: Hongtao Liu > > > Cc: Wang, Hongyu ; gcc-patches@gcc.gnu.org; Liu, > > > Hongtao > > >

Re: [PATCH] APX: add nf counterparts for rotl split pattern [PR 119539]

2025-04-01 Thread Hongtao Liu
On Tue, Apr 1, 2025 at 4:40 PM Hongyu Wang wrote: > > Hi, > > For the splitter after 3_mask, it now splits the pattern > to *3_mask, so the splitter doesn't generate > the nf variant. Add the corresponding nf counterpart for define_insn_and_split > so that the splitter also works for nf insns. > > Bootstra

Re: [PATCH] target/119549 - fixup handling of -mno-sse4

2025-04-01 Thread Hongtao Liu
On Tue, Apr 1, 2025 at 3:56 PM Jakub Jelinek wrote: > > On Tue, Apr 01, 2025 at 01:36:23PM +0800, Hongtao Liu wrote: > > >Changing ix86_valid_target_attribute_inner_p might be even better because > > >OPT_msse4 is RejectNegative option, so !value for it looks weird.

Re: [PATCH] i386: Add attr_isa for vaes patterns to sync with attr gpr16. [pr119473]

2025-03-30 Thread Hongtao Liu
On Fri, Mar 28, 2025 at 1:55 PM Hu, Lin1 wrote: > > For vaes patterns with jm constraint and gpr16 attr, it requires "isa" > attr to distinct avx/avx512 alternatives in ix86_memory_address_reg_class. > Also adds missing type and mode attributes for those vaes patterns. Ok. > > gcc/ChangeLog: > >

Re: [PATCH] i386: Add PTA_AVX10_1_256 to PTA_DIAMONDRAPIDS

2025-03-30 Thread Hongtao Liu
On Fri, Mar 28, 2025 at 4:22 PM Haochen Jiang wrote: > > Hi all, > > For -march= handling, PTA_AVX10_1 will not imply PTA_AVX10_1_256, > resulting in TARGET_AVX10_1 becoming true while TARGET_AVX10_1_256 > false. Since we will check TARGET_AVX10_1_256 in GCC 15 for AVX512 > feature enabling for AV

Re: [PATCH] i386: Set attr "addr" as "gpr16" for constraint "jm". [PR 119425]

2025-03-26 Thread Hongtao Liu
On Wed, Mar 26, 2025 at 9:50 AM Hu, Lin1 wrote: > > Hi, all > > This patch aims to ensure each alternative with constraint "jm" should > set addr "gpr16", otherwise maybe raise ICE in reload pass. > > Bootstrapped and Regtested for x86_64-pc-linux-gnu{-m32,-m64}, ok for trunk? Ok. > > BRs, > Lin >

Re: [PATCH] i386: Fix AVX10.2 SAT CVT testcases.

2025-03-20 Thread Hongtao Liu
On Thu, Mar 20, 2025 at 3:14 PM Hu, Lin1 wrote: > > Hi, > > res_ref will be modified after MASK_ZERO, init res_ref2 for rounding > control intrinsics. > > Bootstrapped and regtested on x86-64-pc-linux-gnu{-m32,-m64}, OK for trunk? Ok. > > BRs, > Lin > > gcc/testsuite/ChangeLog: > > * gcc.t

Re: [PATCH] i386: Remove XFAIL for pr103750 testcases

2025-03-18 Thread Hongtao Liu
On Tue, Mar 11, 2025 at 2:29 PM Haochen Jiang wrote: > > Hi all, > > After commit r15-4510, the following testcases also do not need XFAIL. > > Ok for trunk? Ok. > > Thx, > Haochen > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/avx512f-pr103750-1.c: Remove XFAIL. > * gcc.target

Re: [PATCH] i386: Correct mask width for bf8->fp16 intrin on 256/512 bit

2025-03-05 Thread Hongtao Liu
On Wed, Mar 5, 2025 at 3:23 PM Haochen Jiang wrote: > > Hi all, > > For the bf8 -> fp16 conversion, when dst is 256 bit, the mask should be > 16 bit since 16*16=256, not the 8 bit in the current intrin. In > 512 bit intrin, the mask bit is also halved. This patch will fix > both of them. > > Ok for trunk
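[Editor's sketch, not part of the thread.] The mask-width reasoning in the quoted message (16 fp16 elements of 16 bits fill a 256-bit destination, so the mask needs 16 bits) can be written as a small C helper. `mask_bits` is a hypothetical name used only for illustration; it is not an intrinsic or a GCC function.

```c
#include <assert.h>

/* One mask bit per destination element: width = vector bits / element bits.  */
static unsigned mask_bits(unsigned vector_bits, unsigned element_bits)
{
    assert(element_bits != 0 && vector_bits % element_bits == 0);
    return vector_bits / element_bits;
}
```

For a 256-bit fp16 destination this gives 16 mask bits, and by the same rule a 512-bit fp16 destination would need 32.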

Re: [RFA] ira: Add new hooks for callee-save vs spills [PR117477]

2025-03-04 Thread Hongtao Liu
On Tue, Mar 4, 2025 at 6:31 PM Richard Biener wrote: > > On Tue, Mar 4, 2025 at 11:18 AM Richard Sandiford > wrote: > > > > Richard Sandiford writes: > > > Jan Hubicka writes: > > >>> > > >>> Thanks for running these. I saw poor results for perlbench with my > > >>> initial aarch64 hooks becau

Re: [PATCH 0/2] i386: Adjust AVX10 related options

2025-02-27 Thread Hongtao Liu
On Mon, Feb 17, 2025 at 9:51 AM Hongtao Liu wrote: > > On Thu, Feb 13, 2025 at 4:08 PM Haochen Jiang wrote: > > > > Hi all, > > > > According to the previous feedback on our RFC for AVX10 option adjustment > > and discussion with LLVM, we finalized how we a

Re: [PATCH] x86: Move TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P to i386.cc

2025-02-27 Thread Hongtao Liu
On Wed, Feb 26, 2025 at 6:01 AM H.J. Lu wrote: > > Move the TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P target hook from > i386.h to i386.cc. Ok for the patch, looks obvious. > > * config/i386/i386.h (TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P): > Moved to ... > * config/i386/i386.cc (TARGET_SMALL_REGI

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-19 Thread Hongtao Liu
On Wed, Feb 19, 2025 at 9:06 PM Jan Hubicka wrote: > > Hi, > this is a variant of a hook I benchmarked on cpu2016 with -Ofast -flto > and -O2 -flto. For non -Os and non-Windows ABI it should be practically the > same as your variant that was simply returning mem_cost - 2. > I've tested O2/(Ofast march

Re: [PATCH 0/2] i386: Adjust AVX10 related options

2025-02-16 Thread Hongtao Liu
On Thu, Feb 13, 2025 at 4:08 PM Haochen Jiang wrote: > > Hi all, > > According to the previous feedback on our RFC for AVX10 option adjustment > and discussion with LLVM, we finalized how we are going to handle that. > > The overall direction is to re-alias avx10.x alias to 512 bit and only > usin

Re: [PATCH] i386: Do not check vector size conflict when AVX512 is not explicitly set [PR 118815]

2025-02-16 Thread Hongtao Liu
On Fri, Feb 14, 2025 at 9:56 AM Haochen Jiang wrote: > > Hi all, > > When AVX512 is not explicitly set, we should not take EVEX512 bit into > consideration when checking vector size. It will solve the intrin header > file reporting warnings when compiling with -Wsystem-headers. > > However, there

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-11 Thread Hongtao Liu
On Tue, Feb 11, 2025 at 4:27 PM H.J. Lu wrote: > > On Tue, Feb 11, 2025 at 4:13 PM Hongtao Liu wrote: > > > > > PR117081 is about regression in povray. The reduced testcase: > > Just for clarification. PR117081 is not about regression in povray. > > it's re

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-11 Thread Hongtao Liu
> PR117081 is about regression in povray. The reduced testcase: Just for clarification. PR117081 is not about regression in povray. It's related to FAIL: gcc.target/i386/pr91384.c scan-assembler-not testl. pr91384.c was added by r12-7417, which is a peephole optimization expecting some specific ins

Re: [PATCH 0/3] GCC13/GCC12 backport [PR108707][PR109610]

2025-02-09 Thread Hongtao Liu
On Mon, Feb 10, 2025 at 1:43 PM liuhongt wrote: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108707#c9 > > >Pranav Gorantla 2025-02-06 04:30:05 UTC > >Facing similar issue in gcc-13. Is it possible to backport the fix of this > >Bug 108707 and Bug 109610 to gcc-13, gcc-12 as well. > > This se

Re: [PATCH] x86: Verify that PUSH/POP can be skipped

2025-02-07 Thread Hongtao Liu
On Fri, Feb 7, 2025 at 1:57 PM H.J. Lu wrote: > > For > > --- > int f(int); > > int advance(int dz) > { > if (dz > 0) > return (dz + dz) * dz; > else > return dz * f(dz); > } > --- > > Before r15-1619-g3b9b8d6cfdf593 > > advance(int): > push rbx > mov

Re: [PATCH] i386: Append -march=x86-64-v3 to AVX10.2/512 VNNI testcases

2025-01-22 Thread Hongtao Liu
On Wed, Jan 22, 2025 at 11:13 AM Haochen Jiang wrote: > > Hi all, > > These two testcases were missed by the previous addition of > -march=x86-64-v3 to silence warning for -march=native tests. > > Ok for trunk? Ok. > > Thx, > Haochen > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/vnniint16

Re: [PATCH 00/13] Realign x86 GCC after Binutils change [PR118270]

2025-01-21 Thread Hongtao Liu
On Tue, Jan 21, 2025 at 4:42 PM Haochen Jiang wrote: > > Hi all, > > Recently, DMR ISAs got lots of changes in mnemonics. The detailed changes > are: > > - NE would be removed for all AVX10.2 new insns > - VCOMSBF16 -> VCOMISBF16 > - P for packed omitted for AI data types (BF16, TF32, FP8) >

Re: [RFA for x86] Don't include subst attributes in "@" md helpers

2024-12-23 Thread Hongtao Liu
On Thu, Dec 19, 2024 at 12:01 AM Richard Sandiford wrote: > > In a later patch, I need to add "@" to a pattern that uses subst > iterators. This combination is problematic for two reasons: > > (1) define_substs are applied and filtered at a later stage than the > handling of "@" patterns, so

Re: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2024-12-01 Thread Hongtao Liu
On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote: > > For all different modes of all 0s/1s vectors, we can use the single widest > all 0s/1s vector register for all 0s/1s vector uses in the whole function. > Add a pass to generate a single widest all 0s/1s vector set instruction at > entry of the near

Re: [PATCH] [x86] [RFC] Prevent loop vectorization if it's in a deeply nested big loop.

2024-11-28 Thread Hongtao Liu
On Thu, Nov 28, 2024 at 4:57 PM Richard Biener wrote: > > On Thu, Nov 28, 2024 at 3:04 AM Hongtao Liu wrote: > > > > On Wed, Nov 27, 2024 at 9:43 PM Richard Biener > > wrote: > > > > > > On Wed, Nov 27, 2024 at 4:26 AM liuhongt wrote: > > >

Re: [PATCH] __builtin_prefetch fixes [PR117608]

2024-11-27 Thread Hongtao Liu
On Wed, Nov 27, 2024 at 8:50 PM Richard Biener wrote: > > On Wed, 27 Nov 2024, Jakub Jelinek wrote: > > > Hi! > > > > The r15-4833-ge9ab41b79933 patch had among tons of config/i386 > > specific changes also important change to the generic code, allowing > > also 2 as valid value of the second argu

Re: [PATCH] [x86] [RFC] Prevent loop vectorization if it's in a deeply nested big loop.

2024-11-27 Thread Hongtao Liu
On Wed, Nov 27, 2024 at 9:43 PM Richard Biener wrote: > > On Wed, Nov 27, 2024 at 4:26 AM liuhongt wrote: > > > > When loop requires any kind of versioning which could increase register > > pressure too much, and it's in a deeply nest big loop, don't do > > vectorization. > > > > I tested the pat

Re: Patch ping - [PATCH] [APX EGPR] Fix indirect call prefix

2024-11-24 Thread Hongtao Liu
On Mon, Nov 25, 2024 at 2:32 PM Kong, Lingling wrote: > > Hi, > > LGTM. > Now Hongyu and Hongtao are working on APX. Ok. > > Thanks, > Lingling > > > -Original Message- > > From: Gregory Kanter > > Sent: Saturday, November 23, 2024 8:16 AM > > To: gcc-patches@gcc.gnu.org > > Cc: Kong, Lin

Re: [PATCH] i386/testsuite: Correct AVX10.2 FP8 test mask usage

2024-11-24 Thread Hongtao Liu
On Fri, Nov 22, 2024 at 4:08 PM Haochen Jiang wrote: > > Hi all, > > Under FP8, we should not use AVX512F_LEN_HALF to get the mask size since > it will get 16 instead of 8 and drop into wrong if condition. Correct > the usage for vcvtneph2[b,h]f8[,s] runtime test. > > Tested under sde. Ok for trun

Re: [PATCH] Optimize 128-bit vector permutation with pand, pandn and por.

2024-11-24 Thread Hongtao Liu
On Wed, Nov 20, 2024 at 8:03 PM Cui, Lili wrote: > > Hi, all > > This patch aims to handle certain vector shuffle operations using pand, pandn > and por more efficiently. > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? Although it's stage 3, I think this one is low risk, so O

Re: [PATCH] [RFC] Add extra 64bit SSE vector epilogue in some cases

2024-11-24 Thread Hongtao Liu
On Sun, Nov 24, 2024 at 8:05 PM Richard Biener wrote: > > > > > Am 24.11.2024 um 09:17 schrieb Hongtao Liu : > > > > On Fri, Nov 22, 2024 at 9:33 PM Richard Biener wrote: > >> > >> Similar to the X86_TUNE_AVX512_TWO_EPILOGUES tuning which enables

Re: [PATCH] [x86] Fix uninitialized operands[2] in vec_unpacks_hi_v4sf.

2024-11-24 Thread Hongtao Liu
On Fri, Nov 22, 2024 at 9:16 PM Richard Biener wrote: > > On Fri, 22 Nov 2024, liuhongt wrote: > > > It could cause weird spills in RA when register pressure is high. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ok for trunk? > > > > BTW, It's difficult to get a decent tes

Re: [PATCH] [RFC] Add extra 64bit SSE vector epilogue in some cases

2024-11-24 Thread Hongtao Liu
On Fri, Nov 22, 2024 at 9:33 PM Richard Biener wrote: > > Similar to the X86_TUNE_AVX512_TWO_EPILOGUES tuning which enables > an extra 128bit SSE vector epilogue when doing 512bit AVX512 > vectorization in the main loop the following allows a 64bit SSE > vector epilogue to be generated when the pr

Re: [PATCH] i386/testsuite: Do not append AVX10.2 option for check_effective_target

2024-11-21 Thread Hongtao Liu
On Fri, Nov 22, 2024 at 2:40 PM Haochen Jiang wrote: > > Hi all, > > When -avx10.2 meets -march with AVX512 enabled, it reports a warning > for vector size conflict. The warning will prevent the test from running on > GCC with arch native build on those platforms when > check_effective_target. > > Remo

Re: [PATCH] i386/testsuite: Enhance AVX10.2 vmovd/w testcases

2024-11-20 Thread Hongtao Liu
On Thu, Nov 21, 2024 at 2:40 PM Haochen Jiang wrote: > > Hi all, > > Under -fno-omit-frame-pointer, %ebp will be used, which is the > Solaris/x86 default. Both check %ebp and %esp to avoid error on that. > > Tested under -m32 w/ and w/o -fno-omit-frame-pointer. Ok for trunk? Ok. > > Thx, > Haochen

Re: [PATCH] i386: Fix cstorebf4 fp comparison operand [PR117495]

2024-11-13 Thread Hongtao Liu
On Wed, Nov 13, 2024 at 10:00 AM Hongyu Wang wrote: > > Hi, > > For cstorebf4 it uses comparison_operator for BFmode compare, which is > incorrect when directly uses ix86_expand_setcc as it does not canonicalize > the input comparison to correct the compare code by swapping operands. > Since the o

Re: [PATCH v2 ] i386: Add ix86_expand_integer_cst_argument

2024-11-12 Thread Hongtao Liu
On Wed, Nov 13, 2024 at 8:29 AM H.J. Lu wrote: > > On Wed, Nov 13, 2024 at 5:57 AM H.J. Lu wrote: > > > > On Tue, Nov 12, 2024 at 9:30 PM Richard Biener > > wrote: > > > > > > On Tue, Nov 12, 2024 at 1:49 PM H.J. Lu wrote: > > > > > > > > When passing 0xff as an unsigned char function argument,

Re: [PATCH 2/2] Add X86_TUNE_AVX512_TWO_EPILOGUES, enable for Zen4 and Zen5

2024-11-11 Thread Hongtao Liu
On Mon, Nov 11, 2024 at 8:20 PM Richard Biener wrote: > > The following adds X86_TUNE_AVX512_TWO_EPILOGUES tuning and directs the > vectorizer to produce both a vector AVX2 and SSE epilogue for AVX512 > vectorized loops when set. The tuning is enabled by default for Zen4 > and Zen5 where I benchm

Re: [PATCH] Guard truncate from vector float to vector __bf16 with !flag_rounding_math && HONOR_NANS (BFmode).

2024-11-10 Thread Hongtao Liu
On Fri, Nov 8, 2024 at 10:33 AM liuhongt wrote: > > hw instruction doesn't raise exceptions, turns sNAN into qNAN quietly, > and always round to nearest (even). Output denormals are always > flushed to zero and input denormals are always treated as zero. MXCSR > is not consulted nor updated. > W/o

Re: [PATCH] i386: Disallow long address mode in the x32 mode. [PR 117418]

2024-11-08 Thread Hongtao Liu
On Fri, Nov 8, 2024 at 3:18 PM Uros Bizjak wrote: > > On Fri, Nov 8, 2024 at 6:52 AM Hongtao Liu wrote: > > > > > > PR target/117418 > > > > > * config/i386/i386-options.cc > > > > > (ix86_option_override_internal): raise

Re: [PATCH] i386: Disallow long address mode in the x32 mode. [PR 117418]

2024-11-07 Thread Hongtao Liu
On Fri, Nov 8, 2024 at 1:21 PM Hongtao Liu wrote: > > On Fri, Nov 8, 2024 at 12:18 PM H.J. Lu wrote: > > > > On Fri, Nov 8, 2024 at 10:41 AM Hu, Lin1 wrote: > > > > > > Hi, all > > > > > > -maddress-mode=long will let Pmode = DI_mode, but -

Re: [PATCH] i386: Disallow long address mode in the x32 mode. [PR 117418]

2024-11-07 Thread Hongtao Liu
On Fri, Nov 8, 2024 at 12:18 PM H.J. Lu wrote: > > On Fri, Nov 8, 2024 at 10:41 AM Hu, Lin1 wrote: > > > > Hi, all > > > > -maddress-mode=long will let Pmode = DI_mode, but -mx32 requests the x32 ABI. > > So raise an error to avoid ICE. > > > > Bootstrapped and regtested, OK for trunk? > > > > BRs, >

Re: [PATCH] [x86_64] Add microarchtecture tunable for pass_align_tight_loops

2024-11-07 Thread Hongtao Liu
On Fri, Nov 8, 2024 at 10:21 AM Mayshao-oc wrote: > > > > -Original Message- > > > From: Xi Ruoyao > > > Sent: Thursday, November 7, 2024 1:12 PM > > > To: Liu, Hongtao ; Mayshao-oc > > o...@zhaoxin.com>; Hongtao Liu > > > Cc: g

Re: [PATCH v4 7/8] i386: Add zero maskload else operand.

2024-11-07 Thread Hongtao Liu
On Fri, Nov 8, 2024 at 1:58 AM Robin Dapp wrote: > > From: Robin Dapp > > gcc/ChangeLog: > > * config/i386/sse.md (maskload): > Call maskload..._1. > (maskload_1): Rename. Ok for x86 part. > --- > gcc/config/i386/sse.md | 21 ++--- > 1 file changed, 18 ins

Re: [PATCH 1/2] [x86] Support vector float_truncate for SF to BF.

2024-11-07 Thread Hongtao Liu
On Thu, Nov 7, 2024 at 3:52 PM Jakub Jelinek wrote: > > On Thu, Nov 07, 2024 at 01:57:21PM +0800, Hongtao Liu wrote: > > > Does it turn the sNaNs into infinities or qNaNs silently? > > Yes. > > Into infinities? Into qNaNs(Sorry, I didn't see it clea

Re: [PATCH] i386: Add -mavx512vl for pr117304-1.c

2024-11-06 Thread Hongtao Liu
On Thu, Nov 7, 2024 at 2:04 PM Hu, Lin1 wrote: > > > -Original Message- > > From: Liu, Hongtao > > Sent: Thursday, November 7, 2024 11:41 AM > > To: Hu, Lin1 ; gcc-patches@gcc.gnu.org > > Cc: ubiz...@gmail.com > > Subject: RE: [PATCH] i386: Add -mavx512vl for pr117304-1.c > > > > > > > >

Re: [PATCH 1/2] [x86] Support vector float_truncate for SF to BF.

2024-11-06 Thread Hongtao Liu
On Tue, Nov 5, 2024 at 5:19 PM Jakub Jelinek wrote: > > On Tue, Nov 05, 2024 at 05:12:56PM +0800, Hongtao Liu wrote: > > Yes, there's a mismatch between scalar and vector code, I assume users > > may not care much about precision/NAN/INF/denormal behaviors for > >

Re: [PATCH] [x86_64] Add microarchtecture tunable for pass_align_tight_loops

2024-11-06 Thread Hongtao Liu
On Thu, Nov 7, 2024 at 10:29 AM MayShao-oc wrote: > > Hi all: > For zhaoxin, I find no improvement when enabling pass_align_tight_loops, > and see a performance drop in some cases. > This patch adds a new tunable to bypass pass_align_tight_loops in zhaoxin. > > Bootstrapped X86_64. > Ok fo

Re: [PATCH] testsuite: Fix up pr116725.c test [PR116725]

2024-11-06 Thread Hongtao Liu
On Wed, Nov 6, 2024 at 4:59 PM Jakub Jelinek wrote: > > On Fri, Oct 18, 2024 at 02:05:59PM -0400, Antoni Boucher wrote: > > PR target/116725 > > * gcc.target/i386/pr116725.c: Add test using those AVX builtins. > > This test FAILs for me, as I don't have the latest gas aroun

Re: [PATCH] i386: Add OPTION_MASK_ISA2_EVEX512 for some AVX512 instructions.

2024-11-05 Thread Hongtao Liu
On Wed, Nov 6, 2024 at 10:35 AM Hu, Lin1 wrote: > > Hi, all > > This patch aims to add OPTION_MASK_ISA2_EVEX512 for all avx512 512-bits > builtin functions, raise error when these builtin functions are used with > -mno-evex512. > > Bootstrapped and Regtested on x86-64-pc-linux-gnu, OK for trunk an

Re: [PATCH] [x86_64] Add flag to control tight loops alignment opt

2024-11-05 Thread Hongtao Liu
On Tue, Nov 5, 2024 at 5:50 PM Mayshao-oc wrote: > > > > > > > > On Tue, Nov 5, 2024 at 2:34 PM Liu, Hongtao wrote: > > > > > > > > > > > > > -Original Message- > > > > From: MayShao-oc > > > > Sent: Tuesday, November 5, 2024 11:20 AM > > > > To: gcc-patches@gcc.gnu.org; hubi...@ucw.cz;

Re: [PATCH] gcc.target/i386/apx-ndd.c: Also scan (%edi)

2024-11-05 Thread Hongtao Liu
On Wed, Nov 6, 2024 at 8:19 AM H.J. Lu wrote: > > Since x32 uses (%edi), instead of (%rdi), also scan (%edi). > > * gcc.target/i386/apx-ndd.c: Also scan (%edi). Ok. > > -- > H.J. -- BR, Hongtao

Re: [PATCH] Intel MOVRS tests: Also scan (%e.x)

2024-11-05 Thread Hongtao Liu
On Wed, Nov 6, 2024 at 8:21 AM H.J. Lu wrote: > > Since x32 uses (%reg32), instead of (%r.x), also scan (%e.x). > > * gcc.target/i386/avx10_2-512-movrs-1.c: Also scan (%e.x). > * gcc.target/i386/avx10_2-movrs-1.c: Likewise. > * gcc.target/i386/movrs-1.c: Likewise. Ok. > > -- > H.J. -- BR, Hong

Re: [PATCH] [x86_64] Add flag to control tight loops alignment opt

2024-11-05 Thread Hongtao Liu
On Tue, Nov 5, 2024 at 5:33 PM Richard Biener wrote: > > On Tue, Nov 5, 2024 at 8:12 AM Hongtao Liu wrote: > > > > On Tue, Nov 5, 2024 at 2:34 PM Liu, Hongtao wrote: > > > > > > > > > > > > > -Original Message- > > > >

Re: [PATCH 1/2] [x86] Support vector float_truncate for SF to BF.

2024-11-05 Thread Hongtao Liu
On Tue, Nov 5, 2024 at 4:46 PM Jakub Jelinek wrote: > > On Tue, Oct 29, 2024 at 07:19:38PM -0700, liuhongt wrote: > > Generate native instruction whenever possible, otherwise use vector > > permutation with odd indices. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ready pu

Re: [PATCH] [x86_64] Add flag to control tight loops alignment opt

2024-11-04 Thread Hongtao Liu
On Tue, Nov 5, 2024 at 2:34 PM Liu, Hongtao wrote: > > > > > -Original Message- > > From: MayShao-oc > > Sent: Tuesday, November 5, 2024 11:20 AM > > To: gcc-patches@gcc.gnu.org; hubi...@ucw.cz; Liu, Hongtao > > ; ubiz...@gmail.com > > Cc: ti...@zhaoxin.com; silviaz...@zhaoxin.com; loui..

Re: [PATCH v2] i386: Handling exception input of __builtin_ia32_prefetch. [PR117416]

2024-11-04 Thread Hongtao Liu
On Tue, Nov 5, 2024 at 2:41 PM Hu, Lin1 wrote: > > > -Original Message- > > From: Hu, Lin1 > > Sent: Tuesday, November 5, 2024 1:34 PM > > To: gcc-patches@gcc.gnu.org > > Cc: Liu, Hongtao ; ubiz...@gmail.com > > Subject: [PATCH v2] i386: Handling exception input of > > __builtin_ia32_pref

Re: [PATCH] i386: Handling exception input of __builtin_ia32_prefetch. [PR117416]

2024-11-04 Thread Hongtao Liu
On Tue, Nov 5, 2024 at 10:52 AM Hu, Lin1 wrote: > > Hi, all > > __builtin_ia32_prefetch's op1 should be between 0 and 2. So add an error > handler. > > Bootstrapped and regtested on x86_64-pc-linux-gnu, there is a unrelated FAIL > that has yet to be found root cause, just send patch for review. >

Re: [PATCH 0/2] Add arch support for Intel CPUs

2024-11-04 Thread Hongtao Liu
On Fri, Nov 1, 2024 at 11:24 AM Haochen Jiang wrote: > > Hi all, > > I have just landed new ISA patches on trunk. The next step will > be the arch support for ISE055 mentioned CPUs. > > There are two changes in ISE055 on CPUs: > > - A new model number is added for Arrow Lake. > - Diamond Rapid

Re: [PATCH] i386: Utilize VCOMSBF16 for BF16 Comparisons with AVX10.2

2024-11-03 Thread Hongtao Liu
On Fri, Nov 1, 2024 at 8:33 AM Hongyu Wang wrote: > > From: Levy Hsu > > This patch enables the use of the VCOMSBF16 instruction from AVX10.2 for > efficient BF16 comparisons. > > Bootstrapped & regtested on x86-64-pc-linux-gnu. > Ok for trunk? Ok. > > gcc/ChangeLog: > > * config/i386/i38

Re: [PATCH v3 7/8] i386: Add else operand to masked loads.

2024-11-03 Thread Hongtao Liu
On Sat, Nov 2, 2024 at 8:58 PM Robin Dapp wrote: > > From: Robin Dapp > > This patch adds a zero else operand to masked loads, in particular the > masked gather load builtins that are used for gather vectorization. > > gcc/ChangeLog: > > * config/i386/i386-expand.cc (ix86_expand_special_a

Re: [PATCH] [APX PPX] Avoid generating unmatched pushp/popp in pro/epilogue

2024-10-30 Thread Hongtao Liu
On Thu, Jul 4, 2024 at 11:00 AM Hongtao Liu wrote: > > On Tue, Jul 2, 2024 at 11:24 AM Hongyu Wang wrote: > > > > Hi, > > > > According to APX spec, the pushp/popp pairs should be matched, > > otherwise the PPX hint cannot take effect and ca

Re: [PATCH v2 7/8] i386: Add else operand to masked loads.

2024-10-29 Thread Hongtao Liu
On Fri, Oct 18, 2024 at 10:23 PM Robin Dapp wrote: > > This patch adds a zero else operand to masked loads, in particular the > masked gather load builtins that are used for gather vectorization. > > gcc/ChangeLog: > > * config/i386/i386-expand.cc (ix86_expand_special_args_builtin): >

Re: [PATCH] testsuite: Adjust AVX10.2 check_effective_target

2024-10-29 Thread Hongtao Liu
On Tue, Oct 29, 2024 at 5:04 PM Haochen Jiang wrote: > > Hi all, > > Since Binutils hasn't fully merged all AVX10.2 insts, only testing > one inst/intrin in AVX10.2 is never sufficient for check_effective_target. > Like APX_F, use inline asm to do the target check. > > Tested w/ and w/o Binutils

Re: [PATCH 0/7] Support Intel Diamond Rapid new features

2024-10-28 Thread Hongtao Liu
On Tue, Oct 22, 2024 at 2:31 PM Haochen Jiang wrote: > > Hi all, > > ISE054 has just been released and you can find doc from here: > > https://cdrdv2.intel.com/v1/dl/getContent/671368 > > Diamond Rapids features are added in this ISE, including AMX > related instructions, SM4 EVEX extension and MO

Re: [PATCH] target: Fix asm codegen for vfpclasss* and vcvtph2* instructions

2024-10-24 Thread Hongtao Liu
On Fri, Oct 25, 2024 at 12:19 AM Antoni Boucher wrote: > > Thanks. > Did you review the new patch? > Can I push it to master? Ok. > > On 2024-10-20 at 22:01, Hongtao Liu wrote: > > On Sat, Oct 19, 2024 at 2:06 AM Antoni Boucher wrote: > >> > >> Than

Re: [PATCH] target: Fix asm codegen for vfpclasss* and vcvtph2* instructions

2024-10-20 Thread Hongtao Liu
On Sat, Oct 19, 2024 at 2:06 AM Antoni Boucher wrote: > > Thanks for the review. > Here's the updated patch. > > On 2024-10-17 at 21:50, Hongtao Liu wrote: > > On Fri, Oct 18, 2024 at 9:08 AM Antoni Boucher wrote: > >> > >> Hi. > >> This i

Re: [PATCH] target: Fix asm codegen for vfpclasss* and vcvtph2* instructions

2024-10-17 Thread Hongtao Liu
On Fri, Oct 18, 2024 at 9:08 AM Antoni Boucher wrote: > > Hi. > This is a patch for the bug 116725. > I'm not sure if it is a good fix, but it seems to do the job. > If you have suggestions for better comments than what I wrote that would > explain what's happening, I'm open to suggestions. >@@ -

Re: [PATCH] testsuite: Fix typos for AVX10.2 convert testcases

2024-10-17 Thread Hongtao Liu
On Thu, Oct 17, 2024 at 3:17 PM Haochen Jiang wrote: > > From: Victor Rodriguez > > Hi all, > > There are some typos in AVX10.2 vcvtne[,2]ph[b,h]f8[,s] testcases. > They will lead to type mismatch. > > Previously they were not found because the binutils support had not been checked in. > > Ok for trunk? Ok. > > Th

Re: [PATCH] [RFC] target/117072 - more RTL FMA canonicalization

2024-10-14 Thread Hongtao Liu
On Mon, Oct 14, 2024 at 1:50 PM Richard Biener wrote: > > On Mon, 14 Oct 2024, Hongtao Liu wrote: > > > On Sun, Oct 13, 2024 at 8:02 PM Richard Biener wrote: > > > > > > On Sun, 13 Oct 2024, Hongtao Liu wrote: > > > > > > &

Re: [PATCH] [RFC] target/117072 - more RTL FMA canonicalization

2024-10-13 Thread Hongtao Liu
On Sun, Oct 13, 2024 at 8:02 PM Richard Biener wrote: > > On Sun, 13 Oct 2024, Hongtao Liu wrote: > > > On Fri, Oct 11, 2024 at 8:33 PM Hongtao Liu wrote: > > > > > > On Fri, Oct 11, 2024 at 8:22 PM Richard Biener wrote: > > > > > > > > T

Re: [PATCH] [RFC] target/117072 - more RTL FMA canonicalization

2024-10-13 Thread Hongtao Liu
On Fri, Oct 11, 2024 at 8:33 PM Hongtao Liu wrote: > > On Fri, Oct 11, 2024 at 8:22 PM Richard Biener wrote: > > > > The following helps the x86 backend by canonicalizing FMAs to have > > any negation done to one of the commutative multiplication operands > > be

Re: [PATCH] [RFC] target/117072 - more RTL FMA canonicalization

2024-10-11 Thread Hongtao Liu
On Fri, Oct 11, 2024 at 8:22 PM Richard Biener wrote: > > The following helps the x86 backend by canonicalizing FMAs to have > any negation done to one of the commutative multiplication operands > be done to a register (and not a memory operand). Likewise to > put a register operand first and a m
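
The canonicalization described in this snippet relies on the negation being movable between the two commutative multiplication operands without changing the result. The sketch below shows that identity with plain C arithmetic; it does not model GCC's RTL, register versus memory operands, or a fused operation, and both function names are hypothetical.

```c
/* Hypothetical illustration: negating either multiplication operand
   yields the same value, because negation is exact in IEEE arithmetic. */
static double mul_add_neg_first(double a, double b, double c)
{
    return (-a) * b + c;
}

static double mul_add_neg_second(double a, double b, double c)
{
    return a * (-b) + c;
}
```

Since the two forms are interchangeable, a compiler is free to pick whichever one places the negation on the operand that is cheaper to negate, which per the snippet is a register rather than a memory operand.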

Re: [PATCH] x86: Implement Fast-Math Float Truncation to BF16 via PSRLD Instruction

2024-10-09 Thread Hongtao Liu
On Tue, Oct 8, 2024 at 3:24 PM Levy Hsu wrote: > > Bootstrapped and tested on x86_64-linux-gnu, OK for trunk? Ok. > > gcc/ChangeLog: > > * config/i386/i386.md: Rewrite insn truncsfbf2. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/truncsfbf-1.c: New test. > * gcc.targe
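
As a rough illustration of the idea in the subject line (not the patch itself): a float can be truncated to BF16 by keeping only the upper 16 bits of its IEEE-754 single-precision bit pattern, which is what a logical right shift by 16 produces. The sketch assumes this is the scalar analogue of what PSRLD does per 32-bit lane. The helper names are hypothetical, and the conversion rounds toward zero with no special NaN handling.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: drop the low 16 mantissa bits of a float and
   return the remaining sign, exponent, and 7 mantissa bits as BF16. */
static uint16_t float_to_bf16_trunc(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* type-pun without aliasing issues */
    return (uint16_t)(bits >> 16);    /* logical shift, as PSRLD would do */
}

/* Hypothetical inverse: widen BF16 bits back to a float by zero-filling
   the low 16 bits. */
static float bf16_to_float(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, `1.0f` has the bit pattern `0x3F800000`, so its truncated BF16 encoding is `0x3F80`.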

Re: [PATCH v2 2/2] Adjust testcase after relax O2 vectorization.

2024-10-08 Thread Hongtao Liu
On Tue, Oct 8, 2024 at 4:56 PM Richard Biener wrote: > > On Tue, Oct 8, 2024 at 10:36 AM liuhongt wrote: > > > > gcc/testsuite/ChangeLog: > > > > * gcc.dg/fstack-protector-strong.c: Adjust > > scan-assembler-times. > > * gcc.dg/graphite/scop-6.c: Add > > -Wno-aggre

Re: [PATCH v2] x86/{,V}AES: adjust when to force EVEX encoding

2024-10-08 Thread Hongtao Liu
On Tue, Oct 8, 2024 at 3:00 PM Jan Beulich wrote: > > On 08.10.2024 08:54, Hongtao Liu wrote: > > On Mon, Sep 30, 2024 at 3:33 PM Jan Beulich wrote: > >> > >> Commit a79d13a01f8c ("i386: Fix aes/vaes patterns [PR114576]") correctly > >> sa

Re: [PATCH v2] x86/{,V}AES: adjust when to force EVEX encoding

2024-10-07 Thread Hongtao Liu
On Mon, Sep 30, 2024 at 3:33 PM Jan Beulich wrote: > > Commit a79d13a01f8c ("i386: Fix aes/vaes patterns [PR114576]") correctly > said "..., but we need to emit {evex} prefix in the assembly if AES ISA > is not enabled". Yet it did so only for the TARGET_AES insns. Going from > the alternative cho
