Re: [PATCH] x86-64: Remove redundant TLS calls

2025-07-20 Thread Hongtao Liu
On Thu, Jul 17, 2025 at 11:22 PM H.J. Lu wrote: > > For TLS calls: > > 1. UNSPEC_TLS_GD: > > (parallel [ > (set (reg:DI 0 ax) > (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr"))) > (const_int 0 [0]))) > (unspec:DI [(symbol_ref:DI ("e") [flags 0x50]) >

Re: [PATCH] x86: Don't change mode for XOR in ix86_expand_ternlog

2025-07-16 Thread Hongtao Liu
On Thu, Jul 17, 2025 at 9:43 AM H.J. Lu wrote: > > There is no need to change mode for XOR in ix86_expand_ternlog now. > Whatever reasons for it in the first place no longer exist. Tested > on x86-64 with -m32. There are no regressions. Ok. > > * config/i386/i386.cc (ix86_expand_ternlog)

Re: [PATCH] i386: Decouple AMX-AVX512 from AVX10.2 and imply AVX512F

2025-07-15 Thread Hongtao Liu
On Tue, Jul 15, 2025 at 2:36 PM Haochen Jiang wrote: > > Hi all, > > In ISE058, the AVX10.2 imply is removed from AMX-AVX512. This > leads to re-consideration on the imply for AMX-AVX512. > > Since it is using zmm register and using zmm register only, we > need to at least imply AVX512F. AVX512VL

Re: [PATCH v3] x86: Improve vector_loop/unrolled_loop for memset/memcpy

2025-07-07 Thread Hongtao Liu
On Mon, Jul 7, 2025 at 3:27 PM Hongtao Liu wrote: > > On Tue, Jun 24, 2025 at 2:11 PM H.J. Lu wrote: > > > > On Mon, Jun 23, 2025 at 2:24 PM H.J. Lu wrote: > > > > > > On Wed, Jun 18, 2025 at 3:17 PM H.J. Lu wrote: > > > > > > > >

Re: [PATCH v3] x86: Improve vector_loop/unrolled_loop for memset/memcpy

2025-07-07 Thread Hongtao Liu
On Tue, Jun 24, 2025 at 2:11 PM H.J. Lu wrote: > > On Mon, Jun 23, 2025 at 2:24 PM H.J. Lu wrote: > > > > On Wed, Jun 18, 2025 at 3:17 PM H.J. Lu wrote: > > > > > > 1. Don't generate the loop if the loop count is 1. > > > 2. For memset with vector on small size, use vector if small size supports

Re: [PATCH 2/2] add masked-epilogue tuning

2025-07-07 Thread Hongtao Liu
On Mon, Jul 7, 2025 at 3:18 PM Hongtao Liu wrote: > > On Fri, Jul 4, 2025 at 5:45 PM Richard Biener wrote: > > > > The following adds a x86 tuning to enable the use of AVX512 masked > > epilogues in cases we heuristically determine it to be not detrimental >

Re: [PATCH 2/2] add masked-epilogue tuning

2025-07-07 Thread Hongtao Liu
On Fri, Jul 4, 2025 at 5:45 PM Richard Biener wrote: > > The following adds a x86 tuning to enable the use of AVX512 masked > epilogues in cases we heuristically determine it to be not detrimental > by high chance. Basically problematic cases are when there are > data streams that are both stored

Re: [PATCH v2] x86: Preserve frame pointer for no_callee_saved_registers attribute

2025-06-29 Thread Hongtao Liu
On Mon, Jun 30, 2025 at 11:46 AM H.J. Lu wrote: > > On Mon, Jun 30, 2025 at 11:17 AM H.J. Lu wrote: > > > > On Mon, Jun 30, 2025 at 10:41 AM Hongtao Liu wrote: > > > > > > On Mon, Jun 30, 2025 at 10:37 AM Hongtao Liu wrote: > > > > >

Re: [PATCH] x86: Preserve frame pointer for no_callee_saved_registers attribute

2025-06-29 Thread Hongtao Liu
On Mon, Jun 30, 2025 at 11:16 AM H.J. Lu wrote: > > On Mon, Jun 30, 2025 at 10:37 AM Hongtao Liu wrote: > > > > On Sat, Jun 28, 2025 at 8:30 PM H.J. Lu wrote: > > > > > > Update functions with no_callee_saved_registers/preserve_none attribute > > > t

Re: [PATCH] x86: Preserve frame pointer for no_callee_saved_registers attribute

2025-06-29 Thread Hongtao Liu
On Sat, Jun 28, 2025 at 8:30 PM H.J. Lu wrote: > > Update functions with no_callee_saved_registers/preserve_none attribute > to preserve frame pointer since caller may use it to save the current > stack: > > pushq %rbp > movq %rsp, %rbp > ... > call function > ... > leave > ret > > If callee chang

Re: [PATCH] x86: Preserve frame pointer for no_callee_saved_registers attribute

2025-06-29 Thread Hongtao Liu
On Mon, Jun 30, 2025 at 10:37 AM Hongtao Liu wrote: > > On Sat, Jun 28, 2025 at 8:30 PM H.J. Lu wrote: > > > > Update functions with no_callee_saved_registers/preserve_none attribute > > to preserve frame pointer since caller may use it to save the current > > stac

Re: [PATCH] x86: Handle vector broadcast source

2025-06-26 Thread Hongtao Liu
On Thu, Jun 26, 2025 at 2:17 PM H.J. Lu wrote: > > On Thu, Jun 26, 2025 at 2:11 PM Hongtao Liu wrote: > > > > On Thu, Jun 26, 2025 at 1:59 PM H.J. Lu wrote: > > > > > > Use the inner scalar mode of vector broadcast source in: > > > > > >

Re: [PATCH] x86: Also handle all 1s float vector constant

2025-06-25 Thread Hongtao Liu
On Thu, Jun 26, 2025 at 2:02 PM H.J. Lu wrote: > > Since float vector constant > > (const_vector:V4SF [(const_double:SF -QNaN [-QNaN]) repeated x4]) > > is an all 1s float vector constant, update the remove_redundant_vector > pass to replace > > (insn 20 18 21 2 (set (reg:V4SF 124) > (cons

Re: [PATCH] x86: Handle vector broadcast source

2025-06-25 Thread Hongtao Liu
On Thu, Jun 26, 2025 at 1:59 PM H.J. Lu wrote: > > Use the inner scalar mode of vector broadcast source in: > > (set (reg:V8DF 394) >(vec_duplicate:V8DF (reg:V2DF 190 [ alpha ]))) > > to compute the vector mode for broadcast from vector source. ix86_get_vector_cse_mode (unsigned int si

Re: [PATCH] x86: Handle REG_EH_REGION note in DEF_INSN

2025-06-25 Thread Hongtao Liu
On Thu, Jun 26, 2025 at 1:56 PM H.J. Lu wrote: > > On Thu, Jun 26, 2025 at 1:24 PM Hongtao Liu wrote: > > > > On Thu, Jun 26, 2025 at 6:20 AM H.J. Lu wrote: > > > > > > For tcpsock_test.go in libgo tests, > > > > > > commit aba3b9d3

Re: [PATCH] x86: Handle REG_EH_REGION note in DEF_INSN

2025-06-25 Thread Hongtao Liu
On Thu, Jun 26, 2025 at 6:20 AM H.J. Lu wrote: > > For tcpsock_test.go in libgo tests, > > commit aba3b9d3a48a0703fd565f7c5f0caf604f59970b > Author: H.J. Lu > Date: Fri May 9 07:17:07 2025 +0800 > > x86: Extend the remove_redundant_vector pass > > added an instruction: > > (insn 501 101 102

Re: [PATCH v3] x86: Add preserve_none and update no_caller_saved_registers attributes

2025-06-25 Thread Hongtao Liu
On Wed, Jun 25, 2025 at 3:35 PM H.J. Lu wrote: > > Add preserve_none attribute which is similar to no_callee_saved_registers > attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are > used for integer parameter passing. This can be used in an interpreter > to avoid saving/rest

Re: [PATCH] x86: Add debug dump for the remove_redundant_vector pass

2025-06-25 Thread Hongtao Liu
On Thu, Jun 26, 2025 at 6:21 AM H.J. Lu wrote: > > On Tue, Jun 24, 2025 at 2:21 PM H.J. Lu wrote: > > > > Add debug dump for the remove_redundant_vector pass with the following > > output: > > > > Replace: > > > > (insn 7 4 8 2 (set (reg:V2DI 103) > > (const_vector:V2DI [ > >

Re: [PATCH v3] x86: Update memcpy/memset inline strategies for -mtune=generic

2025-06-25 Thread Hongtao Liu
On Tue, Jun 17, 2025 at 8:54 PM Cui, Lili wrote: > > > > > -Original Message- > > From: H.J. Lu > > Sent: Monday, June 16, 2025 10:08 PM > > To: Jan Hubicka > > Cc: Uros Bizjak ; Cui, Lili ; gcc- > > patc...@gcc.gnu.org; Liu, Hongtao ; > > mjgu...@gmail.com > > Subject: [PATCH v3] x86: U

Re: [PATCH v2] x86: Add preserve_none and update no_caller_saved_registers attributes

2025-06-24 Thread Hongtao Liu
On Fri, May 23, 2025 at 1:56 PM H.J. Lu wrote: > > Add preserve_none attribute which is similar to no_callee_saved_registers > attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are > used for integer parameter passing. This can be used in an interpreter > to avoid saving/rest

Re: [PATCH] x86: Update -mtune=intel for Diamond Rapids/Clearwater Forest

2025-06-24 Thread Hongtao Liu
On Wed, Jun 25, 2025 at 1:06 PM H.J. Lu wrote: > > -mtune=intel is used to generate a single binary to run well on both big > core and small core, similar to hybrid CPUs. Update -mtune=intel to tune > for Diamond Rapids and Clearwater Forest, instead of Silvermont. > > PR target/120815 > * common

Re: [PATCH v4] x86: Extend the remove_redundant_vector pass

2025-06-24 Thread Hongtao Liu
On Tue, Jun 24, 2025 at 1:26 PM H.J. Lu wrote: > > On Mon, Jun 23, 2025 at 4:53 PM Hongtao Liu wrote: > > > > On Mon, Jun 23, 2025 at 4:45 PM H.J. Lu wrote: > > > > > > On Mon, Jun 23, 2025 at 4:10 PM H.J. Lu wrote: > > > > > >

Re: [PATCH v3] x86: Extend the remove_redundant_vector pass

2025-06-23 Thread Hongtao Liu
On Thu, Jun 19, 2025 at 10:25 AM H.J. Lu wrote: > > Extend the remove_redundant_vector pass to handle vector broadcasts from > constant and variable scalars. When broadcasting from constants and > function arguments, we can place a single widest vector broadcast at > entry of the nearest common d

Re: [PATCH v4] x86: Extend the remove_redundant_vector pass

2025-06-23 Thread Hongtao Liu
On Mon, Jun 23, 2025 at 4:45 PM H.J. Lu wrote: > > On Mon, Jun 23, 2025 at 4:10 PM H.J. Lu wrote: > > > > On Mon, Jun 23, 2025 at 3:11 PM Hongtao Liu wrote: > > > > > > On Thu, Jun 19, 2025 at 10:25 AM H.J. Lu wrote: > > > > > > >

Re: [PATCH v3] x86: Extend the remove_redundant_vector pass

2025-06-23 Thread Hongtao Liu
On Mon, Jun 23, 2025 at 4:10 PM H.J. Lu wrote: > > On Mon, Jun 23, 2025 at 3:11 PM Hongtao Liu wrote: > > > > On Thu, Jun 19, 2025 at 10:25 AM H.J. Lu wrote: > > > > > > Extend the remove_redundant_vector pass to handle vector broadcasts from >

Re: [PATCH v3] x86: Extend the remove_redundant_vector pass

2025-06-23 Thread Hongtao Liu
On Thu, Jun 19, 2025 at 10:25 AM H.J. Lu wrote: > > Extend the remove_redundant_vector pass to handle vector broadcasts from > constant and variable scalars. When broadcasting from constants and > function arguments, we can place a single widest vector broadcast at > entry of the nearest common d

Re: [PATCH v2] x86: Don't use vmovdqu16/vmovdqu8 with non-EVEX registers

2025-06-22 Thread Hongtao Liu
On Sat, Jun 21, 2025 at 11:09 PM H.J. Lu wrote: > > On Fri, Jun 20, 2025 at 4:12 PM H.J. Lu wrote: > > > > Don't use vmovdqu16/vmovdqu8 with non-EVEX registers even if AVX512BW is > > available. > > > > gcc/ > > > > PR target/120728 > > * config/i386/i386.cc (ix86_get_ssemov): Use vmovdqu16/vmovd

Re: [PATCH] x86: Add PROCESSOR_XXX comments to processor_cost_table

2025-06-22 Thread Hongtao Liu
On Mon, Jun 23, 2025 at 11:03 AM H.J. Lu wrote: > > Add a PROCESSOR_XXX comment to each entry in processor_cost_table to > describe which processor the cost entry is applied to. Ok as obvious. > > * config/i386/i386-options.cc (processor_cost_table): Add a > PROCESSOR_XXX comment to each entry. > >

Re: [PATCH] i386: Remove CLDEMOTE for clients

2025-06-22 Thread Hongtao Liu
On Fri, Jun 20, 2025 at 10:04 AM Haochen Jiang wrote: > > Hi all, > > CLDEMOTE is not enabled on clients according to SDM. SDM only mentioned > it will be enabled on Xeon and Atom servers, not clients. Remove them > since Alder Lake (where it is introduced). > > Also will backport this patch to GC

Re: [PATCH v4] x86: Enable *mov_(and|or) only for -Oz

2025-06-19 Thread Hongtao Liu
On Wed, Jun 18, 2025 at 6:38 PM H.J. Lu wrote: > > commit ef26c151c14a87177d46fd3d725e7f82e040e89f > Author: Roger Sayle > Date: Thu Dec 23 12:33:07 2021 + > > x86: PR target/103773: Fix wrong-code with -Oz from pop to memory. > > added "*mov_and" and extended "*mov_or" to transform > "

Re: [PATCH v2 x86: Extend the remove_redundant_vector pass

2025-06-17 Thread Hongtao Liu
On Wed, Jun 18, 2025 at 2:39 PM H.J. Lu wrote: > > On Mon, Jun 16, 2025 at 4:14 PM Hongtao Liu wrote: > > > > >+enum redundant_load_kind > > >+{ > > >+ LOAD_CONST0_VECTOR, > > >+ LOAD_CONSTM1_VECTOR, > > >+ LOAD_VECTOR > >

Re: [PATCH v3] x86: Enable *mov_(and|or) only for -Oz

2025-06-17 Thread Hongtao Liu
On Mon, May 26, 2025 at 2:30 PM H.J. Lu wrote: > > On Sun, May 25, 2025 at 7:02 PM H.J. Lu wrote: > > > > On Sun, May 25, 2025 at 8:12 AM H.J. Lu wrote: > > > > > > On Sun, May 25, 2025 at 7:47 AM H.J. Lu wrote: > > > > > > > > commit ef26c151c14a87177d46fd3d725e7f82e040e89f > > > > Author: Rog

Re: [PATCH] [AUTOFDO] Don't scale bb_count with ipa_count when ipa_count is zero but count_max is not

2025-06-16 Thread Hongtao Liu
Drop this patch since https://gcc.gnu.org/pipermail/gcc-patches/2025-June/686830.html could be a better alternative. On Tue, Jun 10, 2025 at 9:50 AM Hongtao Liu wrote: > > Ping > > On Mon, May 19, 2025 at 10:06 AM liuhongt wrote: > > > > From: "hongtao.liu"

Re: [PATCH v2 x86: Extend the remove_redundant_vector pass

2025-06-16 Thread Hongtao Liu
On Mon, Jun 16, 2025 at 4:30 PM Hongtao Liu wrote: > > >+enum redundant_load_kind > >+{ > >+ LOAD_CONST0_VECTOR, > >+ LOAD_CONSTM1_VECTOR, > >+ LOAD_VECTOR > >+}; > Perhaps rename to x86_cse_kind, X86_CSE_CONST0_VECTOR, > X86_CSE_CONSTM1_VECTOR, X

Re: [PATCH v2 x86: Extend the remove_redundant_vector pass

2025-06-16 Thread Hongtao Liu
>+enum redundant_load_kind >+{ >+ LOAD_CONST0_VECTOR, >+ LOAD_CONSTM1_VECTOR, >+ LOAD_VECTOR >+}; Perhaps rename to x86_cse_kind, X86_CSE_CONST0_VECTOR, X86_CSE_CONSTM1_VECTOR, X86_CSE_VEC_DUP? LOAD sounds a bit ambiguous. Similar to ix86_get_vector_load_mode -> ix86_get_vector_cse_mode? >+

Re: [PATCH] i386: Set SRF, GRR, CWF, GNR, DMR, ARL and PTL issue rate

2025-06-12 Thread Hongtao Liu
On Thu, Jun 12, 2025 at 10:51 AM Hu, Lin1 wrote: > > Hi, > > This patch aims to set SRF issue rate to 4, GNR issue rate to 6. According to > tests about spec2017, the patch has little effect on performance. > > For GRR, CWF, DMR, ARL and PTL, the patch set their issue rate to 6. Waiting > for > m

Re: [PATCH] [AUTOFDO] Don't scale bb_count with ipa_count when ipa_count is zero but count_max is not

2025-06-09 Thread Hongtao Liu
Ping On Mon, May 19, 2025 at 10:06 AM liuhongt wrote: > > From: "hongtao.liu" > > AutoFDO profile is a scaled profile, as a result, 0 sample does not > mean never executed. especially there's profile from function > body. Prevent combine_with_ipa_count (ipa_count) from zeroing all > bb->count. >

Re: [PATCH] x86: Extend the remove_redundant_vector pass

2025-06-09 Thread Hongtao Liu
On Tue, Jun 3, 2025 at 2:59 PM H.J. Lu wrote: > > Extend the remove_redundant_vector pass to handle vector broadcasts from > constant and variable scalars. When broadcasting from constants and > function arguments, we can place a single widest vector broadcast at > entry of the nearest common dom

Re: [PATCH] i386: Add more peephole2 for APX NDD

2025-06-03 Thread Hongtao Liu
On Thu, May 29, 2025 at 4:56 PM Hu, Lin1 wrote: > > Hi, > > The patch aims to optimize > movb (%rdi), %al > movq %rdi, %rbx > xorl %esi, %eax, %edx > movb %dl, (%rdi) > cmpb %sil, %al > jne > to > xorb %sil, (%rdi) >

Re: [PATCH] i386: Add more forms peephole2 for adc/sbb

2025-06-03 Thread Hongtao Liu
On Mon, May 26, 2025 at 4:55 PM Hu, Lin1 wrote: > > Hi, all > > Enable -mapxf will change some patterns about adc/sbb. > > Hence gcc will raise an extra mov like > movq 8(%rdi), %rax > adcq %rax, 8(%rsi), %rax > movq %rax, 8(%rdi) > rather than > movq

Re: [PATCH v2 0/7] Remove -mavx10.1-256/512 and -mno-evex512

2025-05-18 Thread Hongtao Liu
On Wed, May 14, 2025 at 3:29 PM Haochen Jiang wrote: > > Hi all, > > This is the v2 patch to remove -mavx10.1/256-512 and -mno-evex512. I suppose > this time all the patches will not be held due to size. > > As mentioned in GCC 15, we will remove -mavx10.1-256/512 and -mno-evex512 > options in GCC

Re: [PATCH] For datarefs with big gap, split them into different groups.

2025-05-15 Thread Hongtao Liu
It's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119181 On Fri, May 16, 2025 at 10:02 AM liuhongt wrote: > > The patch tries to solve miss vectorization for below case. > > void > foo (int* a, int* restrict b) > { > b[0] = a[0] * a[64]; > b[1] = a[65] * a[1]; > b[2] = a[2] * a[66]; >

Re: [PATCH] x86: Add preserve_none and update no_caller_saved_registers attributes

2025-05-13 Thread Hongtao Liu
On Fri, Apr 18, 2025 at 7:10 PM H.J. Lu wrote: > > Add preserve_none attribute which is similar to no_callee_saved_registers > attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are Could you split preserve_none into a separate patch, It looks like it's different from clang's p

Re: [PATCH] Update libbid according to the latest Intel Decimal Floating-Point Math Library.

2025-05-13 Thread Hongtao Liu
On Wed, May 14, 2025 at 9:22 AM liuhongt wrote: > > The Intel Decimal Floating-Point Math Library is available as open-source on > Netlib[1]. > > [1] https://www.netlib.org/misc/intel/ > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ready push to trunk. > > libgcc/config/libbid/Ch

Re: [PATCH v3] Consider frequency in cost estimation when converting scalar to vector.

2025-05-11 Thread Hongtao Liu
On Thu, May 8, 2025 at 2:40 PM liuhongt wrote: > > The only part I changed is related to size_cost of sse_to_ineteger, as below > > 114+ /* Under TARGET_SSE4_1, it's vmovd + vpextrd/vpinsrd. > 115+ W/o it, it's movd + psrlq/unpckldq + movd. */ > 116+ else if (!TARGET_64BIT && smode != SImod

Re: [PATCH v2] x86: Insert extra move for mode size smaller than natural size

2025-05-06 Thread Hongtao Liu
On Wed, May 7, 2025 at 9:06 AM H.J. Lu wrote: > > On Tue, May 6, 2025 at 3:35 PM Hongtao Liu wrote: > > > > On Tue, May 6, 2025 at 3:06 PM H.J. Lu wrote: > > > > > > On Tue, May 6, 2025 at 2:30 PM Liu, Hongtao wrote: > > > > > > > >

Re: [PATCH] x86: Skip if the mode size is smaller than its natural size

2025-05-06 Thread Hongtao Liu
On Tue, May 6, 2025 at 3:06 PM H.J. Lu wrote: > > On Tue, May 6, 2025 at 2:30 PM Liu, Hongtao wrote: > > > > > > > > > -Original Message- > > > From: H.J. Lu > > > Sent: Tuesday, May 6, 2025 2:16 PM > > > To: Liu, Hongtao > > > Cc: GCC Patches ; Uros Bizjak > > > > > > Subject: Re: [PA

Re: [PATCH] i386: Add ix86_expand_unsigned_small_int_cst_argument

2025-04-28 Thread Hongtao Liu
On Sun, Apr 27, 2025 at 10:58 AM H.J. Lu wrote: > > When passing 0xff as an unsigned char function argument with the C frontend > promotion, expand_normal used to get > > constant > 255> > > and returned the rtx value using the sign-extended representation: > > (const_int 255 [0xff]) > > But aft

Re: [PATCH v2] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-28 Thread Hongtao Liu
On Mon, Apr 28, 2025 at 5:07 PM H.J. Lu wrote: > > On Mon, Apr 28, 2025 at 4:26 PM H.J. Lu wrote: > > > > > > > This is what my patch does: > > > But it iterates through vector_insns, using a def-ref chain to find > > > those insns. I think we can just record those single_set with src as > > > co

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-25 Thread Hongtao Liu
> > I am not so sure about this when it comes to relatively common > instructions. Hiding things in unspec prevents combine and other RTL > passes from doing their job. I would say that it only makes sense for > situations where RTL equivalent is very inconvenient. > In the direction of using gener

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-24 Thread Hongtao Liu
On Fri, Apr 25, 2025 at 1:26 PM Jan Hubicka wrote: > > > On Thu, Apr 24, 2025 at 6:27 PM Jan Hubicka wrote: > > > > > > > Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand > > > > or vpandn. > > > > Current register_operand/vector_operand could lose some optimization > > >

Re: [PATCH] [x86] Generate 2 FMA instructions in ix86_expand_swdivsf.

2025-04-23 Thread Hongtao Liu
On Thu, Apr 24, 2025 at 12:54 AM Jan Hubicka wrote: > > > From: "hongtao.liu" > > > > When FMA is available, N-R step can be rewritten with > > > > a / b = (a - (rcp(b) * a * b)) * rcp(b) + rcp(b) * a > > > > which have 2 fma generated.[1] > > > > [1] https://bugs.llvm.org/show_bug.cgi?id=21385 >
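The rewritten Newton-Raphson step quoted above can be sketched in plain C. This is only an illustration under stated assumptions: approx_rcp is a hypothetical stand-in for the hardware reciprocal estimate (it truncates 1.0f / b to a coarse value and expects a positive, finite b), and swdiv_sketch is not the code in ix86_expand_swdivsf. The two marked statements have the multiply-add shape that a compiler may emit as one FMA each.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal estimate:
   compute 1/b, then clear the low 12 mantissa bits to mimic a coarse result.  */
float approx_rcp(float b)
{
    float r = 1.0f / b;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF000u;
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* a / b = (a - (rcp(b) * a * b)) * rcp(b) + rcp(b) * a  */
float swdiv_sketch(float a, float b)
{
    float r  = approx_rcp(b);
    float x0 = r * a;        /* first estimate of a / b */
    float e  = a - x0 * b;   /* residual: multiply-subtract, FMA candidate */
    return e * r + x0;       /* correction: multiply-add, FMA candidate */
}
```

If the estimate has relative error eps, one such step leaves a relative error of roughly eps squared (before rounding), since (1 - eps) * (1 + eps) = 1 - eps^2.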

Re: [PATCH] Consider frequency in cost estimation when converting scalar to vector.

2025-04-23 Thread Hongtao Liu
On Thu, Apr 24, 2025 at 12:50 AM Jan Hubicka wrote: > > > In some benchmark, I notice stv failed due to cost unprofitable, but the > > igain > > is inside the loop, but sse<->integer conversion is outside the loop, > > current cost > > model doesn't consider the frequency of those gain/cost. > >

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-22 Thread Hongtao Liu
On Mon, Apr 21, 2025 at 2:52 PM liuhongt wrote: > > Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand > or vpandn. > Current register_operand/vector_operand could lose some optimization > opportunity. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for tru

Re: Improve vectorizer costs of min, max, abs, absu and const_expr on x86

2025-04-21 Thread Hongtao Liu
On Tue, Apr 22, 2025 at 10:30 AM Hongtao Liu wrote: > > On Tue, Apr 22, 2025 at 12:46 AM Jan Hubicka wrote: > > > > Hi, > > this patch adds special cases for vectorizer costs in COND_EXPR, MIN_EXPR, > > MAX_EXPR, ABS_EXPR and ABSU_EXPR. We previously costed ABS_E

Re: Improve vectorizer costs of min, max, abs, absu and const_expr on x86

2025-04-21 Thread Hongtao Liu
On Tue, Apr 22, 2025 at 12:46 AM Jan Hubicka wrote: > > Hi, > this patch adds special cases for vectorizer costs in COND_EXPR, MIN_EXPR, > MAX_EXPR, ABS_EXPR and ABSU_EXPR. We previously costed ABS_EXPR and > ABSU_EXPR > but it was only correct for FP variant (where it corresponds to andss clea

Re: PING: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-21 Thread Hongtao Liu
On Mon, Apr 21, 2025 at 4:30 PM H.J. Lu wrote: > > On Mon, Apr 21, 2025 at 11:29 AM Hongtao Liu wrote: > > > > On Sat, Apr 19, 2025 at 1:25 PM H.J. Lu wrote: > > > > > > On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote: > > > > > > > >

Re: PING: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-20 Thread Hongtao Liu
On Sat, Apr 19, 2025 at 1:25 PM H.J. Lu wrote: > > On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote: > > > > For all different modes of all 0s/1s vectors, we can use the single widest > > all 0s/1s vector register for all 0s/1s vector uses in the whole function. > > Add a pass to generate a single wi

Re: [PATCH v2] x86: Update memcpy/memset inline strategies for -mtune=generic

2025-04-17 Thread Hongtao Liu
On Tue, Apr 8, 2025 at 3:52 AM H.J. Lu wrote: > > Simplify memcpy and memset inline strategies to avoid branches for > -mtune=generic: > > 1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector >load and store for up to 16 * 16 (256) bytes when the data size is >fixed and kn

Re: [PATCH] APX: Don't use red-zone with APX and no caller-saved registers

2025-04-14 Thread Hongtao Liu
On Mon, Apr 14, 2025 at 8:56 PM H.J. Lu wrote: > > On Mon, Apr 14, 2025 at 2:39 AM Uros Bizjak wrote: > > > > On Mon, Apr 14, 2025 at 8:54 AM Hongtao Liu wrote: > > > > > > On Mon, Apr 14, 2025 at 7:36 AM H.J. Lu wrote: > > > > > >

Re: [PATCH] APX: Don't use red-zone with APX and no caller-saved registers

2025-04-13 Thread Hongtao Liu
On Mon, Apr 14, 2025 at 7:36 AM H.J. Lu wrote: > > Don't use red-zone when there are no caller-saved registers and APX is > enabled since 128-byte red-zone is too small for 31 GPRs. > > gcc/ > > PR target/119784 > * config/i386/i386.cc (ix86_using_red_zone): Don't use red-zone >
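The size argument in the quoted description can be checked with a few lines of C. This is a rough sketch, not code from the patch: the constant and function names are made up for illustration, and it assumes 8-byte general-purpose registers on x86-64 and that all 31 GPRs mentioned above would need a save slot.

```c
/* Assumed figures: 128-byte red zone (from the message above),
   8-byte GPRs on x86-64, and 31 GPRs that may need saving.  */
enum {
    RED_ZONE_BYTES = 128,
    GPR_BYTES      = 8,
    APX_SAVED_GPRS = 31
};

/* Bytes needed if every one of the 31 GPRs had to be saved.  */
int apx_gpr_save_bytes(void)
{
    return APX_SAVED_GPRS * GPR_BYTES;
}

/* Nonzero when that save area would fit in the red zone.  */
int fits_in_red_zone(void)
{
    return apx_gpr_save_bytes() <= RED_ZONE_BYTES;
}
```

Under these assumptions the save area is 248 bytes, almost twice the 128-byte red zone.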

Re: [PATCH] target/119549 - fixup handling of -mno-sse4

2025-04-04 Thread Hongtao Liu
On Mon, Mar 31, 2025 at 9:52 PM Richard Biener wrote: > > On Mon, 31 Mar 2025, Jakub Jelinek wrote: > > > On Mon, Mar 31, 2025 at 03:33:34PM +0200, Richard Biener wrote: > > > On Mon, 31 Mar 2025, Jakub Jelinek wrote: > > > > > > > On Mon, Mar 31, 2025 at 03:12:56PM +0200, Richard Biener wrote: >

Re: [PATCH] APX: add nf counterparts for rotl split pattern [PR 119539]

2025-04-02 Thread Hongtao Liu
ngtao wrote on Wed, Apr 2, 2025 at 08:57: > > > > > > > > > -Original Message- > > > From: Uros Bizjak > > > Sent: Tuesday, April 1, 2025 5:24 PM > > > To: Hongtao Liu > > > Cc: Wang, Hongyu ; gcc-patches@gcc.gnu.org; Liu, > > > Hongtao > >

Re: [PATCH] APX: add nf counterparts for rotl split pattern [PR 119539]

2025-04-01 Thread Hongtao Liu
On Tue, Apr 1, 2025 at 4:40 PM Hongyu Wang wrote: > > Hi, > > For splitter after 3_mask it now splits the pattern > to *3_mask, causing the splitter doesn't generate > nf variant. Add corresponding nf counterpart for define_insn_and_split > to make the splitter also works for nf insn. > > Bootstra

Re: [PATCH] target/119549 - fixup handling of -mno-sse4

2025-04-01 Thread Hongtao Liu
On Tue, Apr 1, 2025 at 3:56 PM Jakub Jelinek wrote: > > On Tue, Apr 01, 2025 at 01:36:23PM +0800, Hongtao Liu wrote: > > >Changing ix86_valid_target_attribute_inner_p might be even better because > > >OPT_msse4 is RejectNegative option, so !value for it looks weird.

Re: [PATCH] i386: Add attr_isa for vaes patterns to sync with attr gpr16. [pr119473]

2025-03-30 Thread Hongtao Liu
On Fri, Mar 28, 2025 at 1:55 PM Hu, Lin1 wrote: > > For vaes patterns with jm constraint and gpr16 attr, it requires "isa" > attr to distinct avx/avx512 alternatives in ix86_memory_address_reg_class. > Also adds missing type and mode attributes for those vaes patterns. Ok. > > gcc/ChangeLog: > >

Re: [PATCH] i386: Add PTA_AVX10_1_256 to PTA_DIAMONDRAPIDS

2025-03-30 Thread Hongtao Liu
On Fri, Mar 28, 2025 at 4:22 PM Haochen Jiang wrote: > > Hi all, > > For -march= handling, PTA_AVX10_1 will not imply PTA_AVX10_1_256, > resulting in TARGET_AVX10_1 becoming true while TARGET_AVX10_1_256 > false. Since we will check TARGET_AVX10_1_256 in GCC 15 for AVX512 > feature enabling for AV

Re: [PATCH] i386: Set attr "addr" as "gpr16" for constraint "jm". [PR 119425]

2025-03-26 Thread Hongtao Liu
On Wed, Mar 26, 2025 at 9:50 AM Hu, Lin1 wrote: > > Hi, all > > This patch aims to ensure each alternative with constraint "jm" should > set addr "gpr16", otherwise maybe raise ICE in reload pass. > > Bootstrapped and Regtested for x86_64-pc-linux-gnu{-m32,-m64}, ok for trunk? Ok. > > BRs, > Lin >

Re: [PATCH] i386: Fix AVX10.2 SAT CVT testcases.

2025-03-20 Thread Hongtao Liu
On Thu, Mar 20, 2025 at 3:14 PM Hu, Lin1 wrote: > > Hi, > > res_ref will be modified after MASK_ZERO, init res_ref2 for rounding > control intrinsics. > > Bootstrapped and regtested on x86-64-pc-linux-gnu{-m32,-m64}, OK for trunk? Ok. > > BRs, > Lin > > gcc/testsuite/ChangeLog: > > * gcc.t

Re: [PATCH] i386: Remove XFAIL for pr103750 testcases

2025-03-18 Thread Hongtao Liu
On Tue, Mar 11, 2025 at 2:29 PM Haochen Jiang wrote: > > Hi all, > > After commit r15-4510, the following testcases also do not need XFAIL. > > Ok for trunk? Ok. > > Thx, > Haochen > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/avx512f-pr103750-1.c: Remove XFAIL. > * gcc.target

Re: [PATCH] i386: Correct mask width for bf8->fp16 intrin on 256/512 bit

2025-03-05 Thread Hongtao Liu
On Wed, Mar 5, 2025 at 3:23 PM Haochen Jiang wrote: > > Hi all, > > For bf8 -> fp16 convert, when dst is 256 bit, the mask should be > 16 bit since 16*16=256, not the 8 bit in the current intrin. In > 512 bit intrin, the mask bit is also halved. This patch will fix > both of them. > > Ok for trunk
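The mask-width reasoning in the quoted message (one mask bit per destination element, 16*16=256) can be written as a one-line helper. This is a minimal sketch; mask_bits is a hypothetical name used only for illustration and is not part of the intrinsics headers.

```c
/* One mask bit per destination element: the mask width is the
   destination vector width divided by the element width.  */
unsigned mask_bits(unsigned vector_bits, unsigned element_bits)
{
    return vector_bits / element_bits;
}
```

For 16-bit elements this gives 16 mask bits for a 256-bit destination and 32 mask bits for a 512-bit destination.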

Re: [RFA] ira: Add new hooks for callee-save vs spills [PR117477]

2025-03-04 Thread Hongtao Liu
On Tue, Mar 4, 2025 at 6:31 PM Richard Biener wrote: > > On Tue, Mar 4, 2025 at 11:18 AM Richard Sandiford > wrote: > > > > Richard Sandiford writes: > > > Jan Hubicka writes: > > >>> > > >>> Thanks for running these. I saw poor results for perlbench with my > > >>> initial aarch64 hooks becau

Re: [PATCH 0/2] i386: Adjust AVX10 related options

2025-02-27 Thread Hongtao Liu
On Mon, Feb 17, 2025 at 9:51 AM Hongtao Liu wrote: > > On Thu, Feb 13, 2025 at 4:08 PM Haochen Jiang wrote: > > > > Hi all, > > > > According to the previous feedback on our RFC for AVX10 option adjustment > > and discussion with LLVM, we finalized how we a

Re: [PATCH] x86: Move TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P to i386.cc

2025-02-27 Thread Hongtao Liu
On Wed, Feb 26, 2025 at 6:01 AM H.J. Lu wrote: > > Move the TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P target hook from > i386.h to i386.cc. Ok for the patch, looks obvious. > > * config/i386/i386.h (TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P): > Moved to ... > * config/i386/i386.cc (TARGET_SMALL_REGI

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-19 Thread Hongtao Liu
On Wed, Feb 19, 2025 at 9:06 PM Jan Hubicka wrote: > > Hi, > this is a variant of a hook I benchmarked on cpu2016 with -Ofast -flto > and -O2 -flto. For non -Os and no Windows ABI should be practically the > same as your variant that was simply returning mem_cost - 2. > I've tested O2/(Ofast march

Re: [PATCH 0/2] i386: Adjust AVX10 related options

2025-02-16 Thread Hongtao Liu
On Thu, Feb 13, 2025 at 4:08 PM Haochen Jiang wrote: > > Hi all, > > According to the previous feedback on our RFC for AVX10 option adjustment > and discussion with LLVM, we finalized how we are going to handle that. > > The overall direction is to re-alias avx10.x alias to 512 bit and only > usin

Re: [PATCH] i386: Do not check vector size conflict when AVX512 is not explicitly set [PR 118815]

2025-02-16 Thread Hongtao Liu
On Fri, Feb 14, 2025 at 9:56 AM Haochen Jiang wrote: > > Hi all, > > When AVX512 is not explicitly set, we should not take EVEX512 bit into > consideration when checking vector size. It will solve the intrin header > file reporting warnings when compiling with -Wsystem-headers. > > However, there

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-11 Thread Hongtao Liu
On Tue, Feb 11, 2025 at 4:27 PM H.J. Lu wrote: > > On Tue, Feb 11, 2025 at 4:13 PM Hongtao Liu wrote: > > > > > PR117081 is about regression in povray. The reduced testcase: > > Just for clarification. PR117081 is not about regression in povray. > > it's re

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-11 Thread Hongtao Liu
> PR117081 is about regression in povray. The reduced testcase: Just for clarification. PR117081 is not about regression in povray. it's related to FAIL: gcc.target/i386/pr91384.c scan-assembler-not testl The pr91384.c is added by r12-7417 which is peephole optimization expecting some specific ins

Re: [PATCH 0/3] GCC13/GCC12 backport [PR108707][PR109610]

2025-02-09 Thread Hongtao Liu
On Mon, Feb 10, 2025 at 1:43 PM liuhongt wrote: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108707#c9 > > >Pranav Gorantla 2025-02-06 04:30:05 UTC > >Facing similar issue in gcc-13. Is it possible to backport the fix of this > >Bug 108707 and Bug 109610 to gcc-13, gcc-12 as well. > > This se

Re: [PATCH] x86: Verify that PUSH/POP can be skipped

2025-02-07 Thread Hongtao Liu
On Fri, Feb 7, 2025 at 1:57 PM H.J. Lu wrote: > > For > > --- > int f(int); > > int advance(int dz) > { > if (dz > 0) > return (dz + dz) * dz; > else > return dz * f(dz); > } > --- > > Before r15-1619-g3b9b8d6cfdf593 > > advance(int): > push rbx > mov

Re: [PATCH] i386: Append -march=x86-64-v3 to AVX10.2/512 VNNI testcases

2025-01-22 Thread Hongtao Liu
On Wed, Jan 22, 2025 at 11:13 AM Haochen Jiang wrote: > > Hi all, > > These two testcases are misses on previous addition for > -march=x86-64-v3 to silence warning for -march=native tests. > > Ok for trunk? Ok. > > Thx, > Haochen > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/vnniint16

Re: [PATCH 00/13] Realign x86 GCC after Binutils change [PR118270]

2025-01-21 Thread Hongtao Liu
On Tue, Jan 21, 2025 at 4:42 PM Haochen Jiang wrote: > > Hi all, > > Recently, DMR ISAs got lots of changes in mnemonics. The detailed change > are: > > - NE would be removed for all AVX10.2 new insns > - VCOMSBF16 -> VCOMISBF16 > - P for packed omitted for AI data types (BF16, TF32, FP8) >

Re: [RFA for x86] Don't include subst attributes in "@" md helpers

2024-12-23 Thread Hongtao Liu
On Thu, Dec 19, 2024 at 12:01 AM Richard Sandiford wrote: > > In a later patch, I need to add "@" to a pattern that uses subst > iterators. This combination is problematic for two reasons: > > (1) define_substs are applied and filtered at a later stage than the > handling of "@" patterns, so

Re: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2024-12-01 Thread Hongtao Liu
On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote: > > For all different modes of all 0s/1s vectors, we can use the single widest > all 0s/1s vector register for all 0s/1s vector uses in the whole function. > Add a pass to generate a single widest all 0s/1s vector set instruction at > entry of the near

Re: [PATCH] [x86] [RFC] Prevent loop vectorization if it's in a deeply nested big loop.

2024-11-28 Thread Hongtao Liu
On Thu, Nov 28, 2024 at 4:57 PM Richard Biener wrote: > > On Thu, Nov 28, 2024 at 3:04 AM Hongtao Liu wrote: > > > > On Wed, Nov 27, 2024 at 9:43 PM Richard Biener > > wrote: > > > > > > On Wed, Nov 27, 2024 at 4:26 AM liuhongt wrote: > > >

Re: [PATCH] __builtin_prefetch fixes [PR117608]

2024-11-27 Thread Hongtao Liu
On Wed, Nov 27, 2024 at 8:50 PM Richard Biener wrote: > > On Wed, 27 Nov 2024, Jakub Jelinek wrote: > > > Hi! > > > > The r15-4833-ge9ab41b79933 patch had among tons of config/i386 > > specific changes also important change to the generic code, allowing > > also 2 as valid value of the second argu

Re: [PATCH] [x86] [RFC] Prevent loop vectorization if it's in a deeply nested big loop.

2024-11-27 Thread Hongtao Liu
On Wed, Nov 27, 2024 at 9:43 PM Richard Biener wrote: > > On Wed, Nov 27, 2024 at 4:26 AM liuhongt wrote: > > > > When loop requires any kind of versioning which could increase register > > pressure too much, and it's in a deeply nest big loop, don't do > > vectorization. > > > > I tested the pat

Re: Patch ping - [PATCH] [APX EGPR] Fix indirect call prefix

2024-11-24 Thread Hongtao Liu
On Mon, Nov 25, 2024 at 2:32 PM Kong, Lingling wrote: > > Hi, > > LGTM. > Now Hongyu and Hongtao are working on APX. Ok. > > Thanks, > Lingling > > > -Original Message- > > From: Gregory Kanter > > Sent: Saturday, November 23, 2024 8:16 AM > > To: gcc-patches@gcc.gnu.org > > Cc: Kong, Lin

Re: [PATCH] i386/testsuite: Correct AVX10.2 FP8 test mask usage

2024-11-24 Thread Hongtao Liu
On Fri, Nov 22, 2024 at 4:08 PM Haochen Jiang wrote: > > Hi all, > > Under FP8, we should not use AVX512F_LEN_HALF to get the mask size since > it will get 16 instead of 8 and drop into wrong if condition. Correct > the usage for vcvtneph2[b,h]f8[,s] runtime test. > > Tested under sde. Ok for trun

Re: [PATCH] Optimize 128-bit vector permutation with pand, pandn and por.

2024-11-24 Thread Hongtao Liu
On Wed, Nov 20, 2024 at 8:03 PM Cui, Lili wrote: > > Hi, all > > This patch aims to handle certain vector shuffle operations using pand, pandn > and por more efficiently. > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? Although it's stage 3, I think this one is low risk, so O

Re: [PATCH] [RFC] Add extra 64bit SSE vector epilogue in some cases

2024-11-24 Thread Hongtao Liu
On Sun, Nov 24, 2024 at 8:05 PM Richard Biener wrote: > > > > > Am 24.11.2024 um 09:17 schrieb Hongtao Liu : > > > > On Fri, Nov 22, 2024 at 9:33 PM Richard Biener wrote: > >> > >> Similar to the X86_TUNE_AVX512_TWO_EPILOGUES tuning which enables

Re: [PATCH] [x86] Fix uninitialized operands[2] in vec_unpacks_hi_v4sf.

2024-11-24 Thread Hongtao Liu
On Fri, Nov 22, 2024 at 9:16 PM Richard Biener wrote: > > On Fri, 22 Nov 2024, liuhongt wrote: > > > It could cause weird spill in RA when register pressure is high. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ok for trunk? > > > > BTW, It's difficult to get a decent tes

Re: [PATCH] [RFC] Add extra 64bit SSE vector epilogue in some cases

2024-11-24 Thread Hongtao Liu
On Fri, Nov 22, 2024 at 9:33 PM Richard Biener wrote: > > Similar to the X86_TUNE_AVX512_TWO_EPILOGUES tuning which enables > an extra 128bit SSE vector epilogue when doing 512bit AVX512 > vectorization in the main loop the following allows a 64bit SSE > vector epilogue to be generated when the pr

Re: [PATCH] i386/testsuite: Do not append AVX10.2 option for check_effective_target

2024-11-21 Thread Hongtao Liu
On Fri, Nov 22, 2024 at 2:40 PM Haochen Jiang wrote: > > Hi all, > > When -avx10.2 meet -march with AVX512 enabled, it will report warning > for vector size conflict. The warning will prevent the test to run on > GCC with arch native build on those platforms when > check_effective_target. > > Remo

Re: [PATCH] i386/testsuite: Enhance AVX10.2 vmovd/w testcases

2024-11-20 Thread Hongtao Liu
On Thu, Nov 21, 2024 at 2:40 PM Haochen Jiang wrote: > > Hi all, > > Under -fno-omit-frame-pointer, %ebp will be used, which is the > Solaris/x86 default. Both check %ebp and %esp to avoid error on that. > > Tested under -m32 w/ and w/o -fno-omit-frame-pointer. Ok for trunk? Ok. > > Thx, > Haochen

Re: [PATCH] i386: Fix cstorebf4 fp comparison operand [PR117495]

2024-11-13 Thread Hongtao Liu
On Wed, Nov 13, 2024 at 10:00 AM Hongyu Wang wrote: > > Hi, > > For cstorebf4 it uses comparison_operator for BFmode compare, which is > incorrect when directly uses ix86_expand_setcc as it does not canonicalize > the input comparison to correct the compare code by swapping operands. > Since the o

Re: [PATCH v2 ] i386: Add ix86_expand_integer_cst_argument

2024-11-12 Thread Hongtao Liu
On Wed, Nov 13, 2024 at 8:29 AM H.J. Lu wrote: > > On Wed, Nov 13, 2024 at 5:57 AM H.J. Lu wrote: > > > > On Tue, Nov 12, 2024 at 9:30 PM Richard Biener > > wrote: > > > > > > On Tue, Nov 12, 2024 at 1:49 PM H.J. Lu wrote: > > > > > > > > When passing 0xff as an unsigned char function argument,

Re: [PATCH 2/2] Add X86_TUNE_AVX512_TWO_EPILOGUES, enable for Zen4 and Zen5

2024-11-11 Thread Hongtao Liu
On Mon, Nov 11, 2024 at 8:20 PM Richard Biener wrote: > > The following adds X86_TUNE_AVX512_TWO_EPILOGUES tuning and directs the > vectorizer to produce both a vector AVX2 and SSE epilogue for AVX512 > vectorized loops when set. The tuning is enabled by default for Zen4 > and Zen5 where I benchm

Re: [PATCH] Guard truncate from vector float to vector __bf16 with !flag_rounding_math && HONOR_NANS (BFmode).

2024-11-10 Thread Hongtao Liu
On Fri, Nov 8, 2024 at 10:33 AM liuhongt wrote: > > hw instruction doesn't raise exceptions, turns sNAN into qNAN quietly, > and always round to nearest (even). Output denormals are always > flushed to zero and input denormals are always treated as zero. MXCSR > is not consulted nor updated. > W/o
