Re: [PATCH] i386: Decouple AMX-AVX512 from AVX10.2 and imply AVX512F

2025-07-15 Thread Hongtao Liu
On Tue, Jul 15, 2025 at 2:36 PM Haochen Jiang wrote: > > Hi all, > > In ISE058, the AVX10.2 imply is removed from AMX-AVX512. This > leads to re-consideration on the imply for AMX-AVX512. > > Since it is using zmm register and using zmm register only, we > need to at least imply AVX512F. AVX512VL

RE: [PATCH] i386: Remove KEYLOCKER related feature since Panther Lake and Clearwater Forest

2025-07-13 Thread Liu, Hongtao
> -----Original Message----- > From: Jiang, Haochen > Sent: Monday, July 14, 2025 10:59 AM > To: gcc-patches@gcc.gnu.org > Cc: Liu, Hongtao ; ubiz...@gmail.com > Subject: [PATCH] i386: Remove KEYLOCKER related feature since Panther Lake > and Clearwater Forest > >

RE: [PATCH] i386: Add a new peeophole2 for PR91384 under APX_F

2025-07-11 Thread Liu, Hongtao
> -----Original Message----- > From: Hu, Lin1 > Sent: Wednesday, June 4, 2025 3:26 PM > To: gcc-patches@gcc.gnu.org > Cc: Liu, Hongtao ; ubiz...@gmail.com > Subject: [PATCH] i386: Add a new peeophole2 for PR91384 under APX_F > > gcc/ChangeLog: > > PR targ

Re: [PATCH v3] x86: Improve vector_loop/unrolled_loop for memset/memcpy

2025-07-07 Thread Hongtao Liu
On Mon, Jul 7, 2025 at 3:27 PM Hongtao Liu wrote: > > On Tue, Jun 24, 2025 at 2:11 PM H.J. Lu wrote: > > > > On Mon, Jun 23, 2025 at 2:24 PM H.J. Lu wrote: > > > > > > On Wed, Jun 18, 2025 at 3:17 PM H.J. Lu wrote: > > > > > > > >

Re: [PATCH v3] x86: Improve vector_loop/unrolled_loop for memset/memcpy

2025-07-07 Thread Hongtao Liu
On Tue, Jun 24, 2025 at 2:11 PM H.J. Lu wrote: > > On Mon, Jun 23, 2025 at 2:24 PM H.J. Lu wrote: > > > > On Wed, Jun 18, 2025 at 3:17 PM H.J. Lu wrote: > > > > > > 1. Don't generate the loop if the loop count is 1. > > > 2. For memset with vector on small size, use vector if small size supports

Re: [PATCH 2/2] add masked-epilogue tuning

2025-07-07 Thread Hongtao Liu
On Mon, Jul 7, 2025 at 3:18 PM Hongtao Liu wrote: > > On Fri, Jul 4, 2025 at 5:45 PM Richard Biener wrote: > > > > The following adds a x86 tuning to enable the use of AVX512 masked > > epilogues in cases we heuristically determine it to be not detrimental >

Re: [PATCH 2/2] add masked-epilogue tuning

2025-07-07 Thread Hongtao Liu
On Fri, Jul 4, 2025 at 5:45 PM Richard Biener wrote: > > The following adds a x86 tuning to enable the use of AVX512 masked > epilogues in cases we heuristically determine it to be not detrimental > by high chance. Basically problematic cases are when there are > data streams that are both stored

RE: [PATCH] i386: Change Diamond Rapids feature detect when model number could not be distinguished

2025-07-01 Thread Liu, Hongtao
> -----Original Message----- > From: Jiang, Haochen > Sent: Wednesday, July 2, 2025 11:10 AM > To: gcc-patches@gcc.gnu.org > Cc: Liu, Hongtao ; ubiz...@gmail.com > Subject: [PATCH] i386: Change Diamond Rapids feature detect when model > number could not be distinguished

Re: [PATCH v2] x86: Preserve frame pointer for no_callee_saved_registers attribute

2025-06-29 Thread Hongtao Liu
On Mon, Jun 30, 2025 at 11:46 AM H.J. Lu wrote: > > On Mon, Jun 30, 2025 at 11:17 AM H.J. Lu wrote: > > > > On Mon, Jun 30, 2025 at 10:41 AM Hongtao Liu wrote: > > > > > > On Mon, Jun 30, 2025 at 10:37 AM Hongtao Liu wrote: > > > > >

Re: [PATCH] x86: Preserve frame pointer for no_callee_saved_registers attribute

2025-06-29 Thread Hongtao Liu
On Mon, Jun 30, 2025 at 11:16 AM H.J. Lu wrote: > > On Mon, Jun 30, 2025 at 10:37 AM Hongtao Liu wrote: > > > > On Sat, Jun 28, 2025 at 8:30 PM H.J. Lu wrote: > > > > > > Update functions with no_callee_saved_registers/preserve_none attribute > > > t

Re: [PATCH] x86: Preserve frame pointer for no_callee_saved_registers attribute

2025-06-29 Thread Hongtao Liu
On Sat, Jun 28, 2025 at 8:30 PM H.J. Lu wrote: > > Update functions with no_callee_saved_registers/preserve_none attribute > to preserve frame pointer since caller may use it to save the current > stack: > > pushq %rbp > movq %rsp, %rbp > ... > call function > ... > leave > ret > > If callee chang

Re: [PATCH] x86: Preserve frame pointer for no_callee_saved_registers attribute

2025-06-29 Thread Hongtao Liu
On Mon, Jun 30, 2025 at 10:37 AM Hongtao Liu wrote: > > On Sat, Jun 28, 2025 at 8:30 PM H.J. Lu wrote: > > > > Update functions with no_callee_saved_registers/preserve_none attribute > > to preserve frame pointer since caller may use it to save the current > > stac

Re: [PATCH] x86: Handle vector broadcast source

2025-06-26 Thread Hongtao Liu
On Thu, Jun 26, 2025 at 2:17 PM H.J. Lu wrote: > > On Thu, Jun 26, 2025 at 2:11 PM Hongtao Liu wrote: > > > > On Thu, Jun 26, 2025 at 1:59 PM H.J. Lu wrote: > > > > > > Use the inner scalar mode of vector broadcast source in: > > > > > >

Re: [PATCH] x86: Also handle all 1s float vector constant

2025-06-25 Thread Hongtao Liu
On Thu, Jun 26, 2025 at 2:02 PM H.J. Lu wrote: > > Since float vector constant > > (const_vector:V4SF [(const_double:SF -QNaN [-QNaN]) repeated x4]) > > is an all 1s float vector constant, update the remove_redundant_vector > pass to replace > > (insn 20 18 21 2 (set (reg:V4SF 124) > (cons
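For readers wondering why an all-1s vector shows up as -QNaN in the RTL dump: with every bit set, each 32-bit lane has sign bit 1, an all-ones exponent and a non-zero mantissa, which IEEE-754 reads as a negative quiet NaN. The following is only a minimal sketch of that reading, assuming `float` is IEEE-754 binary32; the helper names are made up for illustration and do not come from the patch.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a single-precision value.
   memcpy is used instead of pointer casts to stay within defined behaviour. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Return 1 if the value is a NaN whose sign bit is set, i.e. what a
   dump would label as a negative NaN. */
static int is_negative_nan(float f)
{
    return isnan(f) && signbit(f) != 0;
}
```

Calling `is_negative_nan(bits_to_float(0xFFFFFFFFu))` reports a negative NaN, while the all-0s pattern is just +0.0.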

Re: [PATCH] x86: Handle vector broadcast source

2025-06-25 Thread Hongtao Liu
On Thu, Jun 26, 2025 at 1:59 PM H.J. Lu wrote: > > Use the inner scalar mode of vector broadcast source in: > > (set (reg:V8DF 394) >(vec_duplicate:V8DF (reg:V2DF 190 [ alpha ]))) > > to compute the vector mode for broadcast from vector source. ix86_get_vector_cse_mode (unsigned int si

Re: [PATCH] x86: Handle REG_EH_REGION note in DEF_INSN

2025-06-25 Thread Hongtao Liu
On Thu, Jun 26, 2025 at 1:56 PM H.J. Lu wrote: > > On Thu, Jun 26, 2025 at 1:24 PM Hongtao Liu wrote: > > > > On Thu, Jun 26, 2025 at 6:20 AM H.J. Lu wrote: > > > > > > For tcpsock_test.go in libgo tests, > > > > > > commit aba3b9d3

Re: [PATCH] x86: Handle REG_EH_REGION note in DEF_INSN

2025-06-25 Thread Hongtao Liu
On Thu, Jun 26, 2025 at 6:20 AM H.J. Lu wrote: > > For tcpsock_test.go in libgo tests, > > commit aba3b9d3a48a0703fd565f7c5f0caf604f59970b > Author: H.J. Lu > Date: Fri May 9 07:17:07 2025 +0800 > > x86: Extend the remove_redundant_vector pass > > added an instruction: > > (insn 501 101 102

Re: [PATCH v3] x86: Add preserve_none and update no_caller_saved_registers attributes

2025-06-25 Thread Hongtao Liu
On Wed, Jun 25, 2025 at 3:35 PM H.J. Lu wrote: > > Add preserve_none attribute which is similar to no_callee_saved_registers > attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are > used for integer parameter passing. This can be used in an interpreter > to avoid saving/rest

Re: [PATCH] x86: Add debug dump for the remove_redundant_vector pass

2025-06-25 Thread Hongtao Liu
On Thu, Jun 26, 2025 at 6:21 AM H.J. Lu wrote: > > On Tue, Jun 24, 2025 at 2:21 PM H.J. Lu wrote: > > > > Add debug dump for the remove_redundant_vector pass with the following > > output: > > > > Replace: > > > > (insn 7 4 8 2 (set (reg:V2DI 103) > > (const_vector:V2DI [ > >

Re: [PATCH v3] x86: Update memcpy/memset inline strategies for -mtune=generic

2025-06-25 Thread Hongtao Liu
On Tue, Jun 17, 2025 at 8:54 PM Cui, Lili wrote: > > > > > -----Original Message----- > > From: H.J. Lu > > Sent: Monday, June 16, 2025 10:08 PM > > To: Jan Hubicka > > Cc: Uros Bizjak ; Cui, Lili ; gcc- > > patc...@gcc.gnu.org; Liu, Hongtao ; >

Re: [PATCH v2] x86: Add preserve_none and update no_caller_saved_registers attributes

2025-06-24 Thread Hongtao Liu
On Fri, May 23, 2025 at 1:56 PM H.J. Lu wrote: > > Add preserve_none attribute which is similar to no_callee_saved_registers > attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are > used for integer parameter passing. This can be used in an interpreter > to avoid saving/rest

Re: [PATCH] x86: Update -mtune=intel for Diamond Rapids/Clearwater Forest

2025-06-24 Thread Hongtao Liu
On Wed, Jun 25, 2025 at 1:06 PM H.J. Lu wrote: > > -mtune=intel is used to generate a single binary to run well on both big > core and small core, similar to hybrid CPUs. Update -mtune=intel to tune > for Diamond Rapids and Clearwater Forest, instead of Silvermont. > > PR target/120815 > * common

Re: [PATCH v4] x86: Extend the remove_redundant_vector pass

2025-06-24 Thread Hongtao Liu
On Tue, Jun 24, 2025 at 1:26 PM H.J. Lu wrote: > > On Mon, Jun 23, 2025 at 4:53 PM Hongtao Liu wrote: > > > > On Mon, Jun 23, 2025 at 4:45 PM H.J. Lu wrote: > > > > > > On Mon, Jun 23, 2025 at 4:10 PM H.J. Lu wrote: > > > > > >

Re: [PATCH v3] x86: Extend the remove_redundant_vector pass

2025-06-23 Thread Hongtao Liu
On Thu, Jun 19, 2025 at 10:25 AM H.J. Lu wrote: > > Extend the remove_redundant_vector pass to handle vector broadcasts from > constant and variable scalars. When broadcasting from constants and > function arguments, we can place a single widest vector broadcast at > entry of the nearest common d

Re: [PATCH v4] x86: Extend the remove_redundant_vector pass

2025-06-23 Thread Hongtao Liu
On Mon, Jun 23, 2025 at 4:45 PM H.J. Lu wrote: > > On Mon, Jun 23, 2025 at 4:10 PM H.J. Lu wrote: > > > > On Mon, Jun 23, 2025 at 3:11 PM Hongtao Liu wrote: > > > > > > On Thu, Jun 19, 2025 at 10:25 AM H.J. Lu wrote: > > > > > > >

Re: [PATCH v3] x86: Extend the remove_redundant_vector pass

2025-06-23 Thread Hongtao Liu
On Mon, Jun 23, 2025 at 4:10 PM H.J. Lu wrote: > > On Mon, Jun 23, 2025 at 3:11 PM Hongtao Liu wrote: > > > > On Thu, Jun 19, 2025 at 10:25 AM H.J. Lu wrote: > > > > > > Extend the remove_redundant_vector pass to handle vector broadcasts from >

Re: [PATCH v3] x86: Extend the remove_redundant_vector pass

2025-06-23 Thread Hongtao Liu
On Thu, Jun 19, 2025 at 10:25 AM H.J. Lu wrote: > > Extend the remove_redundant_vector pass to handle vector broadcasts from > constant and variable scalars. When broadcasting from constants and > function arguments, we can place a single widest vector broadcast at > entry of the nearest common d

Re: [PATCH v2] x86: Don't use vmovdqu16/vmovdqu8 with non-EVEX registers

2025-06-22 Thread Hongtao Liu
On Sat, Jun 21, 2025 at 11:09 PM H.J. Lu wrote: > > On Fri, Jun 20, 2025 at 4:12 PM H.J. Lu wrote: > > > > Don't use vmovdqu16/vmovdqu8 with non-EVEX registers even if AVX512BW is > > available. > > > > gcc/ > > > > PR target/120728 > > * config/i386/i386.cc (ix86_get_ssemov): Use vmovdqu16/vmovd

Re: [PATCH] x86: Add PROCESSOR_XXX comments to processor_cost_table

2025-06-22 Thread Hongtao Liu
On Mon, Jun 23, 2025 at 11:03 AM H.J. Lu wrote: > > Add a PROCESSOR_XXX comment to each entry in processor_cost_table to > describe which processor the cost entry is applied to. Ok as obvious. > > * config/i386/i386-options.cc (processor_cost_table): Add a > PROCESSOR_XXX comment to each entry. > >

Re: [PATCH] i386: Remove CLDEMOTE for clients

2025-06-22 Thread Hongtao Liu
On Fri, Jun 20, 2025 at 10:04 AM Haochen Jiang wrote: > > Hi all, > > CLDEMOTE is not enabled on clients according to SDM. SDM only mentioned > it will be enabled on Xeon and Atom servers, not clients. Remove them > since Alder Lake (where it is introduced). > > Also will backport this patch to GC

Re: [PATCH v4] x86: Enable *mov_(and|or) only for -Oz

2025-06-19 Thread Hongtao Liu
On Wed, Jun 18, 2025 at 6:38 PM H.J. Lu wrote: > > commit ef26c151c14a87177d46fd3d725e7f82e040e89f > Author: Roger Sayle > Date: Thu Dec 23 12:33:07 2021 + > > x86: PR target/103773: Fix wrong-code with -Oz from pop to memory. > > added "*mov_and" and extended "*mov_or" to transform > "

Re: [PATCH v2 x86: Extend the remove_redundant_vector pass

2025-06-17 Thread Hongtao Liu
On Wed, Jun 18, 2025 at 2:39 PM H.J. Lu wrote: > > On Mon, Jun 16, 2025 at 4:14 PM Hongtao Liu wrote: > > > > >+enum redundant_load_kind > > >+{ > > >+ LOAD_CONST0_VECTOR, > > >+ LOAD_CONSTM1_VECTOR, > > >+ LOAD_VECTOR > >

Re: [PATCH v3] x86: Enable *mov_(and|or) only for -Oz

2025-06-17 Thread Hongtao Liu
On Mon, May 26, 2025 at 2:30 PM H.J. Lu wrote: > > On Sun, May 25, 2025 at 7:02 PM H.J. Lu wrote: > > > > On Sun, May 25, 2025 at 8:12 AM H.J. Lu wrote: > > > > > > On Sun, May 25, 2025 at 7:47 AM H.J. Lu wrote: > > > > > > > > commit ef26c151c14a87177d46fd3d725e7f82e040e89f > > > > Author: Rog

Re: [PATCH] [AUTOFDO] Don't scale bb_count with ipa_count when ipa_count is zero but count_max is not

2025-06-16 Thread Hongtao Liu
Drop this patch since https://gcc.gnu.org/pipermail/gcc-patches/2025-June/686830.html could be a better alternative. On Tue, Jun 10, 2025 at 9:50 AM Hongtao Liu wrote: > > Ping > > On Mon, May 19, 2025 at 10:06 AM liuhongt wrote: > > > > From: "hongtao.liu"

Re: [PATCH v2 x86: Extend the remove_redundant_vector pass

2025-06-16 Thread Hongtao Liu
On Mon, Jun 16, 2025 at 4:30 PM Hongtao Liu wrote: > > >+enum redundant_load_kind > >+{ > >+ LOAD_CONST0_VECTOR, > >+ LOAD_CONSTM1_VECTOR, > >+ LOAD_VECTOR > >+}; > Perhaps rename to x86_cse_kind, X86_CSE_CONST0_VECTOR, > X86_CSE_CONSTM1_VECTOR, X

Re: [PATCH v2 x86: Extend the remove_redundant_vector pass

2025-06-16 Thread Hongtao Liu
>+enum redundant_load_kind >+{ >+ LOAD_CONST0_VECTOR, >+ LOAD_CONSTM1_VECTOR, >+ LOAD_VECTOR >+}; Perhaps rename to x86_cse_kind, X86_CSE_CONST0_VECTOR, X86_CSE_CONSTM1_VECTOR, X86_CSE_VEC_DUP? LOAD sounds a bit ambiguous. Similar to ix86_get_vector_load_mode -> ix86_get_vector_cse_mode? >+
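Spelled out, the rename suggested in this review would look roughly like the sketch below. The three enumerator names are the ones proposed above; the per-enumerator comments are our reading of the names, and the `x86_cse_kind_name` helper is purely hypothetical, added only to show how the kinds might be printed in a dump.

```c
/* Kinds of vector values the pass can share, using the names proposed
   in the review in place of the LOAD_* names from the quoted patch. */
enum x86_cse_kind
{
  X86_CSE_CONST0_VECTOR,   /* all-0s vector constant */
  X86_CSE_CONSTM1_VECTOR,  /* all-1s vector constant */
  X86_CSE_VEC_DUP          /* broadcast of a scalar into a vector */
};

/* Hypothetical helper (not part of the patch): printable name of a kind. */
static const char *x86_cse_kind_name(enum x86_cse_kind kind)
{
  switch (kind)
    {
    case X86_CSE_CONST0_VECTOR:
      return "X86_CSE_CONST0_VECTOR";
    case X86_CSE_CONSTM1_VECTOR:
      return "X86_CSE_CONSTM1_VECTOR";
    case X86_CSE_VEC_DUP:
      return "X86_CSE_VEC_DUP";
    }
  return "unknown";
}
```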

Re: [PATCH] i386: Set SRF, GRR, CWF, GNR, DMR, ARL and PTL issue rate

2025-06-12 Thread Hongtao Liu
On Thu, Jun 12, 2025 at 10:51 AM Hu, Lin1 wrote: > > Hi, > > This patch aims to set SRF issue rate to 4, GNR issue rate to 6. According to > tests about spec2017, the patch has little effect on performance. > > For GRR, CWF, DMR, ARL and PTL, the patch set their issue rate to 6. Waiting > for > m

Re: [PATCH] [AUTOFDO] Don't scale bb_count with ipa_count when ipa_count is zero but count_max is not

2025-06-09 Thread Hongtao Liu
Ping On Mon, May 19, 2025 at 10:06 AM liuhongt wrote: > > From: "hongtao.liu" > > AutoFDO profile is a scaled profile, as a result, 0 sample does not > mean never executed. especially there's profile from function > body. Prevent combine_with_ipa_count (ipa_count) from zeroing all > bb->count. >

Re: [PATCH] x86: Extend the remove_redundant_vector pass

2025-06-09 Thread Hongtao Liu
On Tue, Jun 3, 2025 at 2:59 PM H.J. Lu wrote: > > Extend the remove_redundant_vector pass to handle vector broadcasts from > constant and variable scalars. When broadcasting from constants and > function arguments, we can place a single widest vector broadcast at > entry of the nearest common dom

Re: [PATCH] i386: Implement Thread Local Storage on Windows

2025-06-04 Thread LIU Hao
>= 0 always yields true (it's unsigned on Windows) -- Best regards, LIU Hao

Re: [PATCH] i386: Add more peephole2 for APX NDD

2025-06-03 Thread Hongtao Liu
On Thu, May 29, 2025 at 4:56 PM Hu, Lin1 wrote: > > Hi, > > The patch aims to optimize > movb(%rdi), %al > movq%rdi, %rbx > xorl%esi, %eax, %edx > movb%dl, (%rdi) > cmpb%sil, %al > jne > to > xorb%sil, (%rdi) >

Re: [PATCH] i386: Add more forms peephole2 for adc/sbb

2025-06-03 Thread Hongtao Liu
On Mon, May 26, 2025 at 4:55 PM Hu, Lin1 wrote: > > Hi, all > > Enable -mapxf will change some patterns about adc/sbb. > > Hence gcc will raise an extra mov like > movq8(%rdi), %rax > adcq%rax, 8(%rsi), %rax > movq%rax, 8(%rdi) > rather than > movq

Re: [PATCH] Enable mcf thread model for aarch64-*-mingw*.

2025-05-25 Thread LIU Hao
在 2025-5-16 16:50, LIU Hao 写道: This is a leftover of d6d7afcdbc04adb0ec42a44b2d7e05600945af42. After this change, configuration files of all three thread models are in 'libgcc/config/mingw/'. The patch has been bootstrapped on {x86_64,i686}-w64-mingw32. ARM64 port is still working i

Re: [PATCH] i386: Quote user-defined symbols in assembly in Intel syntax

2025-05-19 Thread LIU Hao
在 2025-5-13 17:18, LIU Hao 写道: Hello, Attached is a patch for PR 53929, but is also required by PR 80881. Ping. Also I just notice that Clang also quotes mangled MSVC++ symbols in this way, at least since Clang 3.5, so it's accepted by both GAS and LLVM: (https://gcc.godbolt.

Re: [PATCH v2 0/7] Remove -mavx10.1-256/512 and -mno-evex512

2025-05-18 Thread Hongtao Liu
On Wed, May 14, 2025 at 3:29 PM Haochen Jiang wrote: > > Hi all, > > This is the v2 patch to remove -mavx10.1/256-512 and -mno-evex512. I suppose > this time all the patches will not be held due to size. > > As mentioned in GCC 15, we will remove -mavx10.1-256/512 and -mno-evex512 > options in GCC

[PATCH] Enable mcf thread model for aarch64-*-mingw*.

2025-05-16 Thread LIU Hao
NWIND_INFO in gcc/config/i386/cygming.h diff --git a/libgcc/config/i386/t-mingw-mcfgthread b/libgcc/config/mingw/t-mingw-mcfgthread similarity index 100% rename from libgcc/config/i386/t-mingw-mcfgthread rename to libgcc/config/mingw/t-mingw-mcfgthread -- 2.49.0 From b48e41b58158d6311906010954c987

Re: [PATCH] For datarefs with big gap, split them into different groups.

2025-05-15 Thread Hongtao Liu
It's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119181 On Fri, May 16, 2025 at 10:02 AM liuhongt wrote: > > The patch tries to solve miss vectorization for below case. > > void > foo (int* a, int* restrict b) > { > b[0] = a[0] * a[64]; > b[1] = a[65] * a[1]; > b[2] = a[2] * a[66]; >

Re: [PATCH] x86: Add preserve_none and update no_caller_saved_registers attributes

2025-05-13 Thread Hongtao Liu
On Fri, Apr 18, 2025 at 7:10 PM H.J. Lu wrote: > > Add preserve_none attribute which is similar to no_callee_saved_registers > attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are Could you split preserve_none into a separate patch, It looks like it's different from clang's p

Re: [PATCH] Update libbid according to the latest Intel Decimal Floating-Point Math Library.

2025-05-13 Thread Hongtao Liu
On Wed, May 14, 2025 at 9:22 AM liuhongt wrote: > > The Intel Decimal Floating-Point Math Library is available as open-source on > Netlib[1]. > > [1] https://www.netlib.org/misc/intel/ > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ready push to trunk. > > libgcc/config/libbid/Ch

[PATCH] i386: Quote user-defined symbols in assembly in Intel syntax

2025-05-13 Thread LIU Hao
syntax, as some Linux headers contain inline assembly with only AT&T templates. It is however possible to bootstrap GCC on {i686,x86_64}-w64-mingw32. -- Best regards, LIU Hao From d733676c742f9af9b9ab34317433db242128e53d Mon Sep 17 00:00:00 2001 From: LIU Hao Date: Sat, 22 Feb 2025 13

Re: [PATCH v3] Consider frequency in cost estimation when converting scalar to vector.

2025-05-11 Thread Hongtao Liu
On Thu, May 8, 2025 at 2:40 PM liuhongt wrote: > > The only part I changed is related to size_cost of sse_to_ineteger, as below > > 114+ /* Under TARGET_SSE4_1, it's vmovd + vpextrd/vpinsrd. > 115+ W/o it, it's movd + psrlq/unpckldq + movd. */ > 116+ else if (!TARGET_64BIT && smode != SImod

Re: [PATCH v3] i386/cygming: Decrease default preferred stack boundary for 32-bit targets

2025-05-10 Thread LIU Hao
在 2025-5-10 20:48, Jonathan Yong 写道: On 5/9/25 4:26 PM, LIU Hao wrote: 在 2025-5-3 20:52, LIU Hao 写道: 在 2025-5-2 01:25, LIU Hao 写道: Remove `STACK_REALIGN_DEFAULT` for this target, because now the default value of `incoming_stack_boundary` equals `MIN_STACK_BOUNDARY` and it doesn't ha

Re: [PATCH v3] i386/cygming: Decrease default preferred stack boundary for 32-bit targets

2025-05-09 Thread LIU Hao
在 2025-5-3 20:52, LIU Hao 写道: 在 2025-5-2 01:25, LIU Hao 写道: Remove `STACK_REALIGN_DEFAULT` for this target, because now the default value of `incoming_stack_boundary` equals `MIN_STACK_BOUNDARY` and it doesn't have an effect any more. I suddenly realized the previous patch was for G

Re: [PATCH v2] x86: Insert extra move for mode size smaller than natural size

2025-05-06 Thread Hongtao Liu
On Wed, May 7, 2025 at 9:06 AM H.J. Lu wrote: > > On Tue, May 6, 2025 at 3:35 PM Hongtao Liu wrote: > > > > On Tue, May 6, 2025 at 3:06 PM H.J. Lu wrote: > > > > > > On Tue, May 6, 2025 at 2:30 PM Liu, Hongtao wrote: > > > > > > > >

Re: [PATCH] x86: Skip if the mode size is smaller than its natural size

2025-05-06 Thread Hongtao Liu
On Tue, May 6, 2025 at 3:06 PM H.J. Lu wrote: > > On Tue, May 6, 2025 at 2:30 PM Liu, Hongtao wrote: > > > > > > > > > -----Original Message----- > > > From: H.J. Lu > > > Sent: Tuesday, May 6, 2025 2:16 PM > > > To: Liu, Hongtao

RE: [PATCH] x86: Skip if the mode size is smaller than its natural size

2025-05-05 Thread Liu, Hongtao
> -----Original Message----- > From: H.J. Lu > Sent: Tuesday, May 6, 2025 2:16 PM > To: Liu, Hongtao > Cc: GCC Patches ; Uros Bizjak > > Subject: Re: [PATCH] x86: Skip if the mode size is smaller than its natural > size > > On Tue, May 6, 2025 at

RE: [PATCH] x86: Skip if the mode size is smaller than its natural size

2025-05-05 Thread Liu, Hongtao
> -----Original Message----- > From: H.J. Lu > Sent: Thursday, May 1, 2025 6:39 AM > To: GCC Patches ; Uros Bizjak > ; Liu, Hongtao > Subject: [PATCH] x86: Skip if the mode size is smaller than its natural size > > When generating a SUBREG from V16QI to V2HF, validate_

Re: [PATCH] i386: Quote user-defined symbols in assembly in Intel syntax

2025-05-05 Thread LIU Hao
On 2025-4-28 14:43, LIU Hao wrote: Hello, I'm sending this patch again after GCC 15 has been released. This patch was sent in February but there were no comments: https://patchwork.sourceware.org/project/gcc/patch/eca6660c-6578-4e39-8aa9-be9fdd013...@126.com/ Ping. -- Best regards

Re: [PATCH] Allow a PCH to be mapped to a different address

2025-05-05 Thread LIU Hao
在 2025-4-28 15:05, LIU Hao 写道: This is a response to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14940#c57 The patch was submitted to MSYS2 for testing in 2022-5. No issue reports have been received so far: * https://github.com/msys2/MINGW-packages/blob

Re: [PATCH v3] i386/cygming: Decrease default preferred stack boundary for 32-bit targets

2025-05-03 Thread LIU Hao
在 2025-5-2 01:25, LIU Hao 写道: Remove `STACK_REALIGN_DEFAULT` for this target, because now the default value of `incoming_stack_boundary` equals `MIN_STACK_BOUNDARY` and it doesn't have an effect any more. I suddenly realized the previous patch was for GCC 15 branch. Here's

Re: [PATCH] i386: Implement Thread Local Storage on Windows

2025-05-02 Thread LIU Hao
ly an ABI break for code that uses `__thread`, `_Thread_local` or `thread_local`. Other than that, this patch seems mostly fine. -- Best regards, LIU Hao

[PATCH v3] i386/cygming: Decrease default preferred stack boundary for 32-bit targets

2025-05-01 Thread LIU Hao
Remove `STACK_REALIGN_DEFAULT` for this target, because now the default value of `incoming_stack_boundary` equals `MIN_STACK_BOUNDARY` and it doesn't have an effect any more. -- Best regards, LIU Hao From eeb30bf621baa3af1a73e8e91bff297ef478 Mon Sep 17 00:00:00 2001 From: LIU Hao

[PATCH v2] i386/cygming: Decrease default preferred stack boundary for 32-bit targets

2025-05-01 Thread LIU Hao
not always aligned to 16 bytes, but I don't have any system with such a configuration, so can't test that for now. -- Best regards, LIU Hao From 1c101f4903a9be7d56efa8d97be603284f6bd4d4 Mon Sep 17 00:00:00 2001 From: LIU Hao Date: Tue, 29 Apr 2025 10:43:06 +0800 Subject: [PATCH] i3

RE: Make ix86 cost of VEC_SELECT equivalent to SUBREG same as of SUBREG

2025-04-29 Thread Liu, Hongtao
> -----Original Message----- > From: Jan Hubicka > Sent: Wednesday, April 30, 2025 4:11 AM > To: gcc-patches@gcc.gnu.org; Liu, Hongtao ; > ro...@nextmovesoftware.com; ubiz...@gmail.com > Subject: Make ix86 cost of VEC_SELECT equivalent to SUBREG same as of > SUBREG

Re: [PATCH] i386/cygming: Decrease default preferred stack boundary for 32-bit targets

2025-04-29 Thread LIU Hao
在 2025-4-29 13:03, LIU Hao 写道: This fixes a long-standing issue that GCC used to assume 16-byte stack alignment on i686-w64-mingw32, which is not always the case for callbacks from system libraries. CC Zeb Figura This patch looks a bit risky. The overall effect of `__attribute__

RE: [PATCH v3] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-29 Thread Liu, Hongtao
> -----Original Message----- > From: H.J. Lu > Sent: Tuesday, April 29, 2025 2:59 PM > To: Hongtao Liu > Cc: GCC Patches ; Liu, Hongtao > ; Uros Bizjak > Subject: [PATCH v3] x86: Add a pass to remove redundant all 0s/1s vector > load > > On Tue, Apr 29, 2

RE: [PATCH] i386: Add ix86_expand_unsigned_small_int_cst_argument

2025-04-28 Thread Liu, Hongtao
> -----Original Message----- > From: H.J. Lu > Sent: Tuesday, April 29, 2025 1:58 PM > To: Hongtao Liu > Cc: GCC Patches ; Uros Bizjak > ; Liu, Hongtao > Subject: Re: [PATCH] i386: Add > ix86_expand_unsigned_small_int_cst_argument > > On Tue, Apr 29,

[PATCH] i386/cygming: Decrease default preferred stack boundary for 32-bit targets

2025-04-28 Thread LIU Hao
This fixes a long-standing issue that GCC used to assume 16-byte stack alignment on i686-w64-mingw32, which is not always the case for callbacks from system libraries. -- Best regards, LIU Hao From 1b92f8105dbece1694dd3ab398cfb5e3ce2c15d9 Mon Sep 17 00:00:00 2001 From: LIU Hao Date: Tue

Re: [PATCH] i386: Add ix86_expand_unsigned_small_int_cst_argument

2025-04-28 Thread Hongtao Liu
On Sun, Apr 27, 2025 at 10:58 AM H.J. Lu wrote: > > When passing 0xff as an unsigned char function argument with the C frontend > promotion, expand_normal used to get > > constant > 255> > > and returned the rtx value using the sign-extended representation: > > (const_int 255 [0xff]) > > But aft
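As background for the 0xff discussion: the same bit pattern has different sign-extended values depending on the width it is interpreted in, so 0xff reads as 255 in a 32-bit context but as -1 in an 8-bit one. The snippet above is cut off, so the sketch below is only an illustration of that width dependence, not of the patch itself; `sign_extend` is a hypothetical helper and assumes a two's-complement target with `1 <= bits <= 64`.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of `value` to 64 bits.
   Caller must pass 1 <= bits <= 64. */
static int64_t sign_extend(uint64_t value, unsigned bits)
{
    uint64_t mask = (bits < 64) ? ((UINT64_C(1) << bits) - 1) : ~UINT64_C(0);
    uint64_t sign = UINT64_C(1) << (bits - 1);

    value &= mask;
    /* Flipping and then subtracting the sign bit propagates it upwards. */
    return (int64_t)((value ^ sign) - sign);
}
```

With this helper, `sign_extend(0xff, 32)` is 255 and `sign_extend(0xff, 8)` is -1.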

Re: [PATCH v2] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-28 Thread Hongtao Liu
On Mon, Apr 28, 2025 at 5:07 PM H.J. Lu wrote: > > On Mon, Apr 28, 2025 at 4:26 PM H.J. Lu wrote: > > > > > > > This is what my patch does: > > > But it iterates through vector_insns, using a def-ref chain to find > > > those insns. I think we can just record those single_set with src as > > > co

[PATCH] Allow a PCH to be mapped to a different address

2025-04-28 Thread LIU Hao
-Allow-a-PCH-to-be-mapped-to-a-different-addr.patch -- Best regards, LIU Hao From 5239275bb4df0e79bc4b2af57d90c2d10ad44863 Mon Sep 17 00:00:00 2001 From: LIU Hao Date: Wed, 11 May 2022 22:42:53 +0800 Subject: [PATCH] Allow a PCH to be mapped to a different address First, try mapping the PCH

[PATCH] i386: Quote user-defined symbols in assembly in Intel syntax

2025-04-27 Thread LIU Hao
Hello, I'm sending this patch again after GCC 15 has been released. This patch was sent in February but there were no comments: https://patchwork.sourceware.org/project/gcc/patch/eca6660c-6578-4e39-8aa9-be9fdd013...@126.com/ -- Best regards, LIU Hao

[PATCH] gcc: For Windows x86-32, always attempt to realign stack regardless of SSE

2025-04-27 Thread LIU Hao
, it's always necessary to realign the stack, as what Solaris does. Reference: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=07#c14 Signed-off-by: LIU Hao gcc/ChangeLog: PR target/07 * config/i386/cygming.h (STACK_REALIGN_DEFAULT): Copy from sol2.h. --- gcc/config

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-25 Thread Hongtao Liu
> > I am not so sure about this when it comes to relatively common > instructions. Hiding things in unspec prevents combine and other RTL > passes from doing their job. I would say that it only makes sense for > situations where RTL equivalent is very inconvenient. > In the direction of using gener

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-24 Thread Hongtao Liu
On Fri, Apr 25, 2025 at 1:26 PM Jan Hubicka wrote: > > > On Thu, Apr 24, 2025 at 6:27 PM Jan Hubicka wrote: > > > > > > > Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand > > > > or vpandn. > > > > Current register_operand/vector_operand could lose some optimization > > >

RE: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-24 Thread Liu, Hongtao
> -----Original Message----- > From: Jan Hubicka > Sent: Friday, April 25, 2025 12:27 AM > To: Liu, Hongtao > Cc: gcc-patches@gcc.gnu.org; crazy...@gmail.com; hjl.to...@gmail.com > Subject: Re: [PATCH] Accept allones or 0 operand for vcond_mask op1. > > > Since

Re: [PATCH] [x86] Generate 2 FMA instructions in ix86_expand_swdivsf.

2025-04-23 Thread Hongtao Liu
On Thu, Apr 24, 2025 at 12:54 AM Jan Hubicka wrote: > > > From: "hongtao.liu" > > > > When FMA is available, N-R step can be rewritten with > > > > a / b = (a - (rcp(b) * a * b)) * rcp(b) + rcp(b) * a > > > > which have 2 fma generated.[1] > > > > [1] https://bugs.llvm.org/show_bug.cgi?id=21385 >
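The rewritten Newton-Raphson step quoted above can be tried out in plain C. This is a rough sketch under stated assumptions, not the code from the patch: `approx_rcp` is a made-up stand-in for a hardware reciprocal estimate (it keeps only about 12 bits of precision), and the two multiply-add expressions are written as ordinary C so the example builds anywhere; on a target with FMA the compiler may contract each of them into one fused instruction.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a reciprocal estimate instruction: compute 1/b,
   then clear the low 11 mantissa bits so only ~12 bits remain accurate. */
static float approx_rcp(float b)
{
    float x = 1.0f / b;
    uint32_t bits;

    memcpy(&bits, &x, sizeof bits);
    bits &= ~(uint32_t)0x7FF;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* a / b = (a - (rcp(b) * a * b)) * rcp(b) + rcp(b) * a */
static float refined_div(float a, float b)
{
    float x0 = approx_rcp(b);  /* rcp(b) */
    float q0 = x0 * a;         /* rcp(b) * a, the first quotient estimate */
    float r = a - q0 * b;      /* first multiply-add: the residual */

    return r * x0 + q0;        /* second multiply-add: corrected quotient */
}
```

If the estimate has relative error e, the refined quotient has relative error of about e squared, which is why one step is enough to get from roughly 12 bits to nearly full single precision.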

Re: [PATCH] Consider frequency in cost estimation when converting scalar to vector.

2025-04-23 Thread Hongtao Liu
On Thu, Apr 24, 2025 at 12:50 AM Jan Hubicka wrote: > > > In some benchmark, I notice stv failed due to cost unprofitable, but the > > igain > > is inside the loop, but sse<->integer conversion is outside the loop, > > current cost > > model doesn't consider the frequency of those gain/cost. > >

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-22 Thread Hongtao Liu
On Mon, Apr 21, 2025 at 2:52 PM liuhongt wrote: > > Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand > or vpandn. > Current register_operand/vector_operand could lose some optimization > opportunity. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for tru

Re: Improve vectorizer costs of min, max, abs, absu and const_expr on x86

2025-04-21 Thread Hongtao Liu
On Tue, Apr 22, 2025 at 10:30 AM Hongtao Liu wrote: > > On Tue, Apr 22, 2025 at 12:46 AM Jan Hubicka wrote: > > > > Hi, > > this patch adds special cases for vectorizer costs in COND_EXPR, MIN_EXPR, > > MAX_EXPR, ABS_EXPR and ABSU_EXPR. We previously costed ABS_E

Re: Improve vectorizer costs of min, max, abs, absu and const_expr on x86

2025-04-21 Thread Hongtao Liu
On Tue, Apr 22, 2025 at 12:46 AM Jan Hubicka wrote: > > Hi, > this patch adds special cases for vectorizer costs in COND_EXPR, MIN_EXPR, > MAX_EXPR, ABS_EXPR and ABSU_EXPR. We previously costed ABS_EXPR and > ABSU_EXPR > but it was only correct for FP variant (where it corresponds to andss clea

Re: PING: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-21 Thread Hongtao Liu
On Mon, Apr 21, 2025 at 4:30 PM H.J. Lu wrote: > > On Mon, Apr 21, 2025 at 11:29 AM Hongtao Liu wrote: > > > > On Sat, Apr 19, 2025 at 1:25 PM H.J. Lu wrote: > > > > > > On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote: > > > > > > > >

Re: PING: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-20 Thread Hongtao Liu
On Sat, Apr 19, 2025 at 1:25 PM H.J. Lu wrote: > > On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote: > > > > For all different modes of all 0s/1s vectors, we can use the single widest > > all 0s/1s vector register for all 0s/1s vector uses in the whole function. > > Add a pass to generate a single wi

Re: [PATCH v2] x86: Update memcpy/memset inline strategies for -mtune=generic

2025-04-17 Thread Hongtao Liu
On Tue, Apr 8, 2025 at 3:52 AM H.J. Lu wrote: > > Simplify memcpy and memset inline strategies to avoid branches for > -mtune=generic: > > 1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector >load and store for up to 16 * 16 (256) bytes when the data size is >fixed and kn
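The 256-byte figure in item 1 is simple arithmetic: the ratio acts as an exclusive limit on the number of move instructions, so a ratio of 17 allows at most 16 moves. The sketch below assumes each move is a 16-byte load/store pair, which is our reading of "16 * 16"; `max_inline_bytes` is a hypothetical helper, not a GCC function.

```c
#include <stddef.h>

/* Upper bound on the bytes handled by inline moves: at most ratio - 1
   instructions are used, each covering bytes_per_move bytes. */
static size_t max_inline_bytes(unsigned ratio, size_t bytes_per_move)
{
    if (ratio == 0)
        return 0;
    return (size_t)(ratio - 1) * bytes_per_move;
}
```

For a ratio of 17 and 16-byte moves this gives 16 * 16 = 256 bytes, matching the number quoted above.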

Re: [PATCH] [PR119765] testsuite: adjust amd64-abi-9.c to check both ms and sysv ABIs

2025-04-15 Thread LIU Hao
sysv abi, the argument should go in esi +/* { dg-final { scan-assembler-times "movl\[\\t \]*\\\$20,\[\\t \[]*%esi" 2 } } */ + + ditto. -- Best regards, LIU Hao

Re: [PATCH] APX: Don't use red-zone with APX and no caller-saved registers

2025-04-14 Thread Hongtao Liu
On Mon, Apr 14, 2025 at 8:56 PM H.J. Lu wrote: > > On Mon, Apr 14, 2025 at 2:39 AM Uros Bizjak wrote: > > > > On Mon, Apr 14, 2025 at 8:54 AM Hongtao Liu wrote: > > > > > > On Mon, Apr 14, 2025 at 7:36 AM H.J. Lu wrote: > > > > > >

Re: [PATCH] APX: Don't use red-zone with APX and no caller-saved registers

2025-04-13 Thread Hongtao Liu
On Mon, Apr 14, 2025 at 7:36 AM H.J. Lu wrote: > > Don't use red-zone when there are no caller-saved registers and APX is > enabled since 128-byte red-zone is too small for 31 GPRs. > > gcc/ > > PR target/119784 > * config/i386/i386.cc (ix86_using_red_zone): Don't use red-zone >

RE: [PATCH] APX: add nf counterparts for rotl split pattern [PR 119539]

2025-04-05 Thread Liu, Hongtao
> -Original Message- > From: Uros Bizjak > Sent: Tuesday, April 1, 2025 5:24 PM > To: Hongtao Liu > Cc: Wang, Hongyu ; gcc-patches@gcc.gnu.org; Liu, > Hongtao > Subject: Re: [PATCH] APX: add nf counterparts for rotl split pattern [PR > 119539] > > O

Re: [PATCH] target/119549 - fixup handling of -mno-sse4

2025-04-04 Thread Hongtao Liu
On Mon, Mar 31, 2025 at 9:52 PM Richard Biener wrote: > > On Mon, 31 Mar 2025, Jakub Jelinek wrote: > > > On Mon, Mar 31, 2025 at 03:33:34PM +0200, Richard Biener wrote: > > > On Mon, 31 Mar 2025, Jakub Jelinek wrote: > > > > > > > On Mon, Mar 31, 2025 at 03:12:56PM +0200, Richard Biener wrote: >

Re: [PATCH] APX: add nf counterparts for rotl split pattern [PR 119539]

2025-04-02 Thread Hongtao Liu
On Wed, Apr 2, 2025 at 2:58 PM Hongyu Wang wrote: > > > Can we just change the output in original pattern, I think combine > > will still match the pattern even w/ clobber flags. > > Yes, adjusted and updated the patch in attachment. Ok. > > Liu, Ho

Re: [PATCH] APX: add nf counterparts for rotl split pattern [PR 119539]

2025-04-01 Thread Hongtao Liu
On Tue, Apr 1, 2025 at 4:40 PM Hongyu Wang wrote: > > Hi, > > For splitter after 3_mask it now splits the pattern > to *3_mask, causing the splitter doesn't generate > nf variant. Add corresponding nf counterpart for define_insn_and_split > to make the splitter also works for nf insn. > > Bootstra

Re: [PATCH] target/119549 - fixup handling of -mno-sse4

2025-04-01 Thread Hongtao Liu
On Tue, Apr 1, 2025 at 3:56 PM Jakub Jelinek wrote: > > On Tue, Apr 01, 2025 at 01:36:23PM +0800, Hongtao Liu wrote: > > >Changing ix86_valid_target_attribute_inner_p might be even better because > > >OPT_msse4 is RejectNegative option, so !value for it looks weird.

Re: [PATCH] i386: Add attr_isa for vaes patterns to sync with attr gpr16. [pr119473]

2025-03-30 Thread Hongtao Liu
On Fri, Mar 28, 2025 at 1:55 PM Hu, Lin1 wrote: > > For vaes patterns with jm constraint and gpr16 attr, it requires "isa" > attr to distinct avx/avx512 alternatives in ix86_memory_address_reg_class. > Also adds missing type and mode attributes for those vaes patterns. Ok. > > gcc/ChangeLog: > >

Re: [PATCH] i386: Add PTA_AVX10_1_256 to PTA_DIAMONDRAPIDS

2025-03-30 Thread Hongtao Liu
On Fri, Mar 28, 2025 at 4:22 PM Haochen Jiang wrote: > > Hi all, > > For -march= handling, PTA_AVX10_1 will not imply PTA_AVX10_1_256, > resulting in TARGET_AVX10_1 becoming true while TARGET_AVX10_1_256 > false. Since we will check TARGET_AVX10_1_256 in GCC 15 for AVX512 > feature enabling for AV

[PATCH] gcc/mingw: Align `.refptr.` to 8-byte boundaries for 64-bit targets

2025-03-29 Thread LIU Hao
This is a minor change, bootstrapped on x86_64-w64-mingw32. -- Best regards, LIU Hao From 83c3e90432f9ebc97785d81be7a94066d9923920 Mon Sep 17 00:00:00 2001 From: LIU Hao Date: Sat, 29 Mar 2025 22:47:54 +0800 Subject: [PATCH] gcc/mingw: Align `.refptr.` to 8-byte boundaries for 64-bit targets

Re: [PATCH] i386: Set attr "addr" as "gpr16" for constraint "jm". [PR 119425]

2025-03-26 Thread Hongtao Liu
On Wed, Mar 26, 2025 at 9:50 AM Hu, Lin1 wrote: > > Hi, all > > This patch aims to ensure each alternative with constraint "jm" should > set addr "gpr16", otherwise maybe raise ICE in reload pass. > > Bootstrapped and Regtested for x86_64-pc-linux-gnu{-m32,-m64}, ok for trunk? Ok. > > BRs, > Lin >

RE: [PATCH v2] i386: Add "s_" as Saturation for AVX10.2 Converting Intrinsics.

2025-03-25 Thread Liu, Hongtao
> -Original Message- > From: Hu, Lin1 > Sent: Tuesday, March 25, 2025 4:23 PM > To: gcc-patches@gcc.gnu.org > Cc: Liu, Hongtao ; ubiz...@gmail.com > Subject: RE: [PATCH v2] i386: Add "s_" as Saturation for AVX10.2 Converting > Intrinsics. > > Mor

Re: [PATCH] i386: Fix AVX10.2 SAT CVT testcases.

2025-03-20 Thread Hongtao Liu
On Thu, Mar 20, 2025 at 3:14 PM Hu, Lin1 wrote: > > Hi, > > res_ref will be modified after MASK_ZERO, init res_ref2 for rounding > control intrinsics. > > Bootstrapped and regtested on x86-64-pc-linux-gnu{-m32,-m64}, OK for trunk? Ok. > > BRs, > Lin > > gcc/testsuite/ChangeLog: > > * gcc.t

RE: [PATCH 0/4] Fix AVX10.2 SAT CVT.

2025-03-19 Thread Liu, Hongtao
> -Original Message- > From: Liu, Hongtao > Sent: Thursday, March 20, 2025 9:29 AM > To: Hu, Lin1 ; gcc-patches@gcc.gnu.org > Cc: ubiz...@gmail.com > Subject: RE: [PATCH 0/4] Fix AVX10.2 SAT CVT. > > > > > -Original Message- > > From:

RE: [PATCH 00/27] Use avx10.x as the only option for AVX10 with 512 bit vector support while remove avx10.x-256/512 option and 256 bit rounding support

2025-03-19 Thread Liu, Hongtao
> -Original Message- > From: Jiang, Haochen > Sent: Wednesday, March 19, 2025 3:38 PM > To: gcc-patches@gcc.gnu.org > Cc: Liu, Hongtao ; ubiz...@gmail.com > Subject: [PATCH 00/27] Use avx10.x as the only option for AVX10 with 512 bit > vector support while remove a
