Re: PING: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-21 Thread Hongtao Liu
On Mon, Apr 21, 2025 at 4:30 PM H.J. Lu wrote: > > On Mon, Apr 21, 2025 at 11:29 AM Hongtao Liu wrote: > > > > On Sat, Apr 19, 2025 at 1:25 PM H.J. Lu wrote: > > > > > > On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote: > > > > > > > > For all different modes of all 0s/1s vectors, we can use the si

Re: PING: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-21 Thread H.J. Lu
On Mon, Apr 21, 2025 at 11:29 AM Hongtao Liu wrote: > > On Sat, Apr 19, 2025 at 1:25 PM H.J. Lu wrote: > > > > On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote: > > > > > > For all different modes of all 0s/1s vectors, we can use the single widest > > > all 0s/1s vector register for all 0s/1s vector

Re: PING: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-20 Thread Hongtao Liu
On Sat, Apr 19, 2025 at 1:25 PM H.J. Lu wrote: > > On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote: > > > > For all different modes of all 0s/1s vectors, we can use the single widest > > all 0s/1s vector register for all 0s/1s vector uses in the whole function. > > Add a pass to generate a single wi

PING: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-18 Thread H.J. Lu
On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote: > > For all different modes of all 0s/1s vectors, we can use the single widest > all 0s/1s vector register for all 0s/1s vector uses in the whole function. > Add a pass to generate a single widest all 0s/1s vector set instruction at > entry of the near

[PATCH] x86: Add preserve_none and update no_caller_saved_registers attributes

2025-04-18 Thread H.J. Lu
Add preserve_none attribute which is similar to no_callee_saved_registers attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are used for integer parameter passing. This can be used in an interpreter to avoid saving/restoring the registers in functions which process byte cod

[PATCH] x86: Add tests for PR tree-optimization/82142

2025-02-23 Thread H.J. Lu
From: "H.J. Lu" Date: Mon, 24 Feb 2025 05:44:40 +0800 Subject: [PATCH] x86: Add tests for PR tree-optimization/82142 Verify that PR tree-optimization/82142 testcase is properly optimized. PR tree-optimization/82142 * gcc.target/i386/pr82142a.c: New file. * gcc.target/i386/pr82142b.c: Likewise. Sig

Re: [PATCH] x86: Add a -mstack-protector-guard=global test

2025-02-01 Thread Uros Bizjak
On Sat, Feb 1, 2025 at 11:14 AM H.J. Lu wrote: > > Verify that -mstack-protector-guard=global works on x86. Default stack > protector uses TLS. -mstack-protector-guard=global uses a global variable, > __stack_chk_guard, instead of TLS. > > * gcc.target/i386/ssp-global.c: New file. OK. Thanks,
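
For readers unfamiliar with the option, a test of this kind could look roughly like the sketch below. It is a hypothetical illustration, not the contents of gcc.target/i386/ssp-global.c: the function name copy_len, the buffer size, and the dg-options line are assumptions. The idea is that with -mstack-protector-guard=global the canary should be read from the global variable __stack_chk_guard instead of from TLS, so the assembly output can be scanned for that symbol.

```c
/* Hypothetical sketch in the style of a GCC testsuite file.
   { dg-do compile }
   { dg-options "-O2 -fstack-protector-all -mstack-protector-guard=global" }  */
#include <stddef.h>
#include <string.h>

/* Copy at most 15 bytes of SRC into a local buffer and return the
   copied length.  The on-stack array gives the stack protector
   something to guard.  */
size_t
copy_len (const char *src)
{
  char buf[16];
  strncpy (buf, src, sizeof buf - 1);
  buf[sizeof buf - 1] = '\0';
  return strlen (buf);
}

/* { dg-final { scan-assembler "__stack_chk_guard" } } */
```

The dg- comments are only interpreted by the GCC test harness; compiled on its own, the file is ordinary C.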

[PATCH] x86: Add a -mstack-protector-guard=global test

2025-02-01 Thread H.J. Lu
From: "H.J. Lu" Date: Sat, 1 Feb 2025 18:06:33 +0800 Subject: [PATCH] x86: Add a -mstack-protector-guard=global test Verify that -mstack-protector-guard=global works on x86. Default stack protector uses TLS. -mstack-protector-guard=global uses a global variable, __stack

Re: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2024-12-01 Thread H.J. Lu
On Mon, Dec 2, 2024, 11:16 AM Hongtao Liu wrote: > On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote: > > > > For all different modes of all 0s/1s vectors, we can use the single > widest > > all 0s/1s vector register for all 0s/1s vector uses in the whole > function. > > Add a pass to generate a sing

Re: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2024-12-01 Thread Hongtao Liu
On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote: > > For all different modes of all 0s/1s vectors, we can use the single widest > all 0s/1s vector register for all 0s/1s vector uses in the whole function. > Add a pass to generate a single widest all 0s/1s vector set instruction at > entry of the near

Re: [PATCH] x86: Add pcmpeq splitters

2024-12-01 Thread H.J. Lu
On Sun, Dec 1, 2024 at 8:01 PM Uros Bizjak wrote: > > On Sat, Nov 30, 2024 at 11:00 PM H.J. Lu wrote: > > > > Add pcmpeq splitters to split > > > > (insn 5 3 7 2 (set (reg:V4SI 100) > > (eq:V4SI (reg:V4SI 98) > > (reg:V4SI 98))) 7910 {*sse2_eqv4si3} > > (expr_list:REG_DEA

Re: [PATCH] x86: Add pcmpeq splitters

2024-12-01 Thread Uros Bizjak
On Sat, Nov 30, 2024 at 11:00 PM H.J. Lu wrote: > > Add pcmpeq splitters to split > > (insn 5 3 7 2 (set (reg:V4SI 100) > (eq:V4SI (reg:V4SI 98) > (reg:V4SI 98))) 7910 {*sse2_eqv4si3} > (expr_list:REG_DEAD (reg:V4SI 98) > (expr_list:REG_EQUAL (eq:V4SI (const_vector

[PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2024-11-30 Thread H.J. Lu
For all different modes of all 0s/1s vectors, we can use the single widest all 0s/1s vector register for all 0s/1s vector uses in the whole function. Add a pass to generate a single widest all 0s/1s vector set instruction at entry of the nearest common dominator for basic blocks with all 0s/1s vect

[PATCH] x86: Add pcmpeq splitters

2024-11-30 Thread H.J. Lu
Add pcmpeq splitters to split (insn 5 3 7 2 (set (reg:V4SI 100) (eq:V4SI (reg:V4SI 98) (reg:V4SI 98))) 7910 {*sse2_eqv4si3} (expr_list:REG_DEAD (reg:V4SI 98) (expr_list:REG_EQUAL (eq:V4SI (const_vector:V4SI [ (const_int -1 [0xfff

Re: [PATCH] [x86] Add some preference for floating point rtl ifcvt when sse4.1 is not available

2024-06-03 Thread Uros Bizjak
On Mon, Jun 3, 2024 at 5:11 AM liuhongt wrote: > > W/o TARGET_SSE4_1, it takes 3 instructions (pand, pandn and por) for > movdfcc/movsfcc, and could possibly fail cost comparison. Increasing > branch cost could hurt performance for other modes, so specially add > some preference for floating point i

[PATCH] [x86] Add some preference for floating point rtl ifcvt when sse4.1 is not available

2024-06-02 Thread liuhongt
W/o TARGET_SSE4_1, it takes 3 instructions (pand, pandn and por) for movdfcc/movsfcc, and could possibly fail cost comparison. Increasing branch cost could hurt performance for other modes, so specially add some preference for floating point ifcvt. Bootstrapped and regtested on x86_64-pc-linux-gnu{-

[PATCH] x86: Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-14 Thread Levy Hsu
Hi All We've introduced a new subroutine in ix86_expand_vec_perm_const_1 to optimize vector shifting for the V16QI type on x86. This patch uses a three-instruction sequence psrlw, psllw, and por to handle specific vector shuffle operations more efficiently. The change aims to improve assembly code

Re: [PATCH 1/1] [PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-13 Thread Uros Bizjak
On Thu, May 9, 2024 at 11:12 AM Levy Hsu wrote: > > Hi All > > We've introduced a new subroutine in ix86_expand_vec_perm_const_1 > to optimize vector shifting for the V16QI type on x86. > This patch uses a three-instruction sequence psrlw, psllw, and por > to handle specific vector shuffle operati

[PATCH 1/1] [PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-09 Thread Levy Hsu
Hi All We've introduced a new subroutine in ix86_expand_vec_perm_const_1 to optimize vector shifting for the V16QI type on x86. This patch uses a three-instruction sequence psrlw, psllw, and por to handle specific vector shuffle operations more efficiently. The change aims to improve assembly code

Re: [PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-08 Thread Uros Bizjak
On Wed, May 8, 2024 at 4:44 AM Levy Hsu wrote: > > PR target/107563 > > gcc/ChangeLog: > > * config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New > subroutine. > (ix86_expand_vec_perm_const_1): New Entry. > > gcc/testsuite/ChangeLog: > > * g++.t

[PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-07 Thread Levy Hsu
Hi All We've introduced a new subroutine in ix86_expand_vec_perm_const_1 to optimize vector shifting for the V16QI type on x86. This patch uses a three-instruction sequence psrlw, psllw, and por to handle specific vector shuffle operations more efficiently. The change aims to improve assembly c

[PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-07 Thread Levy Hsu
PR target/107563 gcc/ChangeLog: * config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New subroutine. (ix86_expand_vec_perm_const_1): New Entry. gcc/testsuite/ChangeLog: * g++.target/i386/pr107563.C: New test. --- gcc/config/i386/i386-expand.cc

[PATCH] [x86] Add UNSPEC_MASKOP to vpbroadcastm pattern.

2023-07-27 Thread liuhongt via Gcc-patches
Prevent rtl optimization of vec_duplicate + zero_extend to vpbroadcastm since there could be an extra kmov after RA. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} Ready to push to trunk. gcc/ChangeLog: PR target/110788 * config/i386/sse.md (avx512cd_maskb_vec_dup): Add

[PATCH] x86: add -mprefer-vector-width=512 to new avx512f-dupv2di.c testcase

2023-06-20 Thread Jan Beulich via Gcc-patches
This is to cover testing also being done with -march=cascadelake. --- Committing as obvious. --- a/gcc/testsuite/gcc.target/i386/avx512f-dupv2di.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-dupv2di.c @@ -1,5 +1,5 @@ /* { dg-do compile { target { ! ia32 } } } */ -/* { dg-options "-mavx512f -mno-a

Re: [PATCH] x86: add Bk and Br to comment list B's sub-chars

2023-06-13 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 14, 2023 at 1:56 PM Jan Beulich via Gcc-patches wrote: > > gcc/ > > * config/i386/constraints.md: Mention k and r for B. Ok. > > --- a/gcc/config/i386/constraints.md > +++ b/gcc/config/i386/constraints.md > @@ -162,7 +162,9 @@ > ;; g GOT memory operand. > ;; m Vector memo

[PATCH] x86: add Bk and Br to comment list B's sub-chars

2023-06-13 Thread Jan Beulich via Gcc-patches
gcc/ * config/i386/constraints.md: Mention k and r for B. --- a/gcc/config/i386/constraints.md +++ b/gcc/config/i386/constraints.md @@ -162,7 +162,9 @@ ;; g GOT memory operand. ;; m Vector memory operand ;; c Constant memory operand +;; k TLS address that allows insn using non-

Re: [PATCH] [x86] Add missing vec_pack/unpacks patterns for _Float16 <-> int/float conversion.

2023-06-12 Thread Hongtao Liu via Gcc-patches
On Mon, Jun 5, 2023 at 9:26 AM liuhongt wrote: > > This patch only supports vec_pack/unpacks optabs for vector modes whose length > >= 128. > For 32/64-bit vector, they're mostly handled by BB vectorizer with > truncmn2/extendmn2/fix{,uns}_truncmn2. > > Bootstrapped and regtested on x86_64-pc-linux-g

[PATCH] [x86] Add missing vec_pack/unpacks patterns for _Float16 <-> int/float conversion.

2023-06-04 Thread liuhongt via Gcc-patches
This patch only supports vec_pack/unpacks optabs for vector modes whose length >= 128. For 32/64-bit vector, they're mostly handled by BB vectorizer with truncmn2/extendmn2/fix{,uns}_truncmn2. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ready to push to trunk. gcc/ChangeLog: *

Re: [PATCH] x86: Add a new option -mdaz-ftz to enable FTZ and DAZ flags in MXCSR.

2023-05-14 Thread Hongtao Liu via Gcc-patches
On Fri, May 12, 2023 at 1:43 PM Hongtao Liu wrote: > > On Wed, May 10, 2023 at 5:10 PM liuhongt wrote: > > > > > The quoted patch shows -shared in context and you didn't post a > > > backport version > > > to look at. But yes, we shouldn't change -shared behavior on a > > > branch, even less so

Re: [PATCH] x86: Add a new option -mdaz-ftz to enable FTZ and DAZ flags in MXCSR.

2023-05-11 Thread Hongtao Liu via Gcc-patches
On Wed, May 10, 2023 at 5:10 PM liuhongt wrote: > > > The quoted patch shows -shared in context and you didn't post a > > backport version > > to look at. But yes, we shouldn't change -shared behavior on a > > branch, even less so make it > > inconsistent between targets. > Here's the patch. > >

[PATCH] x86: Add a new option -mdaz-ftz to enable FTZ and DAZ flags in MXCSR.

2023-05-10 Thread liuhongt via Gcc-patches
> The quoted patch shows -shared in context and you didn't post a > backport version > to look at. But yes, we shouldn't change -shared behavior on a > branch, even less so make it > inconsistent between targets. Here's the patch. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for

RE: [PATCH] [x86] Add define_insn_and_split to support general version of "kxnor".

2022-10-11 Thread Liu, Hongtao via Gcc-patches
> -Original Message- > From: Jakub Jelinek > Sent: Tuesday, October 11, 2022 9:59 PM > To: Liu, Hongtao > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] [x86] Add define_insn_and_split to support general > version of "kxnor". > > On Tue, Oct 1

Re: [PATCH] [x86] Add define_insn_and_split to support general version of "kxnor".

2022-10-11 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 11, 2022 at 04:03:16PM +0800, liuhongt via Gcc-patches wrote: > gcc/ChangeLog: > > * config/i386/i386.md (*notxor_1): New post_reload > define_insn_and_split. > (*notxorqi_1): Ditto. > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -10826,6 +10826

Re: [PATCH] [x86] Add define_insn_and_split to support general version of "kxnor".

2022-10-11 Thread Uros Bizjak via Gcc-patches
On Tue, Oct 11, 2022 at 10:03 AM liuhongt wrote: > > For general_reg_operand, it will be split into xor + not. > For mask_reg_operand, it will be split with UNSPEC_MASK_OP just > like what we did for other logic operations. > > The patch will optimize xor+not to kxnor when possible. > > Boo

[PATCH] [x86] Add define_insn_and_split to support general version of "kxnor".

2022-10-11 Thread liuhongt via Gcc-patches
For general_reg_operand, it will be split into xor + not. For mask_reg_operand, it will be split with UNSPEC_MASK_OP just like what we did for other logic operations. The patch will optimize xor+not to kxnor when possible. Bootstrapped and regtested on x86_64-pc-linux-gnu. Ok for trunk? g

Re: PING [PATCH] x86: Add ix86_ifunc_ref_local_ok

2022-07-31 Thread Uros Bizjak via Gcc-patches
On Wed, Jul 27, 2022 at 4:47 PM H.J. Lu wrote: > > On Thu, Jul 21, 2022 at 11:53 AM H.J. Lu wrote: > > > > We can't always use the PLT entry as the function address for local IFUNC > > functions. When the PIC register is needed for PLT call, indirect call > > via the PLT entry will fail since th

PING [PATCH] x86: Add ix86_ifunc_ref_local_ok

2022-07-27 Thread H.J. Lu via Gcc-patches
On Thu, Jul 21, 2022 at 11:53 AM H.J. Lu wrote: > > We can't always use the PLT entry as the function address for local IFUNC > functions. When the PIC register is needed for PLT call, indirect call > via the PLT entry will fail since the PIC register may not be set up > properly for indirect cal

[PATCH] x86: Add ix86_ifunc_ref_local_ok

2022-07-21 Thread H.J. Lu via Gcc-patches
We can't always use the PLT entry as the function address for local IFUNC functions. When the PIC register is needed for PLT call, indirect call via the PLT entry will fail since the PIC register may not be set up properly for indirect call. Add ix86_ifunc_ref_local_ok to return false when the PL

Re: [PATCH] x86: Add .note.GNU-stack section only for Linux

2022-05-10 Thread H.J. Lu via Gcc-patches
On Mon, May 9, 2022 at 7:51 AM H.J. Lu wrote: > > Add .note.GNU-stack section only for Linux since it may not be supported > on non-Linux OSes. __ELF__ isn't checked since these tests can only run > on Linux/x86 ELF systems. > > PR target/105472 > * gcc.target/i386/iamcu/asm-suppo

[PATCH] x86: Add .note.GNU-stack section only for Linux

2022-05-09 Thread H.J. Lu via Gcc-patches
Add .note.GNU-stack section only for Linux since it may not be supported on non-Linux OSes. __ELF__ isn't checked since these tests can only run on Linux/x86 ELF systems. PR target/105472 * gcc.target/i386/iamcu/asm-support.S: Add .note.GNU-stack section only for Linux.

Re: [PATCH] x86: Add missing .note.GNU-stack to assembly source

2022-05-06 Thread Rainer Orth
Hi H.J, > On Mon, May 2, 2022 at 11:37 AM H.J. Lu wrote: >> >> On Fri, Apr 29, 2022 at 10:38 AM H.J. Lu wrote: >> > >> > Add .note.GNU-stack assembly source to avoid linker warning: >> > >> > ld: warning: /tmp/ccPZSZ7Z.o: missing .note.GNU-stack section implies >> > executable stack >> > ld: NO

Re: [PATCH] x86: Add missing .note.GNU-stack to assembly source

2022-05-06 Thread H.J. Lu via Gcc-patches
On Mon, May 2, 2022 at 11:37 AM H.J. Lu wrote: > > On Fri, Apr 29, 2022 at 10:38 AM H.J. Lu wrote: > > > > Add .note.GNU-stack assembly source to avoid linker warning: > > > > ld: warning: /tmp/ccPZSZ7Z.o: missing .note.GNU-stack section implies > > executable stack > > ld: NOTE: This behaviour

Re: [PATCH] x86: Add missing .note.GNU-stack to assembly source

2022-05-02 Thread H.J. Lu via Gcc-patches
On Fri, Apr 29, 2022 at 10:38 AM H.J. Lu wrote: > > Add .note.GNU-stack assembly source to avoid linker warning: > > ld: warning: /tmp/ccPZSZ7Z.o: missing .note.GNU-stack section implies > executable stack > ld: NOTE: This behaviour is deprecated and will be removed in a future > version of the

[PATCH] x86: Add missing .note.GNU-stack to assembly source

2022-04-29 Thread H.J. Lu via Gcc-patches
Add .note.GNU-stack assembly source to avoid linker warning: ld: warning: /tmp/ccPZSZ7Z.o: missing .note.GNU-stack section implies executable stack ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker FAIL: gcc.target/i386/iamcu/test_3_element_struct_and_u

Re: [PATCH] x86: Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER

2022-02-17 Thread H.J. Lu via Gcc-patches
On Thu, Feb 17, 2022 at 10:49:48AM +0100, Richard Biener via Gcc-patches wrote: > On Thu, Feb 17, 2022 at 8:52 AM Uros Bizjak via Gcc-patches > wrote: > > > > On Thu, Feb 17, 2022 at 6:25 AM Hongtao Liu via Gcc-patches > > wrote: > > > > > > On Thu, Feb 17, 2022 at 12:26 PM H.J. Lu via Gcc-patche

Re: [PATCH] x86: Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER

2022-02-17 Thread Richard Biener via Gcc-patches
On Thu, Feb 17, 2022 at 8:52 AM Uros Bizjak via Gcc-patches wrote: > > On Thu, Feb 17, 2022 at 6:25 AM Hongtao Liu via Gcc-patches > wrote: > > > > On Thu, Feb 17, 2022 at 12:26 PM H.J. Lu via Gcc-patches > > wrote: > > > > > > Reading YMM registers with all zero bits needs VZEROUPPER on Sandy B

Re: [PATCH] x86: Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER

2022-02-16 Thread Uros Bizjak via Gcc-patches
On Thu, Feb 17, 2022 at 6:25 AM Hongtao Liu via Gcc-patches wrote: > > On Thu, Feb 17, 2022 at 12:26 PM H.J. Lu via Gcc-patches > wrote: > > > > Reading YMM registers with all zero bits needs VZEROUPPER on Sandy Bridge, > > Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid SSE <-> AVX > > tra

Re: [PATCH] x86: Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER

2022-02-16 Thread Hongtao Liu via Gcc-patches
On Thu, Feb 17, 2022 at 12:26 PM H.J. Lu via Gcc-patches wrote: > > Reading YMM registers with all zero bits needs VZEROUPPER on Sandy Bridge, > Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid SSE <-> AVX > transition penalty. Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER to > generate vzero

[PATCH] x86: Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER

2022-02-16 Thread H.J. Lu via Gcc-patches
Reading YMM registers with all zero bits needs VZEROUPPER on Sandy Bridge, Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid SSE <-> AVX transition penalty. Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER to generate vzeroupper instruction after loading all-zero YMM/ZMM registers and enable it by

Re: PING^1 [PATCH] x86: Add -mmove-max=bits and -mstore-max=bits

2021-12-03 Thread H.J. Lu via Gcc-patches
On Fri, Dec 3, 2021 at 8:55 AM Uros Bizjak wrote: > > On Fri, Dec 3, 2021 at 2:24 PM H.J. Lu wrote: > > > > On Thu, Nov 25, 2021 at 2:47 PM H.J. Lu wrote: > > > > > > Add -mmove-max=bits and -mstore-max=bits to enable 256-bit/512-bit move > > > and store, independent of -mprefer-vector-width=bit

Re: PING^1 [PATCH] x86: Add -mmove-max=bits and -mstore-max=bits

2021-12-03 Thread Uros Bizjak via Gcc-patches
On Fri, Dec 3, 2021 at 2:24 PM H.J. Lu wrote: > > On Thu, Nov 25, 2021 at 2:47 PM H.J. Lu wrote: > > > > Add -mmove-max=bits and -mstore-max=bits to enable 256-bit/512-bit move > > and store, independent of -mprefer-vector-width=bits: > > > > 1. Add X86_TUNE_AVX512_MOVE_BY_PIECES and X86_TUNE_AVX

PING^1 [PATCH] x86: Add -mmove-max=bits and -mstore-max=bits

2021-12-03 Thread H.J. Lu via Gcc-patches
On Thu, Nov 25, 2021 at 2:47 PM H.J. Lu wrote: > > Add -mmove-max=bits and -mstore-max=bits to enable 256-bit/512-bit move > and store, independent of -mprefer-vector-width=bits: > > 1. Add X86_TUNE_AVX512_MOVE_BY_PIECES and X86_TUNE_AVX512_STORE_BY_PIECES > which are enabled for Intel Sapphire Ra

[PATCH] x86: Add -mmove-max=bits and -mstore-max=bits

2021-11-25 Thread H.J. Lu via Gcc-patches
Add -mmove-max=bits and -mstore-max=bits to enable 256-bit/512-bit move and store, independent of -mprefer-vector-width=bits: 1. Add X86_TUNE_AVX512_MOVE_BY_PIECES and X86_TUNE_AVX512_STORE_BY_PIECES which are enabled for Intel Sapphire Rapids processor. 2. Add -mmove-max=bits to set the maximum n

Re: [PATCH] x86: Add -mharden-sls=[none|all|return|indirect-branch]

2021-11-17 Thread H.J. Lu via Gcc-patches
On Wed, Nov 17, 2021 at 6:08 AM Uros Bizjak wrote: > > On Wed, Nov 17, 2021 at 2:46 PM H.J. Lu wrote: > > > > On Wed, Nov 17, 2021 at 1:05 AM Uros Bizjak wrote: > > > > > > On Tue, Nov 16, 2021 at 7:20 PM H.J. Lu via Gcc-patches > > > wrote: > > > > > > > > Add -mharden-sls= to mitigate against

Re: [PATCH] x86: Add -mharden-sls=[none|all|return|indirect-branch]

2021-11-17 Thread Uros Bizjak via Gcc-patches
On Wed, Nov 17, 2021 at 2:46 PM H.J. Lu wrote: > > On Wed, Nov 17, 2021 at 1:05 AM Uros Bizjak wrote: > > > > On Tue, Nov 16, 2021 at 7:20 PM H.J. Lu via Gcc-patches > > wrote: > > > > > > Add -mharden-sls= to mitigate against straight line speculation (SLS) > > > for function return and indirec

Re: [PATCH] x86: Add -mindirect-branch-cs-prefix

2021-11-17 Thread H.J. Lu via Gcc-patches
On Wed, Nov 17, 2021 at 1:10 AM Uros Bizjak wrote: > > On Tue, Nov 16, 2021 at 7:51 PM H.J. Lu via Gcc-patches > wrote: > > > > Add -mindirect-branch-cs-prefix to add CS prefix to call and jmp to thunk > > via r8-r15 registers when converting indirect call and jump to increase > > the instruction

Re: [PATCH] x86: Add -mharden-sls=[none|all|return|indirect-branch]

2021-11-17 Thread H.J. Lu via Gcc-patches
On Wed, Nov 17, 2021 at 1:05 AM Uros Bizjak wrote: > > On Tue, Nov 16, 2021 at 7:20 PM H.J. Lu via Gcc-patches > wrote: > > > > Add -mharden-sls= to mitigate against straight line speculation (SLS) > > for function return and indirect branch by adding an INT3 instruction > > after function return

Re: [PATCH] x86: Add -mindirect-branch-cs-prefix

2021-11-17 Thread Uros Bizjak via Gcc-patches
On Tue, Nov 16, 2021 at 7:51 PM H.J. Lu via Gcc-patches wrote: > > Add -mindirect-branch-cs-prefix to add CS prefix to call and jmp to thunk > via r8-r15 registers when converting indirect call and jump to increase > the instruction length to 6, allowing the non-thunk form to be inlined. > > gcc/

Re: [PATCH] x86: Add -mharden-sls=[none|all|return|indirect-branch]

2021-11-17 Thread Uros Bizjak via Gcc-patches
On Tue, Nov 16, 2021 at 7:20 PM H.J. Lu via Gcc-patches wrote: > > Add -mharden-sls= to mitigate against straight line speculation (SLS) > for function return and indirect branch by adding an INT3 instruction > after function return and indirect branch. > > gcc/ > > PR target/102952 >

[PATCH] x86: Add -mindirect-branch-cs-prefix

2021-11-16 Thread H.J. Lu via Gcc-patches
Add -mindirect-branch-cs-prefix to add CS prefix to call and jmp to thunk via r8-r15 registers when converting indirect call and jump to increase the instruction length to 6, allowing the non-thunk form to be inlined. gcc/ PR target/102952 * config/i386/i386.c (ix86_output_jmp_thu

[PATCH] x86: Add -mharden-sls=[none|all|return|indirect-branch]

2021-11-16 Thread H.J. Lu via Gcc-patches
Add -mharden-sls= to mitigate against straight line speculation (SLS) for function return and indirect branch by adding an INT3 instruction after function return and indirect branch. gcc/ PR target/102952 * config/i386/i386-opts.h (harden_sls): New enum. * config/i386/i386
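
As a rough illustration of the two cases the description names (function return and indirect branch), the following hypothetical C file could be compiled with the option and its assembly inspected. The function names add_one and apply are made up for this sketch, and whether a given call becomes an indirect jmp depends on the optimization level; the expectation, per the patch description, is an INT3 placed after each function return and indirect branch.

```c
/* Hypothetical example; build on x86 with, e.g.:
     gcc -O2 -mharden-sls=all -S sls.c
   and inspect the assembly after each ret and each indirect jmp.  */

typedef int (*op_fn) (int);

/* A plain function: its return is the "function return" case.  */
int
add_one (int x)
{
  return x + 1;
}

/* Calls through a pointer.  At -O2 this is typically compiled to a
   tail call via an indirect jmp, the "indirect branch" case.  */
int
apply (op_fn fn, int x)
{
  return fn (x);
}
```

The INT3 sits on the fall-through path only, so it is never reached in normal execution and program behavior is unchanged.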

Re: [PATCH] x86: Add gcc.target/i386/pr103205-2.c

2021-11-15 Thread Jakub Jelinek via Gcc-patches
On Mon, Nov 15, 2021 at 05:40:01AM -0800, H.J. Lu via Gcc-patches wrote: > PR target/103205 > * gcc.target/i386/pr103205-2.c: New test. Ok, thanks. Jakub

[PATCH] x86: Add gcc.target/i386/pr103205-2.c

2021-11-15 Thread H.J. Lu via Gcc-patches
PR target/103205 * gcc.target/i386/pr103205-2.c: New test. --- gcc/testsuite/gcc.target/i386/pr103205-2.c | 46 ++ 1 file changed, 46 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr103205-2.c diff --git a/gcc/testsuite/gcc.target/i386/pr10320

Re: [PATCH] X86: Add an option -muse-unaligned-vector-move

2021-10-21 Thread H.J. Lu via Gcc-patches
On Thu, Oct 21, 2021 at 12:15 AM Richard Biener wrote: > > On Wed, Oct 20, 2021 at 8:34 PM H.J. Lu wrote: > > > > On Wed, Oct 20, 2021 at 9:58 AM Richard Biener > > wrote: > > > > > > On October 20, 2021 3:19:28 PM GMT+02:00, "H.J. Lu" > > > wrote: > > > >On Wed, Oct 20, 2021 at 4:18 AM Richar

Re: [PATCH] X86: Add an option -muse-unaligned-vector-move

2021-10-21 Thread Richard Biener via Gcc-patches
On Wed, Oct 20, 2021 at 8:34 PM H.J. Lu wrote: > > On Wed, Oct 20, 2021 at 9:58 AM Richard Biener > wrote: > > > > On October 20, 2021 3:19:28 PM GMT+02:00, "H.J. Lu" > > wrote: > > >On Wed, Oct 20, 2021 at 4:18 AM Richard Biener > > > wrote: > > >> > > >> On Wed, Oct 20, 2021 at 12:40 PM Xu Di

Re: [PATCH] X86: Add an option -muse-unaligned-vector-move

2021-10-20 Thread H.J. Lu via Gcc-patches
On Wed, Oct 20, 2021 at 9:58 AM Richard Biener wrote: > > On October 20, 2021 3:19:28 PM GMT+02:00, "H.J. Lu" > wrote: > >On Wed, Oct 20, 2021 at 4:18 AM Richard Biener > > wrote: > >> > >> On Wed, Oct 20, 2021 at 12:40 PM Xu Dianhong wrote: > >> > > >> > Many thanks for your explanation. I got

Re: [PATCH] X86: Add an option -muse-unaligned-vector-move

2021-10-20 Thread Richard Biener via Gcc-patches
On October 20, 2021 3:19:28 PM GMT+02:00, "H.J. Lu" wrote: >On Wed, Oct 20, 2021 at 4:18 AM Richard Biener > wrote: >> >> On Wed, Oct 20, 2021 at 12:40 PM Xu Dianhong wrote: >> > >> > Many thanks for your explanation. I got the meaning of operands. >> > The "addpd b(%rip), %xmm0" instruction need

Re: [PATCH] X86: Add an option -muse-unaligned-vector-move

2021-10-20 Thread H.J. Lu via Gcc-patches
On Wed, Oct 20, 2021 at 4:18 AM Richard Biener wrote: > > On Wed, Oct 20, 2021 at 12:40 PM Xu Dianhong wrote: > > > > Many thanks for your explanation. I got the meaning of operands. > > The "addpd b(%rip), %xmm0" instruction needs "b(%rip)" aligned otherwise it > > will raise a "Real-Address Mod

Re: [PATCH] X86: Add an option -muse-unaligned-vector-move

2021-10-20 Thread Richard Biener via Gcc-patches
On Wed, Oct 20, 2021 at 12:40 PM Xu Dianhong wrote: > > Many thanks for your explanation. I got the meaning of operands. > The "addpd b(%rip), %xmm0" instruction needs "b(%rip)" aligned otherwise it > will raise a "Real-Address Mode Exceptions". > I haven't considered this situation "b(%rip)" has

Re: [PATCH] X86: Add an option -muse-unaligned-vector-move

2021-10-20 Thread Xu Dianhong via Gcc-patches
Many thanks for your explanation. I got the meaning of operands. The "addpd b(%rip), %xmm0" instruction needs "b(%rip)" aligned otherwise it will raise a "Real-Address Mode Exceptions". I haven't considered this situation "b(%rip)" has an address dependence of "a(%rip)" before. I think this situati

Re: [PATCH] X86: Add an option -muse-unaligned-vector-move

2021-10-20 Thread Richard Biener via Gcc-patches
On Wed, Oct 20, 2021 at 9:48 AM Xu Dianhong wrote: > > Thanks for the comments. > > > And does it even work? > It works, I checked it in the test case, and when using this option, it can > emit an unaligned vector move. > >I fail to see adjustments to memory operands of > SSE/AVX instructions tha

Re: [PATCH] X86: Add an option -muse-unaligned-vector-move

2021-10-20 Thread Xu Dianhong via Gcc-patches
Thanks for the comments. >Why would you ever want to have such option?! I need to ask @H. J. Lu for help to answer this question. He knows more about the background. I may not explain it clearly. >Should the documentation at least read "emit unaligned vector moves even for aligned storage or when

Re: [PATCH] X86: Add an option -muse-unaligned-vector-move

2021-10-20 Thread Xu Dianhong via Gcc-patches
Thanks for the comments. > And does it even work? It works, I checked it in the test case, and when using this option, it can emit an unaligned vector move. >I fail to see adjustments to memory operands of SSE/AVX instructions that have to be aligned I changed all vector move in "get_ssemov" witho

Re: [PATCH] X86: Add an option -muse-unaligned-vector-move

2021-10-20 Thread Richard Biener via Gcc-patches
On Wed, Oct 20, 2021 at 9:02 AM Richard Biener wrote: > > On Wed, Oct 20, 2021 at 7:31 AM dianhong.xu--- via Gcc-patches > wrote: > > > > From: dianhong xu > > > > Add -muse-unaligned-vector-move option to emit unaligned vector move > > instructions. > > Why would you ever want to have such opt

Re: [PATCH] X86: Add an option -muse-unaligned-vector-move

2021-10-20 Thread Richard Biener via Gcc-patches
On Wed, Oct 20, 2021 at 7:31 AM dianhong.xu--- via Gcc-patches wrote: > > From: dianhong xu > > Add -muse-unaligned-vector-move option to emit unaligned vector move > instructions. Why would you ever want to have such option?! Should the documentation at least read "emit unaligned vector moves

[PATCH] X86: Add an option -muse-unaligned-vector-move

2021-10-19 Thread dianhong.xu--- via Gcc-patches
From: dianhong xu Add -muse-unaligned-vector-move option to emit unaligned vector move instructions. gcc/ChangeLog: * config/i386/i386-options.c (ix86_target_string): Add -muse-unaligned-vector-move. * config/i386/i386.c (ix86_get_ssemov): Emit unaligned vector if use

Re: [PATCH] x86: Add TARGET_AVX256_[MOVE|STORE]_BY_PIECES

2021-09-08 Thread Hongtao Liu via Gcc-patches
On Thu, Sep 9, 2021 at 11:21 AM H.J. Lu via Gcc-patches wrote: > > 1. Add TARGET_AVX256_MOVE_BY_PIECES to perform move by-pieces operation > with 256-bit AVX instructions. > 2. Add TARGET_AVX256_STORE_BY_PIECES to perform move and store by-pieces > operations with 256-bit AVX instructions. > > The

[PATCH] x86: Add TARGET_AVX256_[MOVE|STORE]_BY_PIECES

2021-09-08 Thread H.J. Lu via Gcc-patches
1. Add TARGET_AVX256_MOVE_BY_PIECES to perform move by-pieces operation with 256-bit AVX instructions. 2. Add TARGET_AVX256_STORE_BY_PIECES to perform move and store by-pieces operations with 256-bit AVX instructions. They are enabled only for Intel Alder Lake and Intel processors with AVX512. gc

Re: [PATCH] x86: Add non-destructive source to @xorsign3_1

2021-09-05 Thread Hongtao Liu via Gcc-patches
On Sun, Sep 5, 2021 at 5:54 AM H.J. Lu via Gcc-patches wrote: > > Add non-destructive source alternative to @xorsign3_1 for AVX. LGTM. > > gcc/ > > PR target/89984 > * config/i386/i386-expand.c (ix86_split_xorsign): Use operands[2]. > * config/i386/i386.md (@xorsign3_1): Ad

[PATCH] x86: Add non-destructive source to @xorsign3_1

2021-09-04 Thread H.J. Lu via Gcc-patches
Add non-destructive source alternative to @xorsign3_1 for AVX. gcc/ PR target/89984 * config/i386/i386-expand.c (ix86_split_xorsign): Use operands[2]. * config/i386/i386.md (@xorsign3_1): Add non-destructive source alternative for AVX. gcc/testsuite/ PR t

[PATCH] x86: Add testcases for PR target/80566

2021-08-02 Thread H.J. Lu via Gcc-patches
PR target/80566 * g++.target/i386/pr80566-1.C: New test. * g++.target/i386/pr80566-2.C: Likewise. --- gcc/testsuite/g++.target/i386/pr80566-1.C | 15 +++ gcc/testsuite/g++.target/i386/pr80566-2.C | 14 ++ 2 files changed, 29 insertions(+) create mod

[PATCH] x86: Add a test for PR tree-optimization/42587

2021-05-08 Thread H.J. Lu via Gcc-patches
PR tree-optimization/42587 * gcc.target/i386/pr42587.c: New test. --- gcc/testsuite/gcc.target/i386/pr42587.c | 35 + 1 file changed, 35 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr42587.c diff --git a/gcc/testsuite/gcc.target/i386/pr4

Re: [GCC 12] [PATCH] x86: Add -mmwait for -mgeneral-regs-only

2021-04-21 Thread Uros Bizjak via Gcc-patches
On Tue, Apr 20, 2021 at 2:12 PM H.J. Lu wrote: > > Add -mmwait so that the MWAIT and MONITOR intrinsics can be used with > -mgeneral-regs-only and make -msse3 to imply -mmwait. > > gcc/ > > * config.gcc: Install mwaitintrin.h for i[34567]86-*-* and > x86_64-*-* targets. > *

[PATCH] x86: Add general_regs_only function attribute

2021-04-20 Thread H.J. Lu via Gcc-patches
commit 87c753ac241f25d222d46ba1ac66ceba89d6a200 Author: H.J. Lu Date: Fri Aug 21 09:42:49 2020 -0700 x86: Add target("general-regs-only") function attribute is incomplete since it is impossible to call integer intrinsics from a function with general-regs-only target attribute. 1. Add gene

[GCC 12] [PATCH] x86: Add -mmwait for -mgeneral-regs-only

2021-04-20 Thread H.J. Lu via Gcc-patches
Add -mmwait so that the MWAIT and MONITOR intrinsics can be used with -mgeneral-regs-only and make -msse3 to imply -mmwait. gcc/ * config.gcc: Install mwaitintrin.h for i[34567]86-*-* and x86_64-*-* targets. * common/config/i386/i386-common.c (OPTION_MASK_ISA2_MWAIT_SET):

Re: [PATCH] x86: Add __volatile__ to __cpuid and __cpuid_count

2021-03-23 Thread Uros Bizjak via Gcc-patches
On Mon, Mar 22, 2021 at 5:19 AM H.J. Lu wrote: > > Tested on Linux/x86-64 and Linux/i686. OK for master and release > branches? > > Thanks. > > H.J. > --- > Since CPUID instruction may return different values on hybrid core. > volatile is needed on asm statements in . > > PR target/99704

[PATCH] x86: Add __volatile__ to __cpuid and __cpuid_count

2021-03-21 Thread H.J. Lu via Gcc-patches
Tested on Linux/x86-64 and Linux/i686. OK for master and release branches? Thanks. H.J. --- Since CPUID instruction may return different values on hybrid core. volatile is needed on asm statements in . PR target/99704 * config/i386/cpuid.h (__cpuid): Add __volatile__. (_

[PATCH] x86: Add the missing '.' for -mneeded

2020-12-02 Thread H.J. Lu via Gcc-patches
Emit GNU_PROPERTY_X86_ISA_1_NEEDED GNU > property" > FAIL: gcc.dg/lto/save-temps c_lto_save-temps_0.o-c_lto_save-temps_0.o link, > -O -flto -save-temps > I am checking this as an obvious fix. -- H.J. From c3dff3d73da87f20effae8defaf926e8ba5204db Mon Sep 17 00:

Re: [PATCH] x86: Add -mneeded for GNU_PROPERTY_X86_ISA_1_V[234] marker

2020-12-01 Thread Jeff Law via Gcc-patches
On 11/16/20 6:44 PM, H.J. Lu wrote: > On Mon, Nov 16, 2020 at 4:58 PM Jeff Law wrote: >> >> On 11/9/20 11:57 AM, H.J. Lu via Gcc-patches wrote: >>> GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture ISA >>> levels: >>> >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250 >>

[PATCH] x86: Add a testcase for PR target/31799

2020-11-17 Thread H.J. Lu via Gcc-patches
Add a testcase for PR target/31799 which was fixed by commit 4f0473fe89e68bf7c09542ee5c3684da25a5b435 Author: Uros Bizjak Date: Fri May 12 21:04:05 2017 +0200 compare-elim.c (try_eliminate_compare): Canonicalize operation with embedded compare to [(set (reg:CCM) (compare:CCM...

Re: [PATCH] x86: Add -mneeded for GNU_PROPERTY_X86_ISA_1_V[234] marker

2020-11-16 Thread H.J. Lu via Gcc-patches
On Mon, Nov 16, 2020 at 4:58 PM Jeff Law wrote: > > > On 11/9/20 11:57 AM, H.J. Lu via Gcc-patches wrote: > > GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture ISA > > levels: > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250 > > > > Binutils has been updated to suppor

Re: [PATCH] x86: Add -mneeded for GNU_PROPERTY_X86_ISA_1_V[234] marker

2020-11-16 Thread Jeff Law via Gcc-patches
On 11/9/20 11:57 AM, H.J. Lu via Gcc-patches wrote: > GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture ISA > levels: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250 > > Binutils has been updated to support GNU_PROPERTY_X86_ISA_1_V[234] marker: > > https://gitlab.com/x8

[PATCH] x86: Add -mneeded for GNU_PROPERTY_X86_ISA_1_V[234] marker

2020-11-09 Thread H.J. Lu via Gcc-patches
GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture ISA levels: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250 Binutils has been updated to support GNU_PROPERTY_X86_ISA_1_V[234] marker: https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/13 with commit b0ab06937385e

Re: PING^3 [PATCH] x86: Add cmpmemsi for -minline-all-stringops

2020-10-23 Thread H.J. Lu via Gcc-patches
On Sun, Oct 18, 2020 at 8:16 AM Jan Hubicka wrote: > > > On Fri, Oct 2, 2020 at 6:21 AM H.J. Lu wrote: > > > > > > On Wed, Sep 16, 2020 at 10:07 PM H.J. Lu wrote: > > > > > > > > On Wed, Aug 19, 2020 at 6:09 AM H.J. Lu wrote: > > > > > > > > > > On Tue, May 19, 2020 at 5:14 AM H.J. Lu wrote: >

Re: PING^3 [PATCH] x86: Add cmpmemsi for -minline-all-stringops

2020-10-18 Thread H.J. Lu via Gcc-patches
On Sun, Oct 18, 2020 at 8:16 AM Jan Hubicka wrote: > > > On Fri, Oct 2, 2020 at 6:21 AM H.J. Lu wrote: > > > > > > On Wed, Sep 16, 2020 at 10:07 PM H.J. Lu wrote: > > > > > > > > On Wed, Aug 19, 2020 at 6:09 AM H.J. Lu wrote: > > > > > > > > > > On Tue, May 19, 2020 at 5:14 AM H.J. Lu wrote: >

Re: PING^3 [PATCH] x86: Add cmpmemsi for -minline-all-stringops

2020-10-18 Thread Jan Hubicka
> On Fri, Oct 2, 2020 at 6:21 AM H.J. Lu wrote: > > > > On Wed, Sep 16, 2020 at 10:07 PM H.J. Lu wrote: > > > > > > On Wed, Aug 19, 2020 at 6:09 AM H.J. Lu wrote: > > > > > > > > On Tue, May 19, 2020 at 5:14 AM H.J. Lu wrote: > > > > > > > > > > On Tue, May 19, 2020 at 1:48 AM Uros Bizjak wrot

Re: PING^3 [PATCH] x86: Add cmpmemsi for -minline-all-stringops

2020-10-17 Thread H.J. Lu via Gcc-patches
On Fri, Oct 2, 2020 at 6:21 AM H.J. Lu wrote: > > On Wed, Sep 16, 2020 at 10:07 PM H.J. Lu wrote: > > > > On Wed, Aug 19, 2020 at 6:09 AM H.J. Lu wrote: > > > > > > On Tue, May 19, 2020 at 5:14 AM H.J. Lu wrote: > > > > > > > > On Tue, May 19, 2020 at 1:48 AM Uros Bizjak wrote: > > > > > > > >

Re: [PATCH] x86: Add missing intrinsics [PR95483]

2020-10-14 Thread Uros Bizjak via Gcc-patches
> gcc/ChangeLog: > > * config/i386/avx2intrin.h (_mm_broadcastsi128_si256): New intrinsics. > (_mm_broadcastsd_pd): Ditto. > * config/i386/avx512bwintrin.h (_mm512_loadu_epi16): New intrinsics. > (_mm512_storeu_epi16): Ditto. > (_mm512_loadu_epi8): Ditto. > (_mm512_storeu_epi8): Ditto. > * config/i

[PATCH] x86: Add missing intrinsics [PR95483]

2020-10-13 Thread Sunil K Pandey via Gcc-patches
Tested on x86-64. gcc/ChangeLog: * config/i386/avx2intrin.h (_mm_broadcastsi128_si256): New intrinsics. (_mm_broadcastsd_pd): Ditto. * config/i386/avx512bwintrin.h (_mm512_loadu_epi16): New intrinsics. (_mm512_storeu_epi16): Ditto. (_mm512_loadu_epi8): Ditt

PING^2 [PATCH] x86: Add

2020-10-08 Thread H.J. Lu via Gcc-patches
On Fri, Oct 2, 2020 at 5:51 AM H.J. Lu wrote: > > On Wed, Sep 23, 2020 at 10:58 AM H.J. Lu wrote: > > > > For sources which can't use any vector instructions, and > > cannot be included for compiler intrinsics: > > > > $ echo "#include " | gcc -S -O2 -mno-sse -mno-mmx -x c - > > In file include
