On Mon, Apr 21, 2025 at 4:30 PM H.J. Lu wrote:
>
> On Mon, Apr 21, 2025 at 11:29 AM Hongtao Liu wrote:
> >
> > On Sat, Apr 19, 2025 at 1:25 PM H.J. Lu wrote:
> > >
> > > On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote:
> > > >
> > > > For all different modes of all 0s/1s vectors, we can use the si
On Mon, Apr 21, 2025 at 11:29 AM Hongtao Liu wrote:
>
> On Sat, Apr 19, 2025 at 1:25 PM H.J. Lu wrote:
> >
> > On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote:
> > >
> > > For all different modes of all 0s/1s vectors, we can use the single widest
> > > all 0s/1s vector register for all 0s/1s vector
On Sat, Apr 19, 2025 at 1:25 PM H.J. Lu wrote:
>
> On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote:
> >
> > For all different modes of all 0s/1s vectors, we can use the single widest
> > all 0s/1s vector register for all 0s/1s vector uses in the whole function.
> > Add a pass to generate a single wi
On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote:
>
> For all different modes of all 0s/1s vectors, we can use the single widest
> all 0s/1s vector register for all 0s/1s vector uses in the whole function.
> Add a pass to generate a single widest all 0s/1s vector set instruction at
> entry of the near
Add the preserve_none attribute, which is similar to the no_callee_saved_registers
attribute, except that on x86-64, r12, r13, r14, r15, rdi and rsi registers are
used for integer parameter passing. This can be used in an interpreter
to avoid saving/restoring the registers in functions which process
byte cod
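As a rough sketch of the intended use case, the C fragment below marks the handlers of a toy bytecode interpreter with `preserve_none`. It does so only when the compiler reports support via `__has_attribute` on x86-64, and otherwise falls back to the default calling convention. The opcode set, the `PRESERVE_NONE` macro and the handler names are hypothetical. The sketch only shows where the attribute would go; it makes no claim about the speedup.

```c
#include <stddef.h>
#include <stdint.h>

/* Use preserve_none only where the compiler reports support for it;
   otherwise expand to nothing so the sketch still compiles. */
#if defined(__has_attribute)
#  if __has_attribute(preserve_none) && defined(__x86_64__)
#    define PRESERVE_NONE __attribute__((preserve_none))
#  endif
#endif
#ifndef PRESERVE_NONE
#  define PRESERVE_NONE
#endif

/* Hypothetical opcodes for a toy accumulator machine. */
enum demo_op { OP_ADD, OP_MUL, OP_HALT };
struct demo_insn { enum demo_op op; int64_t arg; };

/* Handlers take the accumulator and operand in integer registers; with
   preserve_none the callee does not have to save registers it clobbers. */
static PRESERVE_NONE int64_t demo_add(int64_t acc, int64_t arg) { return acc + arg; }
static PRESERVE_NONE int64_t demo_mul(int64_t acc, int64_t arg) { return acc * arg; }

/* Dispatch loop: runs until OP_HALT and returns the accumulator. */
int64_t demo_run(const struct demo_insn *code)
{
    int64_t acc = 0;
    for (size_t pc = 0;; pc++) {
        switch (code[pc].op) {
        case OP_ADD: acc = demo_add(acc, code[pc].arg); break;
        case OP_MUL: acc = demo_mul(acc, code[pc].arg); break;
        case OP_HALT: return acc;
        }
    }
}
```

For example, the program `ADD 3; MUL 7; ADD 1; HALT` evaluates to 22.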
From: "H.J. Lu"
Date: Mon, 24 Feb 2025 05:44:40 +0800
Subject: [PATCH] x86: Add tests for PR tree-optimization/82142
Verify that the PR tree-optimization/82142 testcase is properly optimized.
PR tree-optimization/82142
* gcc.target/i386/pr82142a.c: New file.
* gcc.target/i386/pr82142b.c: Likewise.
Sig
On Sat, Feb 1, 2025 at 11:14 AM H.J. Lu wrote:
>
> Verify that -mstack-protector-guard=global works on x86. Default stack
> protector uses TLS. -mstack-protector-guard=global uses a global variable,
> __stack_chk_guard, instead of TLS.
>
> * gcc.target/i386/ssp-global.c: New file.
OK.
Thanks,
From: "H.J. Lu"
Date: Sat, 1 Feb 2025 18:06:33 +0800
Subject: [PATCH] x86: Add a -mstack-protector-guard=global test
Verify that -mstack-protector-guard=global works on x86. The default stack
protector uses TLS. -mstack-protector-guard=global uses a global variable,
__stack
On Mon, Dec 2, 2024, 11:16 AM Hongtao Liu wrote:
> On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote:
> >
> > For all different modes of all 0s/1s vectors, we can use the single
> widest
> > all 0s/1s vector register for all 0s/1s vector uses in the whole
> function.
> > Add a pass to generate a sing
On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote:
>
> For all different modes of all 0s/1s vectors, we can use the single widest
> all 0s/1s vector register for all 0s/1s vector uses in the whole function.
> Add a pass to generate a single widest all 0s/1s vector set instruction at
> entry of the near
On Sun, Dec 1, 2024 at 8:01 PM Uros Bizjak wrote:
>
> On Sat, Nov 30, 2024 at 11:00 PM H.J. Lu wrote:
> >
> > Add pcmpeq splitters to split
> >
> > (insn 5 3 7 2 (set (reg:V4SI 100)
> > (eq:V4SI (reg:V4SI 98)
> > (reg:V4SI 98))) 7910 {*sse2_eqv4si3}
> > (expr_list:REG_DEA
On Sat, Nov 30, 2024 at 11:00 PM H.J. Lu wrote:
>
> Add pcmpeq splitters to split
>
> (insn 5 3 7 2 (set (reg:V4SI 100)
> (eq:V4SI (reg:V4SI 98)
> (reg:V4SI 98))) 7910 {*sse2_eqv4si3}
> (expr_list:REG_DEAD (reg:V4SI 98)
> (expr_list:REG_EQUAL (eq:V4SI (const_vector
For all different modes of all 0s/1s vectors, we can use the single widest
all 0s/1s vector register for all 0s/1s vector uses in the whole function.
Add a pass to generate a single widest all 0s/1s vector set instruction at
entry of the nearest common dominator for basic blocks with all 0s/1s
vect
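The "nearest common dominator" placement described above can be sketched with an immediate-dominator array. Walk the deeper block up its dominator chain, then move both blocks up until they meet. The `idom`/`depth` arrays and the example CFG are hypothetical inputs. GCC's pass works on its own dominance data structures, so this is only a model of the idea.

```c
#include <stddef.h>

/* idom[b] is the immediate dominator of block b (the entry block is its
   own idom); depth[b] is b's depth in the dominator tree. */
static int ncd_pair(const int *idom, const int *depth, int a, int b)
{
    while (depth[a] > depth[b]) a = idom[a];
    while (depth[b] > depth[a]) b = idom[b];
    while (a != b) { a = idom[a]; b = idom[b]; }
    return a;
}

/* Nearest common dominator of all blocks that use an all-0s/1s vector
   (n must be at least 1): a single set instruction at the entry of this
   block dominates every use. */
int ncd_of_blocks(const int *idom, const int *depth,
                  const int *blocks, size_t n)
{
    int r = blocks[0];
    for (size_t i = 1; i < n; i++)
        r = ncd_pair(idom, depth, r, blocks[i]);
    return r;
}
```

With entry block 0, blocks 1 and 2 under it, blocks 3 and 4 under 1, and block 5 under 2, uses in {3, 4} meet at block 1. Adding block 5 moves the placement up to block 0.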
Add pcmpeq splitters to split
(insn 5 3 7 2 (set (reg:V4SI 100)
(eq:V4SI (reg:V4SI 98)
(reg:V4SI 98))) 7910 {*sse2_eqv4si3}
(expr_list:REG_DEAD (reg:V4SI 98)
(expr_list:REG_EQUAL (eq:V4SI (const_vector:V4SI [
(const_int -1 [0xfff
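The splitters rely on one fact: an integer equality comparison of a register with itself is all 1s, whatever the register holds. So `(eq:V4SI (reg 98) (reg 98))` is equivalent to a constant all-1s vector. The minimal C illustration below uses the GCC/Clang generic vector extension rather than x86 intrinsics. This only holds for integer element types, since a floating-point NaN compares unequal to itself.

```c
#include <stdint.h>
#include <string.h>

typedef int32_t v4si __attribute__((vector_size(16)));

/* Compare a vector with itself: every lane is equal, so every lane of
   the result is -1 (all bits set), independent of the input value. */
v4si self_eq(v4si x)
{
    return x == x;
}

/* Returns 1 if all 16 bytes of v are 0xff, else 0. */
int is_all_ones(v4si v)
{
    unsigned char bytes[sizeof v];
    memcpy(bytes, &v, sizeof v);
    for (size_t i = 0; i < sizeof v; i++)
        if (bytes[i] != 0xff)
            return 0;
    return 1;
}
```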
On Mon, Jun 3, 2024 at 5:11 AM liuhongt wrote:
>
> W/o TARGET_SSE4_1, it takes 3 instructions (pand, pandn and por) for
> movdfcc/movsfcc, and could possibly fail cost comparison. Increase
> branch cost could hurt performance for other modes, so specially add
> some preference for floating point i
Without TARGET_SSE4_1, movdfcc/movsfcc takes 3 instructions (pand, pandn and
por) and could fail the cost comparison. Increasing the branch cost could hurt
performance for other modes, so add a specific preference for floating-point
ifcvt.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-
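Without SSE4.1's blend instructions, a scalar floating-point conditional move becomes a mask-based select. The comparison produces an all-1s or all-0s mask, and the result is `(mask & a) | (~mask & b)`, which is what the pand/pandn/por sequence computes. The sketch below models that on the bit pattern of a `double` in portable C; the helper name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Select a if cond is nonzero, else b, without a branch.
   mask is all 1s or all 0s, like the result of an SSE compare. */
double select_double(int cond, double a, double b)
{
    uint64_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    uint64_t mask = -(uint64_t)(cond != 0);  /* 0 or 0xffff...ffff */
    uint64_t r = (mask & ua) | (~mask & ub); /* pand, pandn, por */
    double out;
    memcpy(&out, &r, sizeof out);
    return out;
}
```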
Hi All
We've introduced a new subroutine in ix86_expand_vec_perm_const_1
to optimize vector shifting for the V16QI type on x86.
This patch uses a three-instruction sequence psrlw, psllw, and por
to handle specific vector shuffle operations more efficiently.
The change aims to improve assembly code
On Thu, May 9, 2024 at 11:12 AM Levy Hsu wrote:
>
> Hi All
>
> We've introduced a new subroutine in ix86_expand_vec_perm_const_1
> to optimize vector shifting for the V16QI type on x86.
> This patch uses a three-instruction sequence psrlw, psllw, and por
> to handle specific vector shuffle operati
Hi All
We've introduced a new subroutine in ix86_expand_vec_perm_const_1
to optimize vector shifting for the V16QI type on x86.
This patch uses a three-instruction sequence psrlw, psllw, and por
to handle specific vector shuffle operations more efficiently.
The change aims to improve assembly code
On Wed, May 8, 2024 at 4:44 AM Levy Hsu wrote:
>
> PR target/107563
>
> gcc/ChangeLog:
>
> * config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New
> subroutine.
> (ix86_expand_vec_perm_const_1): New Entry.
>
> gcc/testsuite/ChangeLog:
>
> * g++.t
Hi All
We've introduced a new subroutine in ix86_expand_vec_perm_const_1
to optimize vector shifting for the V16QI type on x86.
This patch uses a three-instruction sequence psrlw, psllw, and por
to handle specific vector shuffle operations more efficiently.
The change aims to improve assembly c
PR target/107563
gcc/ChangeLog:
* config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New
subroutine.
(ix86_expand_vec_perm_const_1): New Entry.
gcc/testsuite/ChangeLog:
* g++.target/i386/pr107563.C: New test.
---
gcc/config/i386/i386-expand.cc
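One V16QI permutation that a psrlw/psllw/por sequence can express is swapping the two bytes inside every 16-bit word (indices 1, 0, 3, 2, ...). Shift each word right by 8 (psrlw), shift a copy left by 8 (psllw), and OR them (por). The excerpts above do not show exactly which permutations the patch matches, so treat this as an illustration. The sketch uses the GCC/Clang vector extension and checks the result against the explicit byte permutation.

```c
#include <stdint.h>
#include <string.h>

typedef uint16_t v8hi __attribute__((vector_size(16)));

/* Swap the bytes of each 16-bit lane: (x >> 8) | (x << 8),
   i.e. a psrlw $8 / psllw $8 / por sequence. */
v8hi swap_bytes_in_words(v8hi x)
{
    return (x >> 8) | (x << 8);
}

/* The same shuffle written as an explicit V16QI permutation
   {1, 0, 3, 2, ..., 15, 14} on a 16-byte array. */
void permute_pairs(const uint8_t in[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = in[i ^ 1];
}
```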
Prevent rtl optimization of vec_duplicate + zero_extend to
vpbroadcastm since there could be an extra kmov after RA.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
Ready to push to trunk.
gcc/ChangeLog:
PR target/110788
* config/i386/sse.md (avx512cd_maskb_vec_dup): Add
This is to cover testing also being done with -march=cascadelake.
---
Committing as obvious.
--- a/gcc/testsuite/gcc.target/i386/avx512f-dupv2di.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-dupv2di.c
@@ -1,5 +1,5 @@
/* { dg-do compile { target { ! ia32 } } } */
-/* { dg-options "-mavx512f -mno-a
On Wed, Jun 14, 2023 at 1:56 PM Jan Beulich via Gcc-patches
wrote:
>
> gcc/
>
> * config/i386/constraints.md: Mention k and r for B.
Ok.
>
> --- a/gcc/config/i386/constraints.md
> +++ b/gcc/config/i386/constraints.md
> @@ -162,7 +162,9 @@
> ;; g GOT memory operand.
> ;; m Vector memo
gcc/
* config/i386/constraints.md: Mention k and r for B.
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -162,7 +162,9 @@
;; g GOT memory operand.
;; m Vector memory operand
;; c Constant memory operand
+;; k TLS address that allows insn using non-
On Mon, Jun 5, 2023 at 9:26 AM liuhongt wrote:
>
> This patch only supports vec_pack/unpacks optabs for vector modes whose
> length is >= 128.
> 32/64-bit vectors are mostly handled by the BB vectorizer with
> truncmn2/extendmn2/fix{,uns}_truncmn2.
>
> Bootstrapped and regtested on x86_64-pc-linux-g
This patch only supports vec_pack/unpacks optabs for vector modes whose
length is >= 128.
32/64-bit vectors are mostly handled by the BB vectorizer with
truncmn2/extendmn2/fix{,uns}_truncmn2.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.
gcc/ChangeLog:
*
On Fri, May 12, 2023 at 1:43 PM Hongtao Liu wrote:
>
> On Wed, May 10, 2023 at 5:10 PM liuhongt wrote:
> >
> > > The quoted patch shows -shared in context and you didn't post a
> > > backport version
> > > to look at. But yes, we shouldn't change -shared behavior on a
> > > branch, even less so
On Wed, May 10, 2023 at 5:10 PM liuhongt wrote:
>
> > The quoted patch shows -shared in context and you didn't post a
> > backport version
> > to look at. But yes, we shouldn't change -shared behavior on a
> > branch, even less so make it
> > inconsistent between targets.
> Here's the patch.
>
>
> The quoted patch shows -shared in context and you didn't post a
> backport version
> to look at. But yes, we shouldn't change -shared behavior on a
> branch, even less so make it
> inconsistent between targets.
Here's the patch.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for
> -Original Message-
> From: Jakub Jelinek
> Sent: Tuesday, October 11, 2022 9:59 PM
> To: Liu, Hongtao
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [x86] Add define_insn_and_split to support general
> version of "kxnor".
>
> On Tue, Oct 1
On Tue, Oct 11, 2022 at 04:03:16PM +0800, liuhongt via Gcc-patches wrote:
> gcc/ChangeLog:
>
> * config/i386/i386.md (*notxor_1): New post_reload
> define_insn_and_split.
> (*notxorqi_1): Ditto.
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -10826,6 +10826
On Tue, Oct 11, 2022 at 10:03 AM liuhongt wrote:
>
> For general_reg_operand, it will be split into xor + not.
> For mask_reg_operand, it will be split with UNSPEC_MASK_OP just
> like what we did for other logic operations.
>
> The patch will optimize xor+not to kxnor when possible.
>
> Boo
For general_reg_operand, it will be split into xor + not.
For mask_reg_operand, it will be split with UNSPEC_MASK_OP, just
as we do for other logic operations.
The patch will optimize xor+not to kxnor when possible.
Bootstrapped and regtested on x86_64-pc-linux-gnu.
Ok for trunk?
g
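At the bit level, the `*notxor_1` pattern is just `~(a ^ b)`. With AVX-512 mask operands, that whole expression can become a single kxnor instead of an xor followed by a not. The C below only models the bitwise identity, using a 16-bit value to stand in for a mask register. It says nothing about which instructions a given compiler will pick.

```c
#include <stdint.h>

/* xor followed by not, as two separate operations. */
uint16_t xor_then_not(uint16_t a, uint16_t b)
{
    uint16_t t = (uint16_t)(a ^ b);
    return (uint16_t)~t;
}

/* The fused "xnor" form: each result bit is 1 where a and b agree. */
uint16_t xnor16(uint16_t a, uint16_t b)
{
    return (uint16_t)~(a ^ b);
}
```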
On Wed, Jul 27, 2022 at 4:47 PM H.J. Lu wrote:
>
> On Thu, Jul 21, 2022 at 11:53 AM H.J. Lu wrote:
> >
> > We can't always use the PLT entry as the function address for local IFUNC
> > functions. When the PIC register is needed for PLT call, indirect call
> > via the PLT entry will fail since th
On Thu, Jul 21, 2022 at 11:53 AM H.J. Lu wrote:
>
> We can't always use the PLT entry as the function address for local IFUNC
> functions. When the PIC register is needed for PLT call, indirect call
> via the PLT entry will fail since the PIC register may not be set up
> properly for indirect cal
We can't always use the PLT entry as the function address for local IFUNC
functions. When the PIC register is needed for a PLT call, an indirect call
via the PLT entry will fail since the PIC register may not be set up
properly for an indirect call. Add ix86_ifunc_ref_local_ok to return false
when the PL
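For context, the sketch below shows a local IFUNC function whose address is taken. It needs an ELF target with GNU indirect-function support (for example glibc on Linux). The resolver and implementation names are hypothetical, and a real resolver would normally choose an implementation based on CPU features. The patch is about how the compiler forms the address of such a function on i386, not about how the source is written.

```c
/* Hypothetical implementation selected by the resolver. */
static int impl_generic(int x) { return x + 1; }

/* Resolver: runs at load time and returns the implementation to bind.
   This sketch always picks the generic version. */
static int (*resolve_add_one(void))(int)
{
    return impl_generic;
}

/* A local (static) IFUNC symbol. */
static int add_one(int x) __attribute__((ifunc("resolve_add_one")));

/* Take the address of the local IFUNC and call through it; volatile
   keeps the compiler from turning this back into a direct call. */
int call_through_pointer(int x)
{
    int (*volatile fp)(int) = add_one;
    return fp(x);
}
```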
On Mon, May 9, 2022 at 7:51 AM H.J. Lu wrote:
>
> Add .note.GNU-stack section only for Linux since it may not be supported
> on non-Linux OSes. __ELF__ isn't checked since these tests can only run
> on Linux/x86 ELF systems.
>
> PR target/105472
> * gcc.target/i386/iamcu/asm-suppo
Add .note.GNU-stack section only for Linux since it may not be supported
on non-Linux OSes. __ELF__ isn't checked since these tests can only run
on Linux/x86 ELF systems.
PR target/105472
* gcc.target/i386/iamcu/asm-support.S: Add .note.GNU-stack section
only for Linux.
Hi H.J,
> On Mon, May 2, 2022 at 11:37 AM H.J. Lu wrote:
>>
>> On Fri, Apr 29, 2022 at 10:38 AM H.J. Lu wrote:
>> >
>> > Add .note.GNU-stack assembly source to avoid linker warning:
>> >
>> > ld: warning: /tmp/ccPZSZ7Z.o: missing .note.GNU-stack section implies
>> > executable stack
>> > ld: NO
On Mon, May 2, 2022 at 11:37 AM H.J. Lu wrote:
>
> On Fri, Apr 29, 2022 at 10:38 AM H.J. Lu wrote:
> >
> > Add .note.GNU-stack assembly source to avoid linker warning:
> >
> > ld: warning: /tmp/ccPZSZ7Z.o: missing .note.GNU-stack section implies
> > executable stack
> > ld: NOTE: This behaviour
On Fri, Apr 29, 2022 at 10:38 AM H.J. Lu wrote:
>
> Add .note.GNU-stack assembly source to avoid linker warning:
>
> ld: warning: /tmp/ccPZSZ7Z.o: missing .note.GNU-stack section implies
> executable stack
> ld: NOTE: This behaviour is deprecated and will be removed in a future
> version of the
Add a .note.GNU-stack section to the assembly source to avoid the linker warning:
ld: warning: /tmp/ccPZSZ7Z.o: missing .note.GNU-stack section implies
executable stack
ld: NOTE: This behaviour is deprecated and will be removed in a future version
of the linker
FAIL: gcc.target/i386/iamcu/test_3_element_struct_and_u
On Thu, Feb 17, 2022 at 10:49:48AM +0100, Richard Biener via Gcc-patches wrote:
> On Thu, Feb 17, 2022 at 8:52 AM Uros Bizjak via Gcc-patches
> wrote:
> >
> > On Thu, Feb 17, 2022 at 6:25 AM Hongtao Liu via Gcc-patches
> > wrote:
> > >
> > > On Thu, Feb 17, 2022 at 12:26 PM H.J. Lu via Gcc-patche
On Thu, Feb 17, 2022 at 8:52 AM Uros Bizjak via Gcc-patches
wrote:
>
> On Thu, Feb 17, 2022 at 6:25 AM Hongtao Liu via Gcc-patches
> wrote:
> >
> > On Thu, Feb 17, 2022 at 12:26 PM H.J. Lu via Gcc-patches
> > wrote:
> > >
> > > Reading YMM registers with all zero bits needs VZEROUPPER on Sandy B
On Thu, Feb 17, 2022 at 6:25 AM Hongtao Liu via Gcc-patches
wrote:
>
> On Thu, Feb 17, 2022 at 12:26 PM H.J. Lu via Gcc-patches
> wrote:
> >
> > Reading YMM registers with all zero bits needs VZEROUPPER on Sandy Bridge,
> > Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid SSE <-> AVX
> > tra
On Thu, Feb 17, 2022 at 12:26 PM H.J. Lu via Gcc-patches
wrote:
>
> Reading YMM registers with all zero bits needs VZEROUPPER on Sandy Bridge,
> Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid SSE <-> AVX
> transition penalty. Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER to
> generate vzero
Reading YMM registers with all zero bits needs VZEROUPPER on Sandy Bridge,
Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid the SSE <-> AVX
transition penalty. Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER to
generate a vzeroupper instruction after loading all-zero YMM/ZMM registers
and enable it by
On Fri, Dec 3, 2021 at 8:55 AM Uros Bizjak wrote:
>
> On Fri, Dec 3, 2021 at 2:24 PM H.J. Lu wrote:
> >
> > On Thu, Nov 25, 2021 at 2:47 PM H.J. Lu wrote:
> > >
> > > Add -mmove-max=bits and -mstore-max=bits to enable 256-bit/512-bit move
> > > and store, independent of -mprefer-vector-width=bit
On Fri, Dec 3, 2021 at 2:24 PM H.J. Lu wrote:
>
> On Thu, Nov 25, 2021 at 2:47 PM H.J. Lu wrote:
> >
> > Add -mmove-max=bits and -mstore-max=bits to enable 256-bit/512-bit move
> > and store, independent of -mprefer-vector-width=bits:
> >
> > 1. Add X86_TUNE_AVX512_MOVE_BY_PIECES and X86_TUNE_AVX
On Thu, Nov 25, 2021 at 2:47 PM H.J. Lu wrote:
>
> Add -mmove-max=bits and -mstore-max=bits to enable 256-bit/512-bit move
> and store, independent of -mprefer-vector-width=bits:
>
> 1. Add X86_TUNE_AVX512_MOVE_BY_PIECES and X86_TUNE_AVX512_STORE_BY_PIECES
> which are enabled for Intel Sapphire Ra
Add -mmove-max=bits and -mstore-max=bits to enable 256-bit/512-bit move
and store, independent of -mprefer-vector-width=bits:
1. Add X86_TUNE_AVX512_MOVE_BY_PIECES and X86_TUNE_AVX512_STORE_BY_PIECES
which are enabled for Intel Sapphire Rapids processor.
2. Add -mmove-max=bits to set the maximum n
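Roughly speaking, a by-pieces copy breaks a fixed-size copy into a sequence of loads and stores no wider than the configured maximum, which is the limit `-mmove-max=bits` controls. The sketch below is a simplified greedy model using power-of-two chunks and no overlapping tail moves; real by-pieces expansion may also use overlapping moves. `count_pieces` is a hypothetical helper, not a GCC function.

```c
#include <stddef.h>

/* Number of moves needed to copy `size` bytes when each move is a
   power-of-two chunk of at most max_bits/8 bytes, always taking the
   largest chunk that still fits. max_bits must be a power of two >= 8. */
size_t count_pieces(size_t size, unsigned max_bits)
{
    size_t chunk = max_bits / 8;
    size_t moves = 0;
    while (size > 0) {
        while (chunk > size)
            chunk /= 2;
        size -= chunk;
        moves++;
    }
    return moves;
}
```

Under this model, copying 100 bytes takes 7 moves with a 128-bit limit (6 x 16 + 4), 4 with a 256-bit limit (3 x 32 + 4), and 3 with a 512-bit limit (64 + 32 + 4).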
On Wed, Nov 17, 2021 at 6:08 AM Uros Bizjak wrote:
>
> On Wed, Nov 17, 2021 at 2:46 PM H.J. Lu wrote:
> >
> > On Wed, Nov 17, 2021 at 1:05 AM Uros Bizjak wrote:
> > >
> > > On Tue, Nov 16, 2021 at 7:20 PM H.J. Lu via Gcc-patches
> > > wrote:
> > > >
> > > > Add -mharden-sls= to mitigate against
On Wed, Nov 17, 2021 at 2:46 PM H.J. Lu wrote:
>
> On Wed, Nov 17, 2021 at 1:05 AM Uros Bizjak wrote:
> >
> > On Tue, Nov 16, 2021 at 7:20 PM H.J. Lu via Gcc-patches
> > wrote:
> > >
> > > Add -mharden-sls= to mitigate against straight line speculation (SLS)
> > > for function return and indirec
On Wed, Nov 17, 2021 at 1:10 AM Uros Bizjak wrote:
>
> On Tue, Nov 16, 2021 at 7:51 PM H.J. Lu via Gcc-patches
> wrote:
> >
> > Add -mindirect-branch-cs-prefix to add CS prefix to call and jmp to thunk
> > via r8-r15 registers when converting indirect call and jump to increase
> > the instruction
On Wed, Nov 17, 2021 at 1:05 AM Uros Bizjak wrote:
>
> On Tue, Nov 16, 2021 at 7:20 PM H.J. Lu via Gcc-patches
> wrote:
> >
> > Add -mharden-sls= to mitigate against straight line speculation (SLS)
> > for function return and indirect branch by adding an INT3 instruction
> > after function return
On Tue, Nov 16, 2021 at 7:51 PM H.J. Lu via Gcc-patches
wrote:
>
> Add -mindirect-branch-cs-prefix to add CS prefix to call and jmp to thunk
> via r8-r15 registers when converting indirect call and jump to increase
> the instruction length to 6, allowing the non-thunk form to be inlined.
>
> gcc/
On Tue, Nov 16, 2021 at 7:20 PM H.J. Lu via Gcc-patches
wrote:
>
> Add -mharden-sls= to mitigate against straight line speculation (SLS)
> for function return and indirect branch by adding an INT3 instruction
> after function return and indirect branch.
>
> gcc/
>
> PR target/102952
>
Add -mindirect-branch-cs-prefix to add CS prefix to call and jmp to thunk
via r8-r15 registers when converting indirect call and jump to increase
the instruction length to 6, allowing the non-thunk form to be inlined.
gcc/
PR target/102952
* config/i386/i386.c (ix86_output_jmp_thu
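The length arithmetic behind the CS prefix can be checked by hand:

- A direct `call` to a thunk is `E8` plus a 32-bit displacement, which is 5 bytes. A `2E` (CS segment override) prefix makes it 6.
- For r8-r15, the indirect form `call *%r11` needs a REX prefix and is 3 bytes (`41 FF D3`). So `lfence; call *%r11`, for example, also takes 6 bytes and fits in the same slot.

The encodings below are written out by hand for illustration, and the zero displacement is a placeholder. The exact replacement sequence that a consumer such as a kernel would patch in is an assumption here.

```c
#include <stddef.h>

/* cs call rel32 to a retpoline thunk (displacement left as zero). */
static const unsigned char cs_call_thunk[] = {
    0x2e,                         /* CS segment-override prefix */
    0xe8, 0x00, 0x00, 0x00, 0x00  /* call rel32 */
};

/* Inline non-thunk replacement: lfence; call *%r11 */
static const unsigned char lfence_call_r11[] = {
    0x0f, 0xae, 0xe8,             /* lfence */
    0x41, 0xff, 0xd3              /* call *%r11 (REX.B + FF /2) */
};

size_t thunk_call_len(void) { return sizeof cs_call_thunk; }
size_t inline_call_len(void) { return sizeof lfence_call_r11; }
```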
Add -mharden-sls= to mitigate against straight line speculation (SLS)
for function return and indirect branch by adding an INT3 instruction
after function return and indirect branch.
gcc/
PR target/102952
* config/i386/i386-opts.h (harden_sls): New enum.
* config/i386/i386
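Conceptually, -mharden-sls= appends an INT3 after the selected kinds of control transfer, so that straight-line speculation past them runs into a trap instruction. The sketch below models that as a tiny assembly emitter driven by a bit-flag enum. The enum and function names are hypothetical and only loosely follow the `harden_sls` enum mentioned in the ChangeLog.

```c
#include <stdio.h>

/* Hypothetical flags selecting what to harden. */
enum demo_harden_sls {
    DEMO_SLS_NONE = 0,
    DEMO_SLS_RETURN = 1 << 0,
    DEMO_SLS_INDIRECT_JMP = 1 << 1,
    DEMO_SLS_ALL = DEMO_SLS_RETURN | DEMO_SLS_INDIRECT_JMP
};

/* Write the function-return sequence into buf. */
void demo_emit_return(unsigned flags, char *buf, size_t len)
{
    snprintf(buf, len, "ret%s",
             (flags & DEMO_SLS_RETURN) ? "\n\tint3" : "");
}

/* Write an indirect jump through register reg into buf. */
void demo_emit_indirect_jmp(unsigned flags, const char *reg,
                            char *buf, size_t len)
{
    snprintf(buf, len, "jmp\t*%%%s%s", reg,
             (flags & DEMO_SLS_INDIRECT_JMP) ? "\n\tint3" : "");
}
```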
On Mon, Nov 15, 2021 at 05:40:01AM -0800, H.J. Lu via Gcc-patches wrote:
> PR target/103205
> * gcc.target/i386/pr103205-2.c: New test.
Ok, thanks.
Jakub
PR target/103205
* gcc.target/i386/pr103205-2.c: New test.
---
gcc/testsuite/gcc.target/i386/pr103205-2.c | 46 ++
1 file changed, 46 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/i386/pr103205-2.c
diff --git a/gcc/testsuite/gcc.target/i386/pr10320
On Thu, Oct 21, 2021 at 12:15 AM Richard Biener
wrote:
>
> On Wed, Oct 20, 2021 at 8:34 PM H.J. Lu wrote:
> >
> > On Wed, Oct 20, 2021 at 9:58 AM Richard Biener
> > wrote:
> > >
> > > On October 20, 2021 3:19:28 PM GMT+02:00, "H.J. Lu"
> > > wrote:
> > > >On Wed, Oct 20, 2021 at 4:18 AM Richar
On Wed, Oct 20, 2021 at 8:34 PM H.J. Lu wrote:
>
> On Wed, Oct 20, 2021 at 9:58 AM Richard Biener
> wrote:
> >
> > On October 20, 2021 3:19:28 PM GMT+02:00, "H.J. Lu"
> > wrote:
> > >On Wed, Oct 20, 2021 at 4:18 AM Richard Biener
> > > wrote:
> > >>
> > >> On Wed, Oct 20, 2021 at 12:40 PM Xu Di
On Wed, Oct 20, 2021 at 9:58 AM Richard Biener
wrote:
>
> On October 20, 2021 3:19:28 PM GMT+02:00, "H.J. Lu"
> wrote:
> >On Wed, Oct 20, 2021 at 4:18 AM Richard Biener
> > wrote:
> >>
> >> On Wed, Oct 20, 2021 at 12:40 PM Xu Dianhong wrote:
> >> >
> >> > Many thanks for your explanation. I got
On October 20, 2021 3:19:28 PM GMT+02:00, "H.J. Lu" wrote:
>On Wed, Oct 20, 2021 at 4:18 AM Richard Biener
> wrote:
>>
>> On Wed, Oct 20, 2021 at 12:40 PM Xu Dianhong wrote:
>> >
>> > Many thanks for your explanation. I got the meaning of operands.
>> > The "addpd b(%rip), %xmm0" instruction need
On Wed, Oct 20, 2021 at 4:18 AM Richard Biener
wrote:
>
> On Wed, Oct 20, 2021 at 12:40 PM Xu Dianhong wrote:
> >
> > Many thanks for your explanation. I got the meaning of operands.
> > The "addpd b(%rip), %xmm0" instruction needs "b(%rip)" aligned otherwise it
> > will raise a "Real-Address Mod
On Wed, Oct 20, 2021 at 12:40 PM Xu Dianhong wrote:
>
> Many thanks for your explanation. I got the meaning of operands.
> The "addpd b(%rip), %xmm0" instruction needs "b(%rip)" aligned otherwise it
> will raise a "Real-Address Mode Exceptions".
> I haven't considered this situation "b(%rip)" has
Many thanks for your explanation. I got the meaning of operands.
The "addpd b(%rip), %xmm0" instruction needs "b(%rip)" to be aligned; otherwise
it will raise an exception (see "Real-Address Mode Exceptions").
I hadn't considered the situation where "b(%rip)" has an address dependence on
"a(%rip)" before. I think this situati
On Wed, Oct 20, 2021 at 9:48 AM Xu Dianhong wrote:
>
> Thanks for the comments.
>
> > And does it even work?
> It works, I checked it in the test case, and when using this option, it can
> emit an unaligned vector move.
> >I fail to see adjustments to memory operands of
> SSE/AVX instructions tha
Thanks for the comments.
>Why would you ever want to have such option?!
I need to ask @H. J. Lu for help to answer this question. He knows more
about the background. I may not explain it clearly.
>Should the documentation
at least read "emit unaligned vector moves even for aligned storage or when
Thanks for the comments.
> And does it even work?
It works, I checked it in the test case, and when using this option, it can
emit an unaligned vector move.
>I fail to see adjustments to memory operands of
SSE/AVX instructions that have to be aligned
I changed all vector moves in "get_ssemov" witho
On Wed, Oct 20, 2021 at 9:02 AM Richard Biener
wrote:
>
> On Wed, Oct 20, 2021 at 7:31 AM dianhong.xu--- via Gcc-patches
> wrote:
> >
> > From: dianhong xu
> >
> > Add -muse-unaligned-vector-move option to emit unaligned vector move
> > instructions.
>
> Why would you ever want to have such opt
On Wed, Oct 20, 2021 at 7:31 AM dianhong.xu--- via Gcc-patches
wrote:
>
> From: dianhong xu
>
> Add -muse-unaligned-vector-move option to emit unaligned vector move
> instructions.
Why would you ever want to have such option?! Should the documentation
at least read "emit unaligned vector moves
From: dianhong xu
Add -muse-unaligned-vector-move option to emit unaligned vector move
instructions.
gcc/ChangeLog:
* config/i386/i386-options.c (ix86_target_string): Add
-muse-unaligned-vector-move.
* config/i386/i386.c (ix86_get_ssemov): Emit unaligned vector if use
On Thu, Sep 9, 2021 at 11:21 AM H.J. Lu via Gcc-patches
wrote:
>
> 1. Add TARGET_AVX256_MOVE_BY_PIECES to perform move by-pieces operation
> with 256-bit AVX instructions.
> 2. Add TARGET_AVX256_STORE_BY_PIECES to perform move and store by-pieces
> operations with 256-bit AVX instructions.
>
> The
1. Add TARGET_AVX256_MOVE_BY_PIECES to perform move by-pieces operation
with 256-bit AVX instructions.
2. Add TARGET_AVX256_STORE_BY_PIECES to perform move and store by-pieces
operations with 256-bit AVX instructions.
They are enabled only for Intel Alder Lake and Intel processors with
AVX512.
gc
On Sun, Sep 5, 2021 at 5:54 AM H.J. Lu via Gcc-patches
wrote:
>
> Add non-destructive source alternative to @xorsign3_1 for AVX.
LGTM.
>
> gcc/
>
> PR target/89984
> * config/i386/i386-expand.c (ix86_split_xorsign): Use operands[2].
> * config/i386/i386.md (@xorsign3_1): Ad
Add non-destructive source alternative to @xorsign3_1 for AVX.
gcc/
PR target/89984
* config/i386/i386-expand.c (ix86_split_xorsign): Use operands[2].
* config/i386/i386.md (@xorsign3_1): Add non-destructive
source alternative for AVX.
gcc/testsuite/
PR t
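For reference, `xorsign (x, y)` computes `x * copysign (1.0, y)`. It flips the sign of x when y is negative, which amounts to XORing x with the sign bit of y. The non-destructive AVX alternative affects only register allocation, since the three-operand VEX form need not overwrite an input; it does not change this arithmetic. A portable C model of the operation, with a hypothetical function name, is:

```c
#include <stdint.h>
#include <string.h>

/* xorsign(x, y) == x * copysign(1.0, y), computed with bit operations:
   isolate y's sign bit and XOR it into x. */
double xorsign_model(double x, double y)
{
    uint64_t ux, uy;
    memcpy(&ux, &x, sizeof ux);
    memcpy(&uy, &y, sizeof uy);
    ux ^= uy & UINT64_C(0x8000000000000000);
    memcpy(&x, &ux, sizeof x);
    return x;
}
```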
PR target/80566
* g++.target/i386/pr80566-1.C: New test.
* g++.target/i386/pr80566-2.C: Likewise.
---
gcc/testsuite/g++.target/i386/pr80566-1.C | 15 +++
gcc/testsuite/g++.target/i386/pr80566-2.C | 14 ++
2 files changed, 29 insertions(+)
create mod
PR tree-optimization/42587
* gcc.target/i386/pr42587.c: New test.
---
gcc/testsuite/gcc.target/i386/pr42587.c | 35 +
1 file changed, 35 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/i386/pr42587.c
diff --git a/gcc/testsuite/gcc.target/i386/pr4
On Tue, Apr 20, 2021 at 2:12 PM H.J. Lu wrote:
>
> Add -mmwait so that the MWAIT and MONITOR intrinsics can be used with
> -mgeneral-regs-only and make -msse3 imply -mmwait.
>
> gcc/
>
> * config.gcc: Install mwaitintrin.h for i[34567]86-*-* and
> x86_64-*-* targets.
> *
commit 87c753ac241f25d222d46ba1ac66ceba89d6a200
Author: H.J. Lu
Date: Fri Aug 21 09:42:49 2020 -0700
x86: Add target("general-regs-only") function attribute
is incomplete since it is impossible to call integer intrinsics from
a function with general-regs-only target attribute.
1. Add gene
Add -mmwait so that the MWAIT and MONITOR intrinsics can be used with
-mgeneral-regs-only and make -msse3 imply -mmwait.
gcc/
* config.gcc: Install mwaitintrin.h for i[34567]86-*-* and
x86_64-*-* targets.
* common/config/i386/i386-common.c (OPTION_MASK_ISA2_MWAIT_SET):
On Mon, Mar 22, 2021 at 5:19 AM H.J. Lu wrote:
>
> Tested on Linux/x86-64 and Linux/i686. OK for master and release
> branches?
>
> Thanks.
>
> H.J.
> ---
> Since the CPUID instruction may return different values on hybrid cores,
> volatile is needed on asm statements in <cpuid.h>.
>
> PR target/99704
Tested on Linux/x86-64 and Linux/i686. OK for master and release
branches?
Thanks.
H.J.
---
Since the CPUID instruction may return different values on hybrid cores,
volatile is needed on asm statements in <cpuid.h>.
PR target/99704
* config/i386/cpuid.h (__cpuid): Add __volatile__.
(_
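A hand-written CPUID wrapper shows why `volatile` matters. Without it, the compiler may treat two identical asm statements with the same inputs as producing the same outputs and merge them. That can be wrong on a hybrid CPU if the thread migrates between core types in between. The function below is a hypothetical x86-64-only sketch, not the `<cpuid.h>` macro itself; on other targets it returns 0.

```c
/* Execute CPUID for the given leaf/subleaf and return EAX.
   __volatile__ stops the compiler from merging or hoisting the asm,
   since results can differ between core types on hybrid CPUs. */
unsigned demo_cpuid_eax(unsigned leaf, unsigned subleaf)
{
#if defined(__x86_64__)
    unsigned eax, ebx, ecx, edx;
    __asm__ __volatile__("cpuid"
                         : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                         : "0"(leaf), "2"(subleaf));
    (void)ebx; (void)ecx; (void)edx;
    return eax;
#else
    (void)leaf; (void)subleaf;
    return 0;
#endif
}
```

Leaf 0 (the highest supported basic leaf) is the same on every core, so two calls with leaf 0 should agree even on a hybrid part.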
Emit GNU_PROPERTY_X86_ISA_1_NEEDED GNU
> property"
> FAIL: gcc.dg/lto/save-temps c_lto_save-temps_0.o-c_lto_save-temps_0.o link,
> -O -flto -save-temps
>
I am checking this as an obvious fix.
--
H.J.
From c3dff3d73da87f20effae8defaf926e8ba5204db Mon Sep 17 00:
On 11/16/20 6:44 PM, H.J. Lu wrote:
> On Mon, Nov 16, 2020 at 4:58 PM Jeff Law wrote:
>>
>> On 11/9/20 11:57 AM, H.J. Lu via Gcc-patches wrote:
>>> GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture ISA
>>> levels:
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250
>>
Add a testcase for PR target/31799 which was fixed by
commit 4f0473fe89e68bf7c09542ee5c3684da25a5b435
Author: Uros Bizjak
Date: Fri May 12 21:04:05 2017 +0200
compare-elim.c (try_eliminate_compare): Canonicalize operation with
embedded compare to [(set (reg:CCM) (compare:CCM...
On Mon, Nov 16, 2020 at 4:58 PM Jeff Law wrote:
>
>
> On 11/9/20 11:57 AM, H.J. Lu via Gcc-patches wrote:
> > GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture ISA
> > levels:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250
> >
> > Binutils has been updated to suppor
On 11/9/20 11:57 AM, H.J. Lu via Gcc-patches wrote:
> GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture ISA
> levels:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250
>
> Binutils has been updated to support GNU_PROPERTY_X86_ISA_1_V[234] marker:
>
> https://gitlab.com/x8
GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture ISA
levels:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250
Binutils has been updated to support GNU_PROPERTY_X86_ISA_1_V[234] marker:
https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/13
with
commit b0ab06937385e
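As a rough compile-time check of which level a translation unit was built for, one can test the compiler's predefined ISA feature macros. The sketch below uses only a few representative features per level as a proxy; each level actually requires a longer list, so the result is an approximation. The function name is hypothetical.

```c
/* Approximate x86-64 ISA level (1-4) implied by predefined macros;
   0 if not compiling for x86-64. Only a subset of each level's
   required features is checked. */
int demo_isa_level(void)
{
#if defined(__x86_64__)
#  if defined(__AVX512F__) && defined(__AVX512BW__) && \
      defined(__AVX512CD__) && defined(__AVX512DQ__) && defined(__AVX512VL__)
    return 4;
#  elif defined(__AVX2__) && defined(__FMA__) && defined(__BMI2__)
    return 3;
#  elif defined(__SSE4_2__) && defined(__POPCNT__) && defined(__SSSE3__)
    return 2;
#  else
    return 1;
#  endif
#else
    return 0;
#endif
}
```

Building the same file with -march=x86-64-v2, -march=x86-64-v3 or -march=x86-64-v4 should raise the reported level accordingly.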
On Sun, Oct 18, 2020 at 8:16 AM Jan Hubicka wrote:
>
> > On Fri, Oct 2, 2020 at 6:21 AM H.J. Lu wrote:
> > >
> > > On Wed, Sep 16, 2020 at 10:07 PM H.J. Lu wrote:
> > > >
> > > > On Wed, Aug 19, 2020 at 6:09 AM H.J. Lu wrote:
> > > > >
> > > > > On Tue, May 19, 2020 at 5:14 AM H.J. Lu wrote:
>
On Sun, Oct 18, 2020 at 8:16 AM Jan Hubicka wrote:
>
> > On Fri, Oct 2, 2020 at 6:21 AM H.J. Lu wrote:
> > >
> > > On Wed, Sep 16, 2020 at 10:07 PM H.J. Lu wrote:
> > > >
> > > > On Wed, Aug 19, 2020 at 6:09 AM H.J. Lu wrote:
> > > > >
> > > > > On Tue, May 19, 2020 at 5:14 AM H.J. Lu wrote:
>
> On Fri, Oct 2, 2020 at 6:21 AM H.J. Lu wrote:
> >
> > On Wed, Sep 16, 2020 at 10:07 PM H.J. Lu wrote:
> > >
> > > On Wed, Aug 19, 2020 at 6:09 AM H.J. Lu wrote:
> > > >
> > > > On Tue, May 19, 2020 at 5:14 AM H.J. Lu wrote:
> > > > >
> > > > > On Tue, May 19, 2020 at 1:48 AM Uros Bizjak wrot
On Fri, Oct 2, 2020 at 6:21 AM H.J. Lu wrote:
>
> On Wed, Sep 16, 2020 at 10:07 PM H.J. Lu wrote:
> >
> > On Wed, Aug 19, 2020 at 6:09 AM H.J. Lu wrote:
> > >
> > > On Tue, May 19, 2020 at 5:14 AM H.J. Lu wrote:
> > > >
> > > > On Tue, May 19, 2020 at 1:48 AM Uros Bizjak wrote:
> > > > >
> > >
> gcc/ChangeLog:
>
> * config/i386/avx2intrin.h (_mm_broadcastsi128_si256): New intrinsics.
> (_mm_broadcastsd_pd): Ditto.
> * config/i386/avx512bwintrin.h (_mm512_loadu_epi16): New intrinsics.
> (_mm512_storeu_epi16): Ditto.
> (_mm512_loadu_epi8): Ditto.
> (_mm512_storeu_epi8): Ditto.
> * config/i
Tested on x86-64.
gcc/ChangeLog:
* config/i386/avx2intrin.h (_mm_broadcastsi128_si256): New intrinsics.
(_mm_broadcastsd_pd): Ditto.
* config/i386/avx512bwintrin.h (_mm512_loadu_epi16): New intrinsics.
(_mm512_storeu_epi16): Ditto.
(_mm512_loadu_epi8): Ditt
On Fri, Oct 2, 2020 at 5:51 AM H.J. Lu wrote:
>
> On Wed, Sep 23, 2020 at 10:58 AM H.J. Lu wrote:
> >
> > For sources which can't use any vector instructions, and
> > cannot be included for compiler intrinsics:
> >
> > $ echo "#include " | gcc -S -O2 -mno-sse -mno-mmx -x c -
> > In file include