Re: [PATCH v3] x86: Update MMX moves to support all 1s vectors

H.J. Lu Mon, 14 Jul 2025 00:38:05 -0700

On Mon, Jul 14, 2025 at 3:11 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Mon, Jul 14, 2025 at 5:32 AM Uros Bizjak <ubiz...@gmail.com> wrote:
> >
> > On Mon, Jul 14, 2025 at 2:14 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> > >
> > > On Sat, Jul 12, 2025 at 7:51 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> > > >
> > > > On Sat, Jul 12, 2025 at 1:41 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > > >
> > > > > On Sat, Jul 12, 2025 at 5:58 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> > > > > >
> > > > > > On Sat, Jul 12, 2025 at 11:52 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > > > > >
> > > > > > > On Sat, Jul 12, 2025 at 5:03 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, Jul 11, 2025 at 6:05 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > commit 77473a27bae04da99d6979d43e7bd0a8106f4557
> > > > > > > > > Author: H.J. Lu <hjl.to...@gmail.com>
> > > > > > > > > Date:   Thu Jun 26 06:08:51 2025 +0800
> > > > > > > > >
> > > > > > > > >     x86: Also handle all 1s float vector constant
> > > > > > > > >
> > > > > > > > > replaces
> > > > > > > > >
> > > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > > > > >         (mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S8 A64])) 2031 {*movv2sf_internal}
> > > > > > > > >      (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > > > > >                 (const_double:SF -QNaN [-QNaN]) repeated x2
> > > > > > > > >             ])
> > > > > > > > >         (nil)))
> > > > > > > > >
> > > > > > > > > with
> > > > > > > > >
> > > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > > > > >         (const_vector:V8QI [
> > > > > > > > >                 (const_int -1 [0xffffffffffffffff]) repeated x8
> > > > > > > > >             ])) -1
> > > > > > > > >      (nil))
> > > > > > > > > ...
> > > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > > > > >         (subreg:V2SF (reg:V8QI 112) 0)) 2031 {*movv2sf_internal}
> > > > > > > > >      (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > > > > >                 (const_double:SF -QNaN [-QNaN]) repeated x2
> > > > > > > > >             ])
> > > > > > > > >         (nil)))
> > > > > > > > >
> > > > > > > > > which leads to
> > > > > > > > >
> > > > > > > > > pr121015.c: In function ‘render_result_from_bake_h’:
> > > > > > > > > pr121015.c:34:1: error: unrecognizable insn:
> > > > > > > > >    34 | }
> > > > > > > > >       | ^
> > > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > > > > >         (const_vector:V8QI [
> > > > > > > > >                 (const_int -1 [0xffffffffffffffff]) repeated x8
> > > > > > > > >             ])) -1
> > > > > > > > >      (expr_list:REG_EQUIV (const_vector:V8QI [
> > > > > > > > >                 (const_int -1 [0xffffffffffffffff]) repeated x8
> > > > > > > > >             ])
> > > > > > > > >         (nil)))
> > > > > > > > > during RTL pass: ira
> > > > > > > > >
> > > > > > > > > 1. Add vector_const0_or_m1_operand for vector 0 or integer vector -1.
> > > > > > > > > 2. Add nonimm_or_vector_const0_or_m1_operand for nonimmediate, vector 0
> > > > > > > > > or integer vector -1 operand.
> > > > > > > > > 3. Add BX constraint for MMX vector constant all 0s/1s operand.
> > > > > > > > > 4. Update MMXMODE:*mov<mode>_internal to support integer all 1s vectors.
> > > > > > > > > Replace <v,C> with <v,BX> to generate
> > > > > > > > >
> > > > > > > > > pcmpeqd %xmm0, %xmm0
> > > > > > > > >
> > > > > > > > > for
> > > > > > > > >
> > > > > > > > > (set (reg/i:V8QI 20 xmm0)
> > > > > > > > >      (const_vector:V8QI [(const_int -1 [0xffffffffffffffff]) repeated x8]))
> > > > > > > > >
> > > > > > > > > NB: The upper 64 bits in XMM0 are all 1s, instead of all 0s.
> > > > > > > >
> > > > > > > > Actually, we don't want this; we should keep the top 64 bits zero,
> > > > > > > > especially for floating point, where the pattern represents NaN.
> > > > > > > >
> > > > > > > > So, I think the correct way is to avoid the transformation for
> > > > > > > > narrower modes in the first place.
> > > > > > > >
> > > > > > >
> > > > > > > How does your latest patch handle this?
> > > > > > >
> > > > > > > typedef char __v8qi __attribute__ ((__vector_size__ (8)));
> > > > > > >
> > > > > > > __v8qi
> > > > > > > m1 (void)
> > > > > > > {
> > > > > > >   return __extension__(__v8qi){-1, -1, -1, -1, -1, -1, -1, -1};
> > > > > > > }
> > > > > >
> > > > > > No, my patch is also not appropriate, because it also introduces
> > > > > > "pcmpeq %xmm, %xmm". We should not generate an 8-byte all-ones load
> > > > > > using pcmpeq, because the upper 64 bits are also all 1s.
> > > > > >
> > > > > > The correct way is to avoid generating 64-bit all-ones, because this
> > > > > > constant is not supported and standard_sse_constant_p () correctly
> > > > > > reports this.
> > > > >
> > > > > We can generate
> > > > >
> > > > > pcmpeqd %xmm0, %xmm0
> > > > > movq %xmm0, %xmm0
> > > > >
> > > > > for V8QI and
> > > > >
> > > > > pcmpeqd %xmm0, %xmm0
> > > > > movd %xmm0, %xmm0
> > > > >
> > > > > for V4QI.
> > > >
> > > > I don't think this is better than skipping the transformation for
> > > > instructions that we in fact emulate altogether. While loading
> > > > all-zero is OK in any mode, loading all-one is not OK for narrow
> > > > modes. So, this transformation should simply be skipped for all-one in
> > > > narrow modes.
> > >
> > > Here is the v3 patch, which allows 4-byte/8-byte all 1s in mmx.md
> > > and splits them into a load from memory if the destination is an XMM register.
> >
> > Why don't we just skip the generation of narrow-mode all-ones vector
> > constants in the new pass altogether? It is not worth complicating
> > move patterns for a very seldom used feature and for very small (if at
> > all) gain.
> >
> > Please just change the pass to not generate vector all-ones in 64-bit or
> > narrower modes.
>
> I'm not familiar with the pass, but IMO the attached patch should be a
> good starting point. We don't want to CSE narrow all-ones with their
> wide counterparts, because we want zeros in top bytes of the narrow
> all-ones operands.
>
> Uros.


I am testing this.
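To make the register-level disagreement concrete, here is a small byte-level C model of an XMM register. It assumes only the documented behavior of the instructions discussed above: pcmpeqd of a register with itself sets all 128 bits, while movq (both the 64-bit load form and the register-to-register form) zeroes bits 64-127. The type xmm_t and all helper names are hypothetical and are not GCC code; this is a sketch of instruction semantics, not of the compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-level model of a 128-bit XMM register (not GCC code).  */
typedef struct { uint8_t b[16]; } xmm_t;

/* pcmpeqd %xmm0, %xmm0: every lane equals itself, so all 128 bits become 1.  */
static void pcmpeqd_self (xmm_t *r)
{
  memset (r->b, 0xff, sizeof r->b);
}

/* movq m64, %xmm0: load the low 8 bytes and zero the upper 8 bytes.  */
static void movq_load (xmm_t *r, const uint8_t mem[8])
{
  memcpy (r->b, mem, 8);
  memset (r->b + 8, 0, 8);
}

/* movq %xmm0, %xmm0: keep the low 64 bits and zero the upper 64 bits.
   Copying through TMP makes DST == SRC safe.  */
static void movq_reg (xmm_t *dst, const xmm_t *src)
{
  xmm_t tmp = { { 0 } };
  memcpy (tmp.b, src->b, 8);
  *dst = tmp;
}

/* Return 1 if bytes [8, 16) of R are all zero.  */
static int upper_half_zero (const xmm_t *r)
{
  for (int i = 8; i < 16; i++)
    if (r->b[i] != 0)
      return 0;
  return 1;
}

/* Compare three ways of materializing a V8QI all-ones constant.  */
static void check_v8qi_all_ones (void)
{
  static const uint8_t lc0[8] = { 0xff, 0xff, 0xff, 0xff,
                                  0xff, 0xff, 0xff, 0xff };
  xmm_t a, b, c;

  pcmpeqd_self (&a);              /* pcmpeqd alone: upper half is all 1s.  */
  assert (!upper_half_zero (&a));

  movq_load (&b, lc0);            /* Load from the constant pool (.LC0).  */
  assert (upper_half_zero (&b));

  pcmpeqd_self (&c);              /* pcmpeqd followed by movq reg, reg.  */
  movq_reg (&c, &c);
  assert (memcmp (b.b, c.b, sizeof b.b) == 0);
}
```

check_v8qi_all_ones () shows that pcmpeqd alone leaves the upper half all 1s, whereas both the constant-pool load (the movq .LC0(%rip), %xmm0 form that pr121015-5a.c expects) and the pcmpeqd plus movq pair yield all 1s in the low half and zeros in the upper half.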

-- 
H.J.

From de1cc2ee480483d03170e75b8b189f340bc71154 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.to...@gmail.com>
Date: Sun, 13 Jul 2025 08:59:34 +0800
Subject: [PATCH v4] x86: Skip all 1s vector constant narrower than 16 bytes
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

commit 77473a27bae04da99d6979d43e7bd0a8106f4557
Author: H.J. Lu <hjl.to...@gmail.com>
Date:   Thu Jun 26 06:08:51 2025 +0800

    x86: Also handle all 1s float vector constant

replaces

(insn 29 28 30 5 (set (reg:V2SF 107)
        (mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S8 A64])) 2031 {*movv2sf_internal}
     (expr_list:REG_EQUAL (const_vector:V2SF [
                (const_double:SF -QNaN [-QNaN]) repeated x2
            ])
        (nil)))

with

(insn 98 13 14 3 (set (reg:V8QI 112)
        (const_vector:V8QI [
                (const_int -1 [0xffffffffffffffff]) repeated x8
            ])) -1
     (nil))
...
(insn 29 28 30 5 (set (reg:V2SF 107)
        (subreg:V2SF (reg:V8QI 112) 0)) 2031 {*movv2sf_internal}
     (expr_list:REG_EQUAL (const_vector:V2SF [
                (const_double:SF -QNaN [-QNaN]) repeated x2
            ])
        (nil)))

which leads to

pr121015.c: In function ‘render_result_from_bake_h’:
pr121015.c:34:1: error: unrecognizable insn:
   34 | }
      | ^
(insn 98 13 14 3 (set (reg:V8QI 112)
        (const_vector:V8QI [
                (const_int -1 [0xffffffffffffffff]) repeated x8
            ])) -1
     (expr_list:REG_EQUIV (const_vector:V8QI [
                (const_int -1 [0xffffffffffffffff]) repeated x8
            ])
        (nil)))
during RTL pass: ira

1. Update the remove_redundant_vector pass to skip all 1s vector constants
narrower than 16 bytes.
2. Replace integer register loads of a CONSTM1_RTX vector constant from
memory with a constm1_rtx move.
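The two changes above can be summarized as a decision procedure. The following C sketch is a hypothetical model, not the GCC implementation: the pass and the splitters operate on RTL, and the function and parameter names are invented for illustration. It assumes MMXMODE covers the 8-byte vector modes and V_32 the 4-byte ones, as in mmx.md.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Item 1 (hypothetical model of the early return in ix86_broadcast_inner):
   an all 1s vector constant is a sharing candidate only if its mode is at
   least 16 bytes wide, so narrower constants keep zeros in the upper bits.  */
static bool
constm1_candidate_p (size_t mode_size)
{
  return mode_size >= 16;
}

enum gpr_load { LOAD_FROM_MEMORY, MOVE_IMM_M1 };

/* Item 2 (hypothetical model of the new mmx.md splitters): after reload, a
   vector load from memory into a general register becomes "mov $-1, reg"
   when the REG_EQUAL or REG_EQUIV note is an all 1s vector.  MODE_SIZE is
   8 for the MMXMODE splitter or 4 for the V_32 splitter.  */
static enum gpr_load
choose_gpr_load (size_t mode_size, bool target_64bit, bool dest_is_gpr,
                 bool note_is_all_ones)
{
  assert (mode_size == 4 || mode_size == 8);
  if (!dest_is_gpr || !note_is_all_ones)
    return LOAD_FROM_MEMORY;
  if (mode_size == 8 && !target_64bit)
    return LOAD_FROM_MEMORY;    /* The MMXMODE splitter requires TARGET_64BIT.  */
  return MOVE_IMM_M1;           /* DImode or SImode move of constm1_rtx.  */
}
```

The TARGET_64BIT condition on the MMXMODE splitter presumably reflects that an 8-byte value fits in a single general register only in 64-bit mode; the V_32 splitter has no such restriction.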

gcc/

	PR target/121015
	* config/i386/i386-features.cc (ix86_broadcast_inner): Skip all 1s
	vector constants narrower than 16 bytes.
	* config/i386/mmx.md: Add MMXMODE and V_32 splitters to convert
	integer register loads from CONSTM1_RTX in memory to constm1_rtx
	move.

gcc/testsuite/

	PR target/121015
	* gcc.target/i386/pr121015-1.c: New test.
	* gcc.target/i386/pr121015-2a.c: Likewise.
	* gcc.target/i386/pr121015-2b.c: Likewise.
	* gcc.target/i386/pr121015-3.c: Likewise.
	* gcc.target/i386/pr121015-4.c: Likewise.
	* gcc.target/i386/pr121015-5a.c: Likewise.
	* gcc.target/i386/pr121015-5b.c: Likewise.
	* gcc.target/i386/pr121015-5c.c: Likewise.
	* gcc.target/i386/pr121015-6.c: Likewise.
	* gcc.target/i386/pr121015-7a.c: Likewise.
	* gcc.target/i386/pr121015-7b.c: Likewise.
	* gcc.target/i386/pr121015-7c.c: Likewise.
	* gcc.target/i386/pr121015-8.c: Likewise.
	* gcc.target/i386/pr121015-9.c: Likewise.
	* gcc.target/i386/pr121015-10a.c: Likewise.
	* gcc.target/i386/pr121015-10b.c: Likewise.
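	* gcc.target/i386/pr121015-10c.c: Likewise.
	* gcc.target/i386/pr121015-11a.c: Likewise.
	* gcc.target/i386/pr121015-11b.c: Likewise.
	* gcc.target/i386/pr121015-11c.c: Likewise.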

Signed-off-by: H.J. Lu <hjl.to...@gmail.com>
---
 gcc/config/i386/i386-features.cc             |  5 ++
 gcc/config/i386/mmx.md                       | 48 ++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr121015-1.c   | 32 +++++++++++++
 gcc/testsuite/gcc.target/i386/pr121015-10a.c | 32 +++++++++++++
 gcc/testsuite/gcc.target/i386/pr121015-10b.c | 16 +++++++
 gcc/testsuite/gcc.target/i386/pr121015-10c.c | 21 +++++++++
 gcc/testsuite/gcc.target/i386/pr121015-11a.c | 21 +++++++++
 gcc/testsuite/gcc.target/i386/pr121015-11b.c | 13 ++++++
 gcc/testsuite/gcc.target/i386/pr121015-11c.c | 17 +++++++
 gcc/testsuite/gcc.target/i386/pr121015-2a.c  | 23 ++++++++++
 gcc/testsuite/gcc.target/i386/pr121015-2b.c  |  6 +++
 gcc/testsuite/gcc.target/i386/pr121015-3.c   | 35 ++++++++++++++
 gcc/testsuite/gcc.target/i386/pr121015-4.c   | 22 +++++++++
 gcc/testsuite/gcc.target/i386/pr121015-5a.c  | 21 +++++++++
 gcc/testsuite/gcc.target/i386/pr121015-5b.c  | 16 +++++++
 gcc/testsuite/gcc.target/i386/pr121015-5c.c  | 20 ++++++++
 gcc/testsuite/gcc.target/i386/pr121015-6.c   | 23 ++++++++++
 gcc/testsuite/gcc.target/i386/pr121015-7a.c  | 23 ++++++++++
 gcc/testsuite/gcc.target/i386/pr121015-7b.c  |  6 +++
 gcc/testsuite/gcc.target/i386/pr121015-7c.c  |  8 ++++
 gcc/testsuite/gcc.target/i386/pr121015-8.c   | 13 ++++++
 gcc/testsuite/gcc.target/i386/pr121015-9.c   | 14 ++++++
 22 files changed, 435 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-10a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-10b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-10c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-11a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-11b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-11c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-5a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-5b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-5c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-7a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-7b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-7c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-9.c

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 054f8d5ddc8..20b15544408 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -3546,6 +3546,11 @@ ix86_broadcast_inner (rtx op, machine_mode mode,
 	    || (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT
 		&& float_vector_all_ones_operand (op, mode)))
     {
+      /* Skip if vector size is less than 16 bytes since all 1s SSE
+	 constants must be at least 16 bytes.  */
+      if (GET_MODE_SIZE (mode) < 16)
+	return nullptr;
+
       *scalar_mode_p = QImode;
       *kind_p = X86_CSE_CONSTM1_VECTOR;
       *insn_p = nullptr;
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 29a8cb599a7..00f3657f796 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -304,6 +304,30 @@ (define_insn "*mov<mode>_internal"
 	   ]
 	   (symbol_ref "true")))])
 
+(define_split
+  [(set (match_operand:MMXMODE 0 "register_operand")
+	(match_operand:MMXMODE 1 "memory_operand"))]
+  "TARGET_64BIT && reload_completed && GENERAL_REG_P (operands[0])"
+  [(const_int 0)]
+{
+  rtx op1 = operands[1];
+  rtx op = find_reg_note (curr_insn, REG_EQUAL, nullptr);
+  if (!op)
+    op = find_reg_note (curr_insn, REG_EQUIV, nullptr);
+  if (op)
+    {
+      op = XEXP (op, 0);
+      if (int_float_vector_all_ones_operand (op, <MODE>mode))
+        {
+          rtx reg = gen_rtx_REG (DImode, REGNO (operands[0]));
+          emit_move_insn (reg, constm1_rtx);
+          op1 = gen_rtx_SUBREG (<MODE>mode, reg, 0);
+        }
+    }
+  emit_move_insn (operands[0], op1);
+  DONE;
+})
+
 (define_split
   [(set (match_operand:MMXMODE 0 "nonimmediate_gr_operand")
 	(match_operand:MMXMODE 1 "nonimmediate_gr_operand"))]
@@ -407,6 +431,30 @@ (define_insn "*mov<mode>_internal"
 	   ]
 	   (symbol_ref "true")))])
 
+(define_split
+  [(set (match_operand:V_32 0 "register_operand")
+	(match_operand:V_32 1 "memory_operand"))]
+  "reload_completed && GENERAL_REG_P (operands[0])"
+  [(const_int 0)]
+{
+  rtx op1 = operands[1];
+  rtx op = find_reg_note (curr_insn, REG_EQUAL, nullptr);
+  if (!op)
+    op = find_reg_note (curr_insn, REG_EQUIV, nullptr);
+  if (op)
+    {
+      op = XEXP (op, 0);
+      if (int_float_vector_all_ones_operand (op, <MODE>mode))
+        {
+          rtx reg = gen_rtx_REG (SImode, REGNO (operands[0]));
+          emit_move_insn (reg, constm1_rtx);
+          op1 = gen_rtx_SUBREG (<MODE>mode, reg, 0);
+        }
+    }
+  emit_move_insn (operands[0], op1);
+  DONE;
+})
+
 ;; 16-bit, 32-bit and 64-bit constant vector stores.  After reload,
 ;; convert them to immediate integer stores.
 (define_insn_and_split "*mov<mode>_imm"
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-1.c b/gcc/testsuite/gcc.target/i386/pr121015-1.c
new file mode 100644
index 00000000000..57c8bff14ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-1.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64-v3" } */
+
+extern union {
+  int i;
+  float f;
+} int_as_float_u;
+
+extern int render_result_from_bake_w;
+extern int render_result_from_bake_h_seed_pass;
+extern float *render_result_from_bake_h_primitive;
+extern float *render_result_from_bake_h_seed;
+
+float
+int_as_float(int i)
+{
+  int_as_float_u.i = i;
+  return int_as_float_u.f;
+}
+
+void
+render_result_from_bake_h(int tx)
+{
+  while (render_result_from_bake_w) {
+    for (; tx < render_result_from_bake_w; tx++)
+      render_result_from_bake_h_primitive[1] =
+          render_result_from_bake_h_primitive[2] = int_as_float(-1);
+    if (render_result_from_bake_h_seed_pass) {
+      *render_result_from_bake_h_seed = 0;
+    }
+  }
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-10a.c b/gcc/testsuite/gcc.target/i386/pr121015-10a.c
new file mode 100644
index 00000000000..67b574cc837
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-10a.c
@@ -0,0 +1,32 @@
+/* { dg-do compile { target fpic } } */
+/* { dg-options "-O2 -march=x86-64 -fpic" } */
+/* Keep labels and directives ('.cfi_startproc', '.cfi_endproc').  */
+/* { dg-final { check-function-bodies "**" "" "" { target { ! ia32 } } {^\t?\.}  } } */
+
+/*
+**__bid64_to_binary80:
+**.LFB[0-9]+:
+**	.cfi_startproc
+**	mov(l|q)	__bid64_to_binary80_x_out@GOTPCREL\(%rip\), %(r|e)ax
+**	movq	\$-1, \(%(r|e)ax\)
+**	ret
+**...
+*/
+
+typedef struct {
+  struct {
+    unsigned short lo4;
+    unsigned short lo3;
+    unsigned short lo2;
+    unsigned short lo1;
+  } i;
+} BID_BINARY80LDOUBLE;
+extern BID_BINARY80LDOUBLE __bid64_to_binary80_x_out;
+void
+__bid64_to_binary80 (void)
+{
+  __bid64_to_binary80_x_out.i.lo4
+    = __bid64_to_binary80_x_out.i.lo3
+    = __bid64_to_binary80_x_out.i.lo2
+    = __bid64_to_binary80_x_out.i.lo1 = 65535;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-10b.c b/gcc/testsuite/gcc.target/i386/pr121015-10b.c
new file mode 100644
index 00000000000..06cb58f702d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-10b.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { fpic && lp64 } } } */
+/* { dg-options "-O2 -march=x86-64 -fno-pic -mcmodel=large" } */
+
+/*
+**__bid64_to_binary80:
+**.LFB[0-9]+:
+**	.cfi_startproc
+**	movabsq	\$.LC0, %rax
+**	movq	\(%rax\), %rdx
+**	movabsq	\$__bid64_to_binary80_x_out, %rax
+**	movq	%rdx, \(%rax\)
+**	ret
+**...
+*/
+
+#include "pr121015-10a.c"
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-10c.c b/gcc/testsuite/gcc.target/i386/pr121015-10c.c
new file mode 100644
index 00000000000..573a1562883
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-10c.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target { fpic && lp64 } } } */
+/* { dg-options "-O2 -march=x86-64 -fpic -mcmodel=large" } */
+
+/*
+**__bid64_to_binary80:
+**.LFB[0-9]+:
+**	.cfi_startproc
+**.L2:
+**	leaq	.L2\(%rip\), %rax
+**	movabsq	\$_GLOBAL_OFFSET_TABLE_-.L2, %r11
+**	movabsq	\$__bid64_to_binary80_x_out@GOT, %rdx
+**	movabsq	\$.LC0@GOTOFF, %rcx
+**	addq	%r11, %rax
+**	movq	\(%rax,%rdx\), %rdx
+**	movq	\(%rax,%rcx\), %rax
+**	movq	%rax, \(%rdx\)
+**	ret
+**...
+*/
+
+#include "pr121015-10a.c"
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-11a.c b/gcc/testsuite/gcc.target/i386/pr121015-11a.c
new file mode 100644
index 00000000000..b8bb3849fb7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-11a.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target fpic } } */
+/* { dg-options "-O2 -march=x86-64 -fpic" } */
+/* Keep labels and directives ('.cfi_startproc', '.cfi_endproc').  */
+/* { dg-final { check-function-bodies "**" "" "" { target { ! ia32 } } {^\t?\.}  } } */
+
+/*
+**foo:
+**.LFB[0-9]+:
+**	.cfi_startproc
+**	movd	.LC0\(%rip\), %xmm0
+**...
+*/
+
+typedef char __v4qi __attribute__ ((__vector_size__ (4)));
+
+void
+foo (void)
+{
+  __v4qi x = __extension__(__v4qi){-1, -1, -1, -1};
+  asm ("reg %0" : : "v" (x));
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-11b.c b/gcc/testsuite/gcc.target/i386/pr121015-11b.c
new file mode 100644
index 00000000000..9ff2908829b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-11b.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { fpic && lp64 } } } */
+/* { dg-options "-O2 -march=x86-64 -fno-pic -mcmodel=large" } */
+
+/*
+**foo:
+**.LFB[0-9]+:
+**	.cfi_startproc
+**	movabsq	\$.LC0, %rax
+**	movd	\(%rax\), %xmm0
+**...
+*/
+
+#include "pr121015-11a.c"
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-11c.c b/gcc/testsuite/gcc.target/i386/pr121015-11c.c
new file mode 100644
index 00000000000..f0e6ccb2b92
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-11c.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { fpic && lp64 } } } */
+/* { dg-options "-O2 -march=x86-64 -fpic -mcmodel=large" } */
+
+/*
+**foo:
+**.LFB[0-9]+:
+**	.cfi_startproc
+**.L2:
+**	movabsq	\$_GLOBAL_OFFSET_TABLE_-.L2, %r11
+**	leaq	.L2\(%rip\), %rax
+**	movabsq	\$.LC0@GOTOFF, %rdx
+**	addq	%r11, %rax
+**	movd	\(%rax,%rdx\), %xmm0
+**...
+*/
+
+#include "pr121015-11a.c"
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-2a.c b/gcc/testsuite/gcc.target/i386/pr121015-2a.c
new file mode 100644
index 00000000000..f94848023da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-2a.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64" } */
+
+void
+foo (int *c1, int *c2)
+{
+  if (c1)
+    {
+      c1 = __builtin_assume_aligned (c1, 16);
+      c1[0] = 0;
+      c1[1] = 0;
+    }
+  if (c2)
+    {
+      c2 = __builtin_assume_aligned (c2, 16);
+      c2[0] = 0;
+      c2[1] = 0;
+    }
+}
+
+/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$0," 4 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movq\[ \\t\]+\\\$0," 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xmm" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-2b.c b/gcc/testsuite/gcc.target/i386/pr121015-2b.c
new file mode 100644
index 00000000000..9df2766c612
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-2b.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2 -mno-sse" } */
+
+#include "pr121015-2a.c"
+
+/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$0," 4 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-3.c b/gcc/testsuite/gcc.target/i386/pr121015-3.c
new file mode 100644
index 00000000000..44bf63c73e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-3.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64" } */
+
+typedef enum { CPP_NUMBER } cpp_ttype;
+typedef struct {
+  bool unsignedp;
+  bool overflow;
+} cpp_num;
+extern cpp_num value, __trans_tmp_1;
+extern cpp_ttype eval_token_token_0;
+extern int eval_token_temp;
+static cpp_num
+eval_token(void)
+{
+  cpp_num __trans_tmp_2, result;
+  result.overflow = false;
+  switch (eval_token_token_0)
+    {
+    case CPP_NUMBER:
+      switch (eval_token_temp)
+	{
+	case 1:
+	  return __trans_tmp_1;
+	}
+      result.unsignedp = false;
+      __trans_tmp_2 = result;
+      return __trans_tmp_2;
+    }
+  return result;
+}
+void
+_cpp_parse_expr_pfile(void)
+{
+  value = eval_token();
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-4.c b/gcc/testsuite/gcc.target/i386/pr121015-4.c
new file mode 100644
index 00000000000..2848a946dd1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-4.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64" } */
+/* Keep labels and directives ('.cfi_startproc', '.cfi_endproc').  */
+/* { dg-final { check-function-bodies "**" "" "" { target { ! ia32 } } {^\t?\.}  } } */
+
+/*
+**zero:
+**.LFB0:
+**	.cfi_startproc
+**	xorps	%xmm0, %xmm0
+**	ret
+**...
+*/
+
+typedef float __v2sf __attribute__ ((__vector_size__ (8)));
+extern __v2sf f1;
+
+__v2sf
+zero (void)
+{
+  return __extension__(__v2sf){0, 0};
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-5a.c b/gcc/testsuite/gcc.target/i386/pr121015-5a.c
new file mode 100644
index 00000000000..605a87db1fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-5a.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64" } */
+/* Keep labels and directives ('.cfi_startproc', '.cfi_endproc').  */
+/* { dg-final { check-function-bodies "**" "" "" { target { ! ia32 } } {^\t?\.}  } } */
+
+/*
+**m1:
+**.LFB[0-9]+:
+**	.cfi_startproc
+**	movq	.LC[0-9]+\(%rip\), %xmm0
+**	ret
+**...
+*/
+
+typedef char __v8qi __attribute__ ((__vector_size__ (8)));
+
+__v8qi
+m1 (void)
+{
+  return __extension__(__v8qi){-1, -1, -1, -1, -1, -1, -1, -1};
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-5b.c b/gcc/testsuite/gcc.target/i386/pr121015-5b.c
new file mode 100644
index 00000000000..22d51fd33ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-5b.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { fpic && lp64 } } } */
+/* { dg-options "-O2 -march=x86-64 -fno-pic -mcmodel=large" } */
+/* Keep labels and directives ('.cfi_startproc', '.cfi_endproc').  */
+/* { dg-final { check-function-bodies "**" "" "" { target "*-*-*" } {^\t?\.}  } } */
+
+/*
+**m1:
+**.LFB[0-9]+:
+**	.cfi_startproc
+**	movabsq	\$.LC0, %rax
+**	movq	\(%rax\), %xmm0
+**	ret
+**...
+*/
+
+#include "pr121015-5a.c"
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-5c.c b/gcc/testsuite/gcc.target/i386/pr121015-5c.c
new file mode 100644
index 00000000000..bb210fa71ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-5c.c
@@ -0,0 +1,20 @@
+/* { dg-do compile { target { fpic && lp64 } } } */
+/* { dg-options "-O2 -march=x86-64 -fpic -mcmodel=large" } */
+/* Keep labels and directives ('.cfi_startproc', '.cfi_endproc').  */
+/* { dg-final { check-function-bodies "**" "" "" { target "*-*-*" } {^\t?\.}  } } */
+
+/*
+**m1:
+**.LFB[0-9]+:
+**	.cfi_startproc
+**.L2:
+**	movabsq	\$_GLOBAL_OFFSET_TABLE_-.L2, %r11
+**	leaq	.L2\(%rip\), %rax
+**	movabsq	\$.LC0@GOTOFF, %rdx
+**	addq	%r11, %rax
+**	movq	\(%rax,%rdx\), %xmm0
+**	ret
+**...
+*/
+
+#include "pr121015-5a.c"
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-6.c b/gcc/testsuite/gcc.target/i386/pr121015-6.c
new file mode 100644
index 00000000000..daebcb0acc5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-6.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64" } */
+/* Keep labels and directives ('.cfi_startproc', '.cfi_endproc').  */
+/* { dg-final { check-function-bodies "**" "" "" { target { ! ia32 } } {^\t?\.}  } } */
+
+/*
+**m1:
+**.LFB[0-9]+:
+**	.cfi_startproc
+**	pcmpeqd	%xmm0, %xmm0
+**	ret
+**...
+*/
+
+#include <x86intrin.h>
+
+__m128i
+m1 (void)
+{
+  __m64 x = _mm_set1_pi8 (-1);
+  __m128i y = _mm_set1_epi64 (x);
+  return y;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-7a.c b/gcc/testsuite/gcc.target/i386/pr121015-7a.c
new file mode 100644
index 00000000000..94037e33d81
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-7a.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64" } */
+
+void
+foo (int *c1, int *c2)
+{
+  if (c1)
+    {
+      c1 = __builtin_assume_aligned (c1, 16);
+      c1[0] = -1;
+      c1[1] = -1;
+    }
+  if (c2)
+    {
+      c2 = __builtin_assume_aligned (c2, 16);
+      c2[0] = -1;
+      c2[1] = -1;
+    }
+}
+
+/* { dg-final { scan-assembler-times "movq\[ \\t\]+\[^\n\]*%xmm" 4 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movq\[ \\t\]+\\\$-1," 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xmm" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-7b.c b/gcc/testsuite/gcc.target/i386/pr121015-7b.c
new file mode 100644
index 00000000000..3784ce0dfed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-7b.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2 -mno-sse" } */
+
+#include "pr121015-7a.c"
+
+/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$-1," 4 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-7c.c b/gcc/testsuite/gcc.target/i386/pr121015-7c.c
new file mode 100644
index 00000000000..33b2df3ac9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-7c.c
@@ -0,0 +1,8 @@
+/* { dg-do compile { target fpic } } */
+/* { dg-options "-O2 -march=x86-64 -fpic" } */
+
+#include "pr121015-7a.c"
+
+/* { dg-final { scan-assembler-times "movq\[ \\t\]+\[^\n\]*%xmm" 4 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movq\[ \\t\]+\\\$-1," 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xmm" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-8.c b/gcc/testsuite/gcc.target/i386/pr121015-8.c
new file mode 100644
index 00000000000..f911ecc0fc9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-8.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-Og -fno-dce -mtune=generic" } */
+
+typedef int __attribute__((__vector_size__ (4))) S;
+extern int bar (S);
+
+int
+foo ()
+{
+  return bar ((S){-1});
+}
+
+/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$-1, " 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-9.c b/gcc/testsuite/gcc.target/i386/pr121015-9.c
new file mode 100644
index 00000000000..05c2021ba05
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-9.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-Og -fno-dce -mtune=generic" } */
+
+typedef int __attribute__((__vector_size__ (4))) S;
+extern int bar (S);
+
+int
+foo ()
+{
+  return bar ((S){0});
+}
+
+/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$0, \\(%esp\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$0, %edi" 1 { target { ! ia32 } } } } */
-- 
2.50.1
