On Sat, Jul 12, 2025 at 7:51 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Sat, Jul 12, 2025 at 1:41 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > On Sat, Jul 12, 2025 at 5:58 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> > >
> > > On Sat, Jul 12, 2025 at 11:52 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > >
> > > > On Sat, Jul 12, 2025 at 5:03 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> > > > >
> > > > > On Fri, Jul 11, 2025 at 6:05 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > > > >
> > > > > > commit 77473a27bae04da99d6979d43e7bd0a8106f4557
> > > > > > Author: H.J. Lu <hjl.to...@gmail.com>
> > > > > > Date:   Thu Jun 26 06:08:51 2025 +0800
> > > > > >
> > > > > >     x86: Also handle all 1s float vector constant
> > > > > >
> > > > > > replaces
> > > > > >
> > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > >         (mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S8 A64])) 2031
> > > > > >  {*movv2sf_internal}
> > > > > >      (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > >                 (const_double:SF -QNaN [-QNaN]) repeated x2
> > > > > >             ])
> > > > > >         (nil)))
> > > > > >
> > > > > > with
> > > > > >
> > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > >         (const_vector:V8QI [
> > > > > >                 (const_int -1 [0xffffffffffffffff]) repeated x8
> > > > > >             ])) -1
> > > > > >      (nil))
> > > > > > ...
> > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > >         (subreg:V2SF (reg:V8QI 112) 0)) 2031 {*movv2sf_internal}
> > > > > >      (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > >                 (const_double:SF -QNaN [-QNaN]) repeated x2
> > > > > >             ])
> > > > > >         (nil)))
> > > > > >
> > > > > > which leads to
> > > > > >
> > > > > > pr121015.c: In function ‘render_result_from_bake_h’:
> > > > > > pr121015.c:34:1: error: unrecognizable insn:
> > > > > >    34 | }
> > > > > >       | ^
> > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > >         (const_vector:V8QI [
> > > > > >                 (const_int -1 [0xffffffffffffffff]) repeated x8
> > > > > >             ])) -1
> > > > > >      (expr_list:REG_EQUIV (const_vector:V8QI [
> > > > > >                 (const_int -1 [0xffffffffffffffff]) repeated x8
> > > > > >             ])
> > > > > >         (nil)))
> > > > > > during RTL pass: ira
> > > > > >
> > > > > > 1. Add vector_const0_or_m1_operand for vector 0 or integer vector -1.
> > > > > > 2. Add nonimm_or_vector_const0_or_m1_operand for nonimmediate, vector 0
> > > > > > or integer vector -1 operand.
> > > > > > 3. Add BX constraint for MMX vector constant all 0s/1s operand.
> > > > > > 4. Update MMXMODE:*mov<mode>_internal to support integer all 1s vectors.
> > > > > > Replace <v,C> with <v,BX> to generate
> > > > > >
> > > > > > pcmpeqd %xmm0, %xmm0
> > > > > >
> > > > > > for
> > > > > >
> > > > > > (set (reg/i:V8QI 20 xmm0)
> > > > > >      (const_vector:V8QI [(const_int -1 [0xffffffffffffffff]) repeated x8]))
> > > > > >
> > > > > > NB: The upper 64 bits in XMM0 are all 1s, instead of all 0s.
> > > > >
> > > > > Actually, we don't want this, we should keep the top 64 bits zero,
> > > > > especially for floating point, where the pattern represents NaN.
> > > > >
> > > > > So, I think the correct way is to avoid the transformation for
> > > > > narrower modes in the first place.
> > > > >
> > > >
> > > > How does your latest patch handle this?
> > > >
> > > > typedef char __v8qi __attribute__ ((__vector_size__ (8)));
> > > >
> > > > __v8qi
> > > > m1 (void)
> > > > {
> > > >   return __extension__(__v8qi){-1, -1, -1, -1, -1, -1, -1, -1};
> > > > }
> > >
> > > No, my patch is also not appropriate, because it also introduces
> > > "pcmpeq %xmm, %xmm". We should not generate 8-byte all-ones load using
> > > pcmpeq, because upper 64 bits are also all 1s.
> > >
> > > The correct way is to avoid generating 64 bit all-ones, because this
> > > constant is not supported and standard_sse_constant_p () correctly
> > > reports this.
> >
> > We can generate
> >
> > pcmpeqd %xmm0, %xmm0
> > movq %xmm0, %xmm0
> >
> > for V8QI and
> >
> > pcmpeqd %xmm0, %xmm0
> > movd %xmm0, %xmm0
> >
> > for V4QI.
>
> I don't think this is better than skipping the transformation for
> instructions that we in fact emulate altogether. While loading
> all-zero is OK in any mode, loading all-one is not OK for narrow
> modes. So, this transformation should simply be skipped for all-one in
> narrow modes.

Here is the v3 patch, which allows 4-byte/8-byte all 1s in mmx.md
and splits them into a load from memory if the destination is an XMM
register.
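
To illustrate why the XMM case goes through memory, here is a small
portable model; it is not part of the patch. It treats an XMM register
as two 64-bit halves and compares a self-compare (pcmpeqd) with an
8-byte load (movq). The type xmm_model and the helper names are made up
for this sketch, and the NaN check assumes IEEE 754 single precision.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a 128-bit XMM register as two 64-bit halves.
   Hypothetical type, used only for this illustration.  */
typedef struct { uint64_t lo, hi; } xmm_model;

/* Models "pcmpeqd %xmm, %xmm": every bit of the register becomes 1,
   including the upper 64 bits.  */
static xmm_model
model_pcmpeqd_self (void)
{
  xmm_model r = { UINT64_MAX, UINT64_MAX };
  return r;
}

/* Models "movq mem, %xmm": the low 64 bits are loaded from memory and
   the upper 64 bits are zeroed.  */
static xmm_model
model_movq_load (const void *mem)
{
  xmm_model r = { 0, 0 };
  memcpy (&r.lo, mem, sizeof r.lo);
  return r;
}

/* Reinterpret 32 all-ones bits as a float.  On IEEE 754 targets this
   bit pattern is a NaN.  */
static float
all_ones_float (void)
{
  uint32_t bits = UINT32_MAX;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* A NaN is the only value that compares unequal to itself.  */
static int
is_nan_float (float f)
{
  return f != f;
}
```

In this model the pcmpeqd result has hi == UINT64_MAX, while the movq
load of an 8-byte all-ones constant has hi == 0, which is the state the
split is meant to preserve.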

OK for master?

Thanks.

H.J.
---
commit 77473a27bae04da99d6979d43e7bd0a8106f4557
Author: H.J. Lu <hjl.to...@gmail.com>
Date:   Thu Jun 26 06:08:51 2025 +0800

    x86: Also handle all 1s float vector constant

replaces

(insn 29 28 30 5 (set (reg:V2SF 107)
        (mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S8 A64])) 2031
 {*movv2sf_internal}
     (expr_list:REG_EQUAL (const_vector:V2SF [
                (const_double:SF -QNaN [-QNaN]) repeated x2
            ])
        (nil)))

with

(insn 98 13 14 3 (set (reg:V8QI 112)
        (const_vector:V8QI [
                (const_int -1 [0xffffffffffffffff]) repeated x8
            ])) -1
     (nil))
...
(insn 29 28 30 5 (set (reg:V2SF 107)
        (subreg:V2SF (reg:V8QI 112) 0)) 2031 {*movv2sf_internal}
     (expr_list:REG_EQUAL (const_vector:V2SF [
                (const_double:SF -QNaN [-QNaN]) repeated x2
            ])
        (nil)))

which leads to

pr121015.c: In function ‘render_result_from_bake_h’:
pr121015.c:34:1: error: unrecognizable insn:
   34 | }
      | ^
(insn 98 13 14 3 (set (reg:V8QI 112)
        (const_vector:V8QI [
                (const_int -1 [0xffffffffffffffff]) repeated x8
            ])) -1
     (expr_list:REG_EQUIV (const_vector:V8QI [
                (const_int -1 [0xffffffffffffffff]) repeated x8
            ])
        (nil)))
during RTL pass: ira

1. Add BZ constraint for constant 0 in 64-bit or without SSE to replace
C constraint in 8-byte MMX zeroing stores so that we can generate

pxor %xmm0, %xmm0
movq %xmm0, (%edx)

with SSE in 32-bit.
2. Extend standard_sse_constant_p to cover 4-byte and 8-byte all 1s to
support

(set (reg:V8QI 100)
     (const_vector:V8QI [
       (const_int -1 [0xffffffffffffffff]) repeated x8]))

and

(set (mem/c:V4QI (reg/f:SI 99))
     (const_vector:V4QI [
       (const_int -1 [0xffffffffffffffff]) repeated x4]))

3. Update 4-byte and 8-byte MMX moves to support constant all 1s vectors.
4. Split 4-byte and 8-byte MMX CONSTM1 moves by loading from memory if
they haven't been eliminated.
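
For reference, source along the following lines could give rise to the
two RTL shapes in item 2. This is only a sketch using GCC vector
extensions, not one of the new tests; the type and function names are
made up, and whether a given compiler and option set actually produces
those exact insns is an assumption.

```c
#include <assert.h>
#include <string.h>

/* 8-byte and 4-byte char vectors (GCC vector extension).  */
typedef char v8qi_t __attribute__ ((__vector_size__ (8)));
typedef char v4qi_t __attribute__ ((__vector_size__ (4)));

/* 8-byte all-ones vector produced as a value.  */
v8qi_t
all_ones_v8qi (void)
{
  return (v8qi_t) { -1, -1, -1, -1, -1, -1, -1, -1 };
}

/* 4-byte all-ones vector stored to memory.  */
void
store_all_ones_v4qi (v4qi_t *p)
{
  *p = (v4qi_t) { -1, -1, -1, -1 };
}

/* Count how many of the N bytes at P are 0xff.  */
static int
count_ff_bytes (const void *p, size_t n)
{
  const unsigned char *b = (const unsigned char *) p;
  int count = 0;
  for (size_t i = 0; i < n; i++)
    count += (b[i] == 0xff);
  return count;
}
```

Whatever instruction sequence is chosen, all 8 (respectively 4) bytes
of the result must read back as 0xff.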

gcc/

PR target/121015
* config/i386/constraints.md (BZ): New constraint.
* config/i386/i386.cc (standard_sse_constant_p): Support 4-byte
and 8-byte all 1s.
(ix86_print_operand): Support CONSTM1_RTX.
* config/i386/mmx.md (mmxconstm1): New.
(MMXMODE:mov<mode>): Replace nonimm_or_0_operand with
nonimmediate_or_sse_const_operand.
(MMXMODE:*mov<mode>_internal): Replace C with BZ in zeroing
stores.  Add <m,<mmxconstm1>> and <v,<mmxconstm1>> alternatives.
Add a MMXMODE splitter to split CONSTM1_RTX moves.
(V_32:mov<mode>): Replace nonimm_or_0_operand with
nonimmediate_or_sse_const_operand.
(V_32:*mov<mode>_internal): Add <rm,<mmxconstm1>> and
<v,<mmxconstm1>> alternatives.
Add a V_32 splitter to split CONSTM1_RTX loads.

gcc/testsuite/

PR target/121015
* gcc.target/i386/pr117839-1b.c: Updated assembler scan for 8-byte
zeroing store with XMM register in 32-bit.
* gcc.target/i386/pr117839-2.c: Likewise.
* gcc.target/i386/pr121015-1.c: New test.
* gcc.target/i386/pr121015-2a.c: Likewise.
* gcc.target/i386/pr121015-2b.c: Likewise.
* gcc.target/i386/pr121015-3.c: Likewise.
* gcc.target/i386/pr121015-4.c: Likewise.
* gcc.target/i386/pr121015-5a.c: Likewise.
* gcc.target/i386/pr121015-5b.c: Likewise.
* gcc.target/i386/pr121015-5c.c: Likewise.
* gcc.target/i386/pr121015-6.c: Likewise.
* gcc.target/i386/pr121015-7a.c: Likewise.
* gcc.target/i386/pr121015-7b.c: Likewise.
* gcc.target/i386/pr121015-7c.c: Likewise.
* gcc.target/i386/pr121015-8.c: Likewise.
* gcc.target/i386/pr121015-9.c: Likewise.
* gcc.target/i386/pr121015-10a.c: Likewise.
* gcc.target/i386/pr121015-10b.c: Likewise.
* gcc.target/i386/pr121015-10c.c: Likewise.
* gcc.target/i386/pr121015-11a.c: Likewise.
* gcc.target/i386/pr121015-11b.c: Likewise.
* gcc.target/i386/pr121015-11c.c: Likewise.

Signed-off-by: H.J. Lu <hjl.to...@gmail.com>
Co-Authored-By: Uros Bizjak <ubiz...@gmail.com>
From 7d782108ec2937c880a533182ed243efa5801ce2 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.to...@gmail.com>
Date: Sun, 13 Jul 2025 08:59:34 +0800
Subject: [PATCH v3] x86: Update MMX moves to support all 1s vectors
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

commit 77473a27bae04da99d6979d43e7bd0a8106f4557
Author: H.J. Lu <hjl.to...@gmail.com>
Date:   Thu Jun 26 06:08:51 2025 +0800

    x86: Also handle all 1s float vector constant

replaces

(insn 29 28 30 5 (set (reg:V2SF 107)
        (mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S8 A64])) 2031 {*movv2sf_internal}
     (expr_list:REG_EQUAL (const_vector:V2SF [
                (const_double:SF -QNaN [-QNaN]) repeated x2
            ])
        (nil)))

with

(insn 98 13 14 3 (set (reg:V8QI 112)
        (const_vector:V8QI [
                (const_int -1 [0xffffffffffffffff]) repeated x8
            ])) -1
     (nil))
...
(insn 29 28 30 5 (set (reg:V2SF 107)
        (subreg:V2SF (reg:V8QI 112) 0)) 2031 {*movv2sf_internal}
     (expr_list:REG_EQUAL (const_vector:V2SF [
                (const_double:SF -QNaN [-QNaN]) repeated x2
            ])
        (nil)))

which leads to

pr121015.c: In function ‘render_result_from_bake_h’:
pr121015.c:34:1: error: unrecognizable insn:
   34 | }
      | ^
(insn 98 13 14 3 (set (reg:V8QI 112)
        (const_vector:V8QI [
                (const_int -1 [0xffffffffffffffff]) repeated x8
            ])) -1
     (expr_list:REG_EQUIV (const_vector:V8QI [
                (const_int -1 [0xffffffffffffffff]) repeated x8
            ])
        (nil)))
during RTL pass: ira

1. Add BZ constraint for constant 0 in 64-bit or without SSE to replace
C constraint in 8-byte MMX zeroing stores so that we can generate

	pxor	%xmm0, %xmm0
	movq	%xmm0, (%edx)

with SSE in 32-bit.
2. Extend standard_sse_constant_p to cover 4-byte and 8-byte all 1s to
support

(set (reg:V8QI 100)
     (const_vector:V8QI [
       (const_int -1 [0xffffffffffffffff]) repeated x8]))

and

(set (mem/c:V4QI (reg/f:SI 99))
     (const_vector:V4QI [
       (const_int -1 [0xffffffffffffffff]) repeated x4]))

3. Update 4-byte and 8-byte MMX moves to support constant all 1s vectors.
4. Split 4-byte and 8-byte MMX CONSTM1 moves by loading from memory if
they haven't been eliminated.

gcc/

	PR target/121015
	* config/i386/constraints.md (BZ): New constraint.
	* config/i386/i386.cc (standard_sse_constant_p): Support 4-byte
	and 8-byte all 1s.
	(ix86_print_operand): Support CONSTM1_RTX.
	* config/i386/mmx.md (mmxconstm1): New.
	(MMXMODE:mov<mode>): Replace nonimm_or_0_operand with
	nonimmediate_or_sse_const_operand.
	(MMXMODE:*mov<mode>_internal): Replace C with BZ in zeroing
	stores.  Add <m,<mmxconstm1>> and <v,<mmxconstm1>> alternatives.
	Add a MMXMODE splitter to split CONSTM1_RTX moves.
	(V_32:mov<mode>): Replace nonimm_or_0_operand with
	nonimmediate_or_sse_const_operand.
	(V_32:*mov<mode>_internal): Add <rm,<mmxconstm1>> and
	<v,<mmxconstm1>> alternatives.
	Add a V_32 splitter to split CONSTM1_RTX loads.

gcc/testsuite/

	PR target/121015
	* gcc.target/i386/pr117839-1b.c: Updated assembler scan for 8-byte
	zeroing store with XMM register in 32-bit.
	* gcc.target/i386/pr117839-2.c: Likewise.
	* gcc.target/i386/pr121015-1.c: New test.
	* gcc.target/i386/pr121015-2a.c: Likewise.
	* gcc.target/i386/pr121015-2b.c: Likewise.
	* gcc.target/i386/pr121015-3.c: Likewise.
	* gcc.target/i386/pr121015-4.c: Likewise.
	* gcc.target/i386/pr121015-5a.c: Likewise.
	* gcc.target/i386/pr121015-5b.c: Likewise.
	* gcc.target/i386/pr121015-5c.c: Likewise.
	* gcc.target/i386/pr121015-6.c: Likewise.
	* gcc.target/i386/pr121015-7a.c: Likewise.
	* gcc.target/i386/pr121015-7b.c: Likewise.
	* gcc.target/i386/pr121015-7c.c: Likewise.
	* gcc.target/i386/pr121015-8.c: Likewise.
	* gcc.target/i386/pr121015-9.c: Likewise.
	* gcc.target/i386/pr121015-10a.c: Likewise.
	* gcc.target/i386/pr121015-10b.c: Likewise.
	* gcc.target/i386/pr121015-10c.c: Likewise.
	* gcc.target/i386/pr121015-11a.c: Likewise.
	* gcc.target/i386/pr121015-11b.c: Likewise.
	* gcc.target/i386/pr121015-11c.c: Likewise.

Signed-off-by: H.J. Lu <hjl.to...@gmail.com>
Co-Authored-By: Uros Bizjak <ubiz...@gmail.com>
---
 gcc/config/i386/constraints.md               |   7 +
 gcc/config/i386/i386.cc                      |  13 +-
 gcc/config/i386/mmx.md                       | 130 ++++++++++++++-----
 gcc/testsuite/gcc.target/i386/pr117839-1b.c  |   5 +-
 gcc/testsuite/gcc.target/i386/pr117839-2.c   |   5 +-
 gcc/testsuite/gcc.target/i386/pr121015-1.c   |  34 +++++
 gcc/testsuite/gcc.target/i386/pr121015-10a.c |  32 +++++
 gcc/testsuite/gcc.target/i386/pr121015-10b.c |  16 +++
 gcc/testsuite/gcc.target/i386/pr121015-10c.c |  21 +++
 gcc/testsuite/gcc.target/i386/pr121015-11a.c |  21 +++
 gcc/testsuite/gcc.target/i386/pr121015-11b.c |  13 ++
 gcc/testsuite/gcc.target/i386/pr121015-11c.c |  17 +++
 gcc/testsuite/gcc.target/i386/pr121015-2a.c  |  24 ++++
 gcc/testsuite/gcc.target/i386/pr121015-2b.c  |   6 +
 gcc/testsuite/gcc.target/i386/pr121015-3.c   |  35 +++++
 gcc/testsuite/gcc.target/i386/pr121015-4.c   |  22 ++++
 gcc/testsuite/gcc.target/i386/pr121015-5a.c  |  21 +++
 gcc/testsuite/gcc.target/i386/pr121015-5b.c  |  16 +++
 gcc/testsuite/gcc.target/i386/pr121015-5c.c  |  20 +++
 gcc/testsuite/gcc.target/i386/pr121015-6.c   |  23 ++++
 gcc/testsuite/gcc.target/i386/pr121015-7a.c  |  23 ++++
 gcc/testsuite/gcc.target/i386/pr121015-7b.c  |   6 +
 gcc/testsuite/gcc.target/i386/pr121015-7c.c  |   8 ++
 gcc/testsuite/gcc.target/i386/pr121015-8.c   |  14 ++
 gcc/testsuite/gcc.target/i386/pr121015-9.c   |  14 ++
 25 files changed, 512 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-10a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-10b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-10c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-11a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-11b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-11c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-5a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-5b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-5c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-7a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-7b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-7c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015-9.c

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 38877a7e61b..4bfd9b2a458 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -174,6 +174,7 @@ (define_register_constraint "YW"
 ;;     and zero-extand to 256/512bit, or 128bit all ones
 ;;     and zero-extend to 512bit.
 ;;  M  x86-64 memory operand.
+;;  Z  Constant zero operand in 64-bit or without SSE.
 
 (define_constraint "Bf"
   "@internal Flags register operand."
@@ -246,6 +247,12 @@ (define_constraint "BM"
        (match_test "memory_address_addr_space_p (GET_MODE (op), XEXP (op, 0),
 						 MEM_ADDR_SPACE (op))")))
 
+(define_constraint "BZ"
+  "@internal Constant zero operand in 64-bit or without SSE."
+  (and (match_test "TARGET_64BIT || !TARGET_SSE")
+       (ior (match_test "op == const0_rtx")
+            (match_operand 0 "const0_operand"))))
+
 ;; Integer constant constraints.
 (define_constraint "Wb"
   "Integer constant in the range 0 @dots{} 7, for 8-bit shifts."
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 313522b88e3..a9c1418a2bf 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -5448,6 +5448,8 @@ standard_sse_constant_p (rtx x, machine_mode pred_mode)
 	    return 2;
 	  break;
 	case 16:
+	case 8:
+	case 4:
 	  if (TARGET_SSE2)
 	    return 2;
 	  break;
@@ -14671,9 +14673,14 @@ ix86_print_operand (FILE *file, rtx x, int code)
 	 since we can in fact encode that into an immediate.  */
       if (GET_CODE (x) == CONST_VECTOR)
 	{
-	  if (x != CONST0_RTX (GET_MODE (x)))
-	    output_operand_lossage ("invalid vector immediate");
-	  x = const0_rtx;
+	  if (x == CONSTM1_RTX (GET_MODE (x)))
+	    x = constm1_rtx;
+	  else
+	    {
+	      if (x != CONST0_RTX (GET_MODE (x)))
+		output_operand_lossage ("invalid vector immediate");
+	      x = const0_rtx;
+	    }
 	}
 
       if (code == 'P')
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 29a8cb599a7..3b013e3db31 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -111,6 +111,13 @@ (define_mode_attr mmxinsnmode
    (V4BF "DI") (V2BF "SI")
    (V2SF "DI")])
 
+;; MMX constant -1 constraint
+(define_mode_attr mmxconstm1
+  [(V8QI "BC") (V4HI "BC") (V2SI "BC") (V1DI "BC")
+   (V4QI "BC") (V2HI "BC") (V1SI "BC")
+   (V4HF "BF") (V4BF "BF") (V2SF "BF")
+   (V2HF "BF") (V2BF "BF")])
+
 (define_mode_attr mmxdoublemode
   [(V8QI "V8HI") (V4HI "V4SI")])
 
@@ -174,20 +181,25 @@ (define_mode_attr Yv_Yw
 
 (define_expand "mov<mode>"
   [(set (match_operand:MMXMODE 0 "nonimmediate_operand")
-	(match_operand:MMXMODE 1 "nonimm_or_0_operand"))]
+	(match_operand:MMXMODE 1 "nonimmediate_or_sse_const_operand"))]
   "TARGET_MMX || TARGET_MMX_WITH_SSE"
 {
   ix86_expand_vector_move (<MODE>mode, operands);
   DONE;
 })
 
+;; There must be no CONSTM1_RTX vector loads after reload.
 (define_insn "*mov<mode>_internal"
   [(set (match_operand:MMXMODE 0 "nonimmediate_operand"
-    "=r ,o ,r,r ,m ,?!y,!y,?!y,m  ,r  ,?!y,v,v,v,m,r,v,!y,*x")
-	(match_operand:MMXMODE 1 "nonimm_or_0_operand"
-    "rCo,rC,C,rm,rC,C  ,!y,m  ,?!y,?!y,r  ,C,v,m,v,v,r,*x,!y"))]
+    "=r ,o  ,r,r ,m  ,m           ,?!y,!y,?!y,m  ,r  ,?!y,v,v           ,v,v,m,r,v,!y,*x")
+	(match_operand:MMXMODE 1 "nonimmediate_or_sse_const_operand"
+    "rCo,rBZ,C,rm,rBZ,<mmxconstm1>,C  ,!y,m  ,?!y,?!y,r  ,C,<mmxconstm1>,v,m,v,v,r,*x,!y"))]
   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
    && !(MEM_P (operands[0]) && MEM_P (operands[1]))
+   && (!reload_completed
+       || !(SSE_REG_P (operands[0])
+            && int_float_vector_all_ones_operand (operands[1],
+                                                  <MODE>mode)))
    && ix86_hardreg_mov_ok (operands[0], operands[1])"
 {
   switch (get_attr_type (insn))
@@ -230,31 +242,31 @@ (define_insn "*mov<mode>_internal"
   [(set (attr "isa")
      (cond [(eq_attr "alternative" "0,1")
 	      (const_string "nox64")
-	    (eq_attr "alternative" "2,3,4,9,10")
+	    (eq_attr "alternative" "2,3,4,10,11")
 	      (const_string "x64")
-	    (eq_attr "alternative" "15,16")
+	    (eq_attr "alternative" "5,17,18")
 	      (const_string "x64_sse2")
-	    (eq_attr "alternative" "17,18")
+	    (eq_attr "alternative" "13,19,20")
 	      (const_string "sse2")
 	   ]
 	   (const_string "*")))
    (set (attr "type")
      (cond [(eq_attr "alternative" "0,1")
 	      (const_string "multi")
-	    (eq_attr "alternative" "2,3,4")
+	    (eq_attr "alternative" "2,3,4,5")
 	      (const_string "imov")
-	    (eq_attr "alternative" "5")
+	    (eq_attr "alternative" "6")
 	      (const_string "mmx")
-	    (eq_attr "alternative" "6,7,8,9,10")
+	    (eq_attr "alternative" "7,8,9,10,11")
 	      (const_string "mmxmov")
-	    (eq_attr "alternative" "11")
+	    (eq_attr "alternative" "12,13")
 	      (const_string "sselog1")
-	    (eq_attr "alternative" "17,18")
+	    (eq_attr "alternative" "19,20")
 	      (const_string "ssecvt")
 	   ]
 	   (const_string "ssemov")))
    (set (attr "prefix_rex")
-     (if_then_else (eq_attr "alternative" "9,10,15,16")
+     (if_then_else (eq_attr "alternative" "10,11,17,18")
        (const_string "1")
        (const_string "*")))
    (set (attr "prefix")
@@ -269,7 +281,7 @@ (define_insn "*mov<mode>_internal"
    (set (attr "mode")
      (cond [(eq_attr "alternative" "2")
 	      (const_string "SI")
-	    (eq_attr "alternative" "11,12")
+	    (eq_attr "alternative" "12,13,14")
 	      (cond [(match_test "<MODE>mode == V2SFmode
 				  || <MODE>mode == V4HFmode
 				  || <MODE>mode == V4BFmode")
@@ -280,7 +292,7 @@ (define_insn "*mov<mode>_internal"
 		    ]
 		    (const_string "TI"))
 
-	    (and (eq_attr "alternative" "13")
+	    (and (eq_attr "alternative" "15")
 		 (ior (ior (and (match_test "<MODE>mode == V2SFmode")
 				(not (match_test "TARGET_MMX_WITH_SSE")))
 			   (not (match_test "TARGET_SSE2")))
@@ -288,7 +300,7 @@ (define_insn "*mov<mode>_internal"
 				  || <MODE>mode == V4BFmode")))
 	      (const_string "V2SF")
 
-	    (and (eq_attr "alternative" "14")
+	    (and (eq_attr "alternative" "16")
 		 (ior (ior (match_test "<MODE>mode == V2SFmode")
 			   (not (match_test "TARGET_SSE2")))
 		      (match_test "<MODE>mode == V4HFmode
@@ -297,13 +309,49 @@ (define_insn "*mov<mode>_internal"
 	   ]
 	   (const_string "DI")))
    (set (attr "preferred_for_speed")
-     (cond [(eq_attr "alternative" "9,15")
+     (cond [(eq_attr "alternative" "10,17")
 	      (symbol_ref "TARGET_INTER_UNIT_MOVES_FROM_VEC")
-	    (eq_attr "alternative" "10,16")
+	    (eq_attr "alternative" "11,18")
 	      (symbol_ref "TARGET_INTER_UNIT_MOVES_TO_VEC")
 	   ]
 	   (symbol_ref "true")))])
 
+;; Split
+;;
+;; (set (reg:V8QI 100)
+;;      (const_vector:V8QI [
+;;        (const_int -1 [0xffffffffffffffff]) repeated x8]))
+;;
+;; by loading from memory, if it hasn't been eliminated, so that the top
+;; bits of the vector register are cleared.  For 32-bit PIC, we must also split
+;;
+;; (set (mem/c:V8QI (reg/f:SI 99))
+;;      (const_vector:V8QI [
+;;        (const_int -1 [0xffffffffffffffff]) repeated x8]))
+;;
+;; before reload since 32-bit doesn't support 8-byte immediate store and
+;; PIC register can't be allocated after reload.
+(define_split
+  [(set (match_operand:MMXMODE 0 "nonimmediate_operand")
+	(match_operand:MMXMODE 1 "int_float_vector_all_ones_operand"))]
+  "TARGET_SSE
+   && (SSE_REG_P (operands[0])
+       || (!reload_completed && flag_pic && !TARGET_64BIT))"
+   [(const_int 0)]
+{
+  operands[1] = validize_mem (force_const_mem (<MODE>mode,
+                                               operands[1]));
+  rtx src;
+  if (REG_P (operands[0]))
+    src = operands[1];
+  else
+    {
+      src = gen_reg_rtx (<MODE>mode);
+      emit_move_insn (src, operands[1]);
+    }
+  emit_move_insn (operands[0], src);
+})
+
 (define_split
   [(set (match_operand:MMXMODE 0 "nonimmediate_gr_operand")
 	(match_operand:MMXMODE 1 "nonimmediate_gr_operand"))]
@@ -329,19 +377,24 @@ (define_expand "movmisalign<mode>"
 
 (define_expand "mov<mode>"
   [(set (match_operand:V_32 0 "nonimmediate_operand")
-	(match_operand:V_32 1 "nonimm_or_0_operand"))]
+	(match_operand:V_32 1 "nonimmediate_or_sse_const_operand"))]
   ""
 {
   ix86_expand_vector_move (<MODE>mode, operands);
   DONE;
 })
 
+;; There must be no CONSTM1_RTX vector loads after reload.
 (define_insn "*mov<mode>_internal"
   [(set (match_operand:V_32 0 "nonimmediate_operand"
-    "=r ,m ,v,v,v,m,r,v")
-	(match_operand:V_32 1 "nonimm_or_0_operand"
-    "rmC,rC,C,v,m,v,v,r"))]
+    "=r ,m ,rm          ,v,v           ,v,v,m,r,v")
+	(match_operand:V_32 1 "nonimmediate_or_sse_const_operand"
+    "rmC,rC,<mmxconstm1>,C,<mmxconstm1>,v,m,v,v,r"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))
+   && (!reload_completed
+       || !(SSE_REG_P (operands[0])
+            && int_float_vector_all_ones_operand (operands[1],
+                                                  <MODE>mode)))
    && ix86_hardreg_mov_ok (operands[0], operands[1])"
 {
   switch (get_attr_type (insn))
@@ -360,14 +413,14 @@ (define_insn "*mov<mode>_internal"
     }
 }
   [(set (attr "isa")
-     (cond [(eq_attr "alternative" "6,7")
+     (cond [(eq_attr "alternative" "2,4,8,9")
 	      (const_string "sse2")
 	   ]
 	   (const_string "*")))
    (set (attr "type")
-     (cond [(eq_attr "alternative" "2")
+     (cond [(eq_attr "alternative" "3,4")
 	      (const_string "sselog1")
-	    (eq_attr "alternative" "3,4,5,6,7")
+	    (eq_attr "alternative" "5,6,7,8,9")
 	      (const_string "ssemov")
 	   ]
 	   (const_string "imov")))
@@ -380,7 +433,7 @@ (define_insn "*mov<mode>_internal"
        (const_string "1")
        (const_string "*")))
    (set (attr "mode")
-     (cond [(eq_attr "alternative" "2,3")
+     (cond [(eq_attr "alternative" "3,4,5")
 	      (cond [(match_test "<MODE>mode == V2HFmode
 				 || <MODE>mode == V2BFmode")
 		       (const_string "V4SF")
@@ -392,7 +445,7 @@ (define_insn "*mov<mode>_internal"
 		    ]
 		    (const_string "TI"))
 
-	    (and (eq_attr "alternative" "4,5")
+	    (and (eq_attr "alternative" "6,7")
 		 (ior (match_test "<MODE>mode == V2HFmode
 				 || <MODE>mode == V2BFmode")
 		      (not (match_test "TARGET_SSE2"))))
@@ -400,13 +453,32 @@ (define_insn "*mov<mode>_internal"
 	   ]
 	   (const_string "SI")))
    (set (attr "preferred_for_speed")
-     (cond [(eq_attr "alternative" "6")
+     (cond [(eq_attr "alternative" "8")
 	      (symbol_ref "TARGET_INTER_UNIT_MOVES_FROM_VEC")
-	    (eq_attr "alternative" "7")
+	    (eq_attr "alternative" "9")
 	      (symbol_ref "TARGET_INTER_UNIT_MOVES_TO_VEC")
 	   ]
 	   (symbol_ref "true")))])
 
+;; Split
+;;
+;; (set (reg:V4QI 100)
+;;      (const_vector:V4QI [
+;;        (const_int -1 [0xffffffffffffffff]) repeated x4]))
+;;
+;; by loading from memory, if it hasn't been eliminated, so that the top
+;; bits of the vector register are cleared.
+(define_split
+  [(set (match_operand:V_32 0 "register_operand")
+	(match_operand:V_32 1 "int_float_vector_all_ones_operand"))]
+  "TARGET_SSE && SSE_REG_P (operands[0])"
+  [(const_int 0)]
+{
+  operands[1] = validize_mem (force_const_mem (<MODE>mode,
+                                               operands[1]));
+  emit_move_insn (operands[0], operands[1]);
+})
+
 ;; 16-bit, 32-bit and 64-bit constant vector stores.  After reload,
 ;; convert them to immediate integer stores.
 (define_insn_and_split "*mov<mode>_imm"
diff --git a/gcc/testsuite/gcc.target/i386/pr117839-1b.c b/gcc/testsuite/gcc.target/i386/pr117839-1b.c
index e71b991a207..6b181f35dff 100644
--- a/gcc/testsuite/gcc.target/i386/pr117839-1b.c
+++ b/gcc/testsuite/gcc.target/i386/pr117839-1b.c
@@ -1,5 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -march=x86-64-v3" } */
-/* { dg-final { scan-assembler-times "xor\[a-z\]*\[\t \]*%xmm\[0-9\]\+,\[^,\]*" 1 } } */
+/* { dg-final { scan-assembler-times "xor\[\t \]+%xmm\[0-9\]+, \[^,\]+" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "xor\[\t \]+%xmm\[0-9\]+, \[^,\]+" 2 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[\t \]+\\\$0, " 3 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movq\[\t \]+%xmm\[0-9\]+, \[^,\]+" 1 { target ia32 } } } */
 
 #include "pr117839-1a.c"
diff --git a/gcc/testsuite/gcc.target/i386/pr117839-2.c b/gcc/testsuite/gcc.target/i386/pr117839-2.c
index c76744cf98b..b00d8eaec5c 100644
--- a/gcc/testsuite/gcc.target/i386/pr117839-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr117839-2.c
@@ -1,6 +1,9 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -march=x86-64-v3" } */
-/* { dg-final { scan-assembler-times "xor\[a-z\]*\[\t \]*%xmm\[0-9\]\+,\[^,\]*" 1 } } */
+/* { dg-final { scan-assembler-times "xor\[\t \]+%xmm\[0-9\]+, \[^,\]+" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "xor\[\t \]+%xmm\[0-9\]+, \[^,\]+" 3 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[\t \]+\\\$0, " 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movq\[\t \]+%xmm\[0-9\]+, \[^,\]+" 2 { target ia32 } } } */
 
 #include <stddef.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-1.c b/gcc/testsuite/gcc.target/i386/pr121015-1.c
new file mode 100644
index 00000000000..fefa5185be4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-1.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64-v3" } */
+/* { dg-final { scan-assembler-not "\tmovl\[\\t \]+\\\$-1, %" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "\tmovq\[\\t \]+\\\$-1, " { target { ! ia32 } } } } */
+
+extern union {
+  int i;
+  float f;
+} int_as_float_u;
+
+extern int render_result_from_bake_w;
+extern int render_result_from_bake_h_seed_pass;
+extern float *render_result_from_bake_h_primitive;
+extern float *render_result_from_bake_h_seed;
+
+float
+int_as_float(int i)
+{
+  int_as_float_u.i = i;
+  return int_as_float_u.f;
+}
+
+void
+render_result_from_bake_h(int tx)
+{
+  while (render_result_from_bake_w) {
+    for (; tx < render_result_from_bake_w; tx++)
+      render_result_from_bake_h_primitive[1] =
+          render_result_from_bake_h_primitive[2] = int_as_float(-1);
+    if (render_result_from_bake_h_seed_pass) {
+      *render_result_from_bake_h_seed = 0;
+    }
+  }
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-10a.c b/gcc/testsuite/gcc.target/i386/pr121015-10a.c
new file mode 100644
index 00000000000..67b574cc837
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-10a.c
@@ -0,0 +1,32 @@
+/* { dg-do compile { target fpic } } */
+/* { dg-options "-O2 -march=x86-64 -fpic" } */
+/* Keep labels and directives ('.cfi_startproc', '.cfi_endproc').  */
+/* { dg-final { check-function-bodies "**" "" "" { target { ! ia32 } } {^\t?\.}  } } */
+
+/*
+**__bid64_to_binary80:
+**.LFB[0-9]+:
+**	.cfi_startproc
+**	mov(l|q)	__bid64_to_binary80_x_out@GOTPCREL\(%rip\), %(r|e)ax
+**	movq	\$-1, \(%(r|e)ax\)
+**	ret
+**...
+*/
+
+typedef struct {
+  struct {
+    unsigned short lo4;
+    unsigned short lo3;
+    unsigned short lo2;
+    unsigned short lo1;
+  } i;
+} BID_BINARY80LDOUBLE;
+extern BID_BINARY80LDOUBLE __bid64_to_binary80_x_out;
+void
+__bid64_to_binary80 (void)
+{
+  __bid64_to_binary80_x_out.i.lo4
+    = __bid64_to_binary80_x_out.i.lo3
+    = __bid64_to_binary80_x_out.i.lo2
+    = __bid64_to_binary80_x_out.i.lo1 = 65535;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-10b.c b/gcc/testsuite/gcc.target/i386/pr121015-10b.c
new file mode 100644
index 00000000000..06cb58f702d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-10b.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { fpic && lp64 } } } */
+/* { dg-options "-O2 -march=x86-64 -fno-pic -mcmodel=large" } */
+
+/*
+**__bid64_to_binary80:
+**.LFB[0-9]+:
+**	.cfi_startproc
+**	movabsq	\$.LC0, %rax
+**	movq	\(%rax\), %rdx
+**	movabsq	\$__bid64_to_binary80_x_out, %rax
+**	movq	%rdx, \(%rax\)
+**	ret
+**...
+*/
+
+#include "pr121015-10a.c"
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-10c.c b/gcc/testsuite/gcc.target/i386/pr121015-10c.c
new file mode 100644
index 00000000000..573a1562883
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-10c.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target { fpic && lp64 } } } */
+/* { dg-options "-O2 -march=x86-64 -fpic -mcmodel=large" } */
+
+/*
+**__bid64_to_binary80:
+**.LFB[0-9]+:
+**	.cfi_startproc
+**.L2:
+**	leaq	.L2\(%rip\), %rax
+**	movabsq	\$_GLOBAL_OFFSET_TABLE_-.L2, %r11
+**	movabsq	\$__bid64_to_binary80_x_out@GOT, %rdx
+**	movabsq	\$.LC0@GOTOFF, %rcx
+**	addq	%r11, %rax
+**	movq	\(%rax,%rdx\), %rdx
+**	movq	\(%rax,%rcx\), %rax
+**	movq	%rax, \(%rdx\)
+**	ret
+**...
+*/
+
+#include "pr121015-10a.c"
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-11a.c b/gcc/testsuite/gcc.target/i386/pr121015-11a.c
new file mode 100644
index 00000000000..5aafb2806b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-11a.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target fpic } } */
+/* { dg-options "-O2 -march=x86-64 -fpic" } */
+/* Keep labels and directives ('.cfi_startproc', '.cfi_endproc').  */
+/* { dg-final { check-function-bodies "**" "" "" { target { ! ia32 } } {^\t?\.}  } } */
+
+/*
+**foo:
+**.LFB[0-9]+:
+**	.cfi_startproc
+**	movd	.LC0\(%rip\), %xmm0
+**...
+*/
+
+typedef char __v4qi __attribute__ ((__vector_size__ (4)));
+
+void
+foo (void)
+{
+  register __v4qi x asm ("xmm0") = __extension__(__v4qi){-1, -1, -1, -1};
+  asm ("reg %0" : : "v" (x));
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-11b.c b/gcc/testsuite/gcc.target/i386/pr121015-11b.c
new file mode 100644
index 00000000000..9ff2908829b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-11b.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { fpic && lp64 } } } */
+/* { dg-options "-O2 -march=x86-64 -fno-pic -mcmodel=large" } */
+
+/*
+**foo:
+**.LFB[0-9]+:
+**	.cfi_startproc
+**	movabsq	\$.LC0, %rax
+**	movd	\(%rax\), %xmm0
+**...
+*/
+
+#include "pr121015-11a.c"
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-11c.c b/gcc/testsuite/gcc.target/i386/pr121015-11c.c
new file mode 100644
index 00000000000..f0e6ccb2b92
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-11c.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { fpic && lp64 } } } */
+/* { dg-options "-O2 -march=x86-64 -fpic -mcmodel=large" } */
+
+/*
+**foo:
+**.LFB[0-9]+:
+**	.cfi_startproc
+**.L2:
+**	movabsq	\$_GLOBAL_OFFSET_TABLE_-.L2, %r11
+**	leaq	.L2\(%rip\), %rax
+**	movabsq	\$.LC0@GOTOFF, %rdx
+**	addq	%r11, %rax
+**	movd	\(%rax,%rdx\), %xmm0
+**...
+*/
+
+#include "pr121015-11a.c"
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-2a.c b/gcc/testsuite/gcc.target/i386/pr121015-2a.c
new file mode 100644
index 00000000000..e8840b0ffd6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-2a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64" } */
+
+void
+foo (int *c1, int *c2)
+{
+  if (c1)
+    {
+      c1 = __builtin_assume_aligned (c1, 16);
+      c1[0] = 0;
+      c1[1] = 0;
+    }
+  if (c2)
+    {
+      c2 = __builtin_assume_aligned (c2, 16);
+      c2[0] = 0;
+      c2[1] = 0;
+    }
+}
+
+/* { dg-final { scan-assembler-times "pxor\[ \\t\]+\[^\n\]*%xmm" 2 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movq\[ \\t\]+\[^\n\]*%xmm" 2 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movq\[ \\t\]+\\\$0," 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xmm" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-2b.c b/gcc/testsuite/gcc.target/i386/pr121015-2b.c
new file mode 100644
index 00000000000..9df2766c612
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-2b.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2 -mno-sse" } */
+
+#include "pr121015-2a.c"
+
+/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$0," 4 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-3.c b/gcc/testsuite/gcc.target/i386/pr121015-3.c
new file mode 100644
index 00000000000..44bf63c73e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-3.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64" } */
+
+typedef enum { CPP_NUMBER } cpp_ttype;
+typedef struct {
+  bool unsignedp;
+  bool overflow;
+} cpp_num;
+extern cpp_num value, __trans_tmp_1;
+extern cpp_ttype eval_token_token_0;
+extern int eval_token_temp;
+static cpp_num
+eval_token(void)
+{
+  cpp_num __trans_tmp_2, result;
+  result.overflow = false;
+  switch (eval_token_token_0)
+    {
+    case CPP_NUMBER:
+      switch (eval_token_temp)
+	{
+	case 1:
+	  return __trans_tmp_1;
+	}
+      result.unsignedp = false;
+      __trans_tmp_2 = result;
+      return __trans_tmp_2;
+    }
+  return result;
+}
+void
+_cpp_parse_expr_pfile(void)
+{
+  value = eval_token();
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-4.c b/gcc/testsuite/gcc.target/i386/pr121015-4.c
new file mode 100644
index 00000000000..2848a946dd1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-4.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64" } */
+/* Keep labels and directives ('.cfi_startproc', '.cfi_endproc').  */
+/* { dg-final { check-function-bodies "**" "" "" { target { ! ia32 } } {^\t?\.}  } } */
+
+/*
+**zero:
+**.LFB0:
+**	.cfi_startproc
+**	xorps	%xmm0, %xmm0
+**	ret
+**...
+*/
+
+typedef float __v2sf __attribute__ ((__vector_size__ (8)));
+extern __v2sf f1;
+
+__v2sf
+zero (void)
+{
+  return __extension__(__v2sf){0, 0};
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-5a.c b/gcc/testsuite/gcc.target/i386/pr121015-5a.c
new file mode 100644
index 00000000000..605a87db1fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-5a.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64" } */
+/* Keep labels and directives ('.cfi_startproc', '.cfi_endproc').  */
+/* { dg-final { check-function-bodies "**" "" "" { target { ! ia32 } } {^\t?\.}  } } */
+
+/*
+**m1:
+**.LFB[0-9]+:
+**	.cfi_startproc
+**	movq	.LC[0-9]+\(%rip\), %xmm0
+**	ret
+**...
+*/
+
+typedef char __v8qi __attribute__ ((__vector_size__ (8)));
+
+__v8qi
+m1 (void)
+{
+  return __extension__(__v8qi){-1, -1, -1, -1, -1, -1, -1, -1};
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-5b.c b/gcc/testsuite/gcc.target/i386/pr121015-5b.c
new file mode 100644
index 00000000000..22d51fd33ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-5b.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { fpic && lp64 } } } */
+/* { dg-options "-O2 -march=x86-64 -fno-pic -mcmodel=large" } */
+/* Keep labels and directives ('.cfi_startproc', '.cfi_endproc').  */
+/* { dg-final { check-function-bodies "**" "" "" { target "*-*-*" } {^\t?\.}  } } */
+
+/*
+**m1:
+**.LFB[0-9]+:
+**	.cfi_startproc
+**	movabsq	\$.LC0, %rax
+**	movq	\(%rax\), %xmm0
+**	ret
+**...
+*/
+
+#include "pr121015-5a.c"
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-5c.c b/gcc/testsuite/gcc.target/i386/pr121015-5c.c
new file mode 100644
index 00000000000..bb210fa71ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-5c.c
@@ -0,0 +1,20 @@
+/* { dg-do compile { target { fpic && lp64 } } } */
+/* { dg-options "-O2 -march=x86-64 -fpic -mcmodel=large" } */
+/* Keep labels and directives ('.cfi_startproc', '.cfi_endproc').  */
+/* { dg-final { check-function-bodies "**" "" "" { target "*-*-*" } {^\t?\.}  } } */
+
+/*
+**m1:
+**.LFB[0-9]+:
+**	.cfi_startproc
+**.L2:
+**	movabsq	\$_GLOBAL_OFFSET_TABLE_-.L2, %r11
+**	leaq	.L2\(%rip\), %rax
+**	movabsq	\$.LC0@GOTOFF, %rdx
+**	addq	%r11, %rax
+**	movq	\(%rax,%rdx\), %xmm0
+**	ret
+**...
+*/
+
+#include "pr121015-5a.c"
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-6.c b/gcc/testsuite/gcc.target/i386/pr121015-6.c
new file mode 100644
index 00000000000..daebcb0acc5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-6.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64" } */
+/* Keep labels and directives ('.cfi_startproc', '.cfi_endproc').  */
+/* { dg-final { check-function-bodies "**" "" "" { target { ! ia32 } } {^\t?\.}  } } */
+
+/*
+**m1:
+**.LFB[0-9]+:
+**	.cfi_startproc
+**	pcmpeqd	%xmm0, %xmm0
+**	ret
+**...
+*/
+
+#include <x86intrin.h>
+
+__m128i
+m1 (void)
+{
+  __m64 x = _mm_set1_pi8 (-1);
+  __m128i y = _mm_set1_epi64 (x);
+  return y;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-7a.c b/gcc/testsuite/gcc.target/i386/pr121015-7a.c
new file mode 100644
index 00000000000..94037e33d81
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-7a.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64" } */
+
+void
+foo (int *c1, int *c2)
+{
+  if (c1)
+    {
+      c1 = __builtin_assume_aligned (c1, 16);
+      c1[0] = -1;
+      c1[1] = -1;
+    }
+  if (c2)
+    {
+      c2 = __builtin_assume_aligned (c2, 16);
+      c2[0] = -1;
+      c2[1] = -1;
+    }
+}
+
+/* { dg-final { scan-assembler-times "movq\[ \\t\]+\[^\n\]*%xmm" 4 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movq\[ \\t\]+\\\$-1," 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xmm" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-7b.c b/gcc/testsuite/gcc.target/i386/pr121015-7b.c
new file mode 100644
index 00000000000..3784ce0dfed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-7b.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2 -mno-sse" } */
+
+#include "pr121015-7a.c"
+
+/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$-1," 4 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-7c.c b/gcc/testsuite/gcc.target/i386/pr121015-7c.c
new file mode 100644
index 00000000000..33b2df3ac9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-7c.c
@@ -0,0 +1,8 @@
+/* { dg-do compile { target fpic } } */
+/* { dg-options "-O2 -march=x86-64 -fpic" } */
+
+#include "pr121015-7a.c"
+
+/* { dg-final { scan-assembler-times "movq\[ \\t\]+\[^\n\]*%xmm" 4 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movq\[ \\t\]+\\\$-1," 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xmm" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-8.c b/gcc/testsuite/gcc.target/i386/pr121015-8.c
new file mode 100644
index 00000000000..de2db2a2b0d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-8.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-Og -fno-dce -mtune=generic" } */
+
+typedef int __attribute__((__vector_size__ (4))) S;
+extern int bar (S);
+
+int
+foo ()
+{
+  return bar ((S){-1});
+}
+
+/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$-1, \\(%esp\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$-1, %edi" 1 { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr121015-9.c b/gcc/testsuite/gcc.target/i386/pr121015-9.c
new file mode 100644
index 00000000000..05c2021ba05
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015-9.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-Og -fno-dce -mtune=generic" } */
+
+typedef int __attribute__((__vector_size__ (4))) S;
+extern int bar (S);
+
+int
+foo ()
+{
+  return bar ((S){0});
+}
+
+/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$0, \\(%esp\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$0, %edi" 1 { target { ! ia32 } } } } */
-- 
2.50.1
