[PATCH v5] x86: Check all 0s/1s vectors with standard_sse_constant_

H.J. Lu Mon, 14 Jul 2025 02:25:23 -0700

On Mon, Jul 14, 2025 at 4:06 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Mon, Jul 14, 2025 at 9:37 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > On Mon, Jul 14, 2025 at 3:11 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> > >
> > > On Mon, Jul 14, 2025 at 5:32 AM Uros Bizjak <ubiz...@gmail.com> wrote:
> > > >
> > > > On Mon, Jul 14, 2025 at 2:14 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > > >
> > > > > On Sat, Jul 12, 2025 at 7:51 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> > > > > >
> > > > > > On Sat, Jul 12, 2025 at 1:41 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > > > > >
> > > > > > > On Sat, Jul 12, 2025 at 5:58 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Sat, Jul 12, 2025 at 11:52 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Sat, Jul 12, 2025 at 5:03 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, Jul 11, 2025 at 6:05 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > commit 77473a27bae04da99d6979d43e7bd0a8106f4557
> > > > > > > > > > > Author: H.J. Lu <hjl.to...@gmail.com>
> > > > > > > > > > > Date:   Thu Jun 26 06:08:51 2025 +0800
> > > > > > > > > > >
> > > > > > > > > > >     x86: Also handle all 1s float vector constant
> > > > > > > > > > >
> > > > > > > > > > > replaces
> > > > > > > > > > >
> > > > > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > > > > > > >         (mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S8 A64])) 2031 {*movv2sf_internal}
> > > > > > > > > > >      (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > > > > > > >                 (const_double:SF -QNaN [-QNaN]) repeated x2
> > > > > > > > > > >             ])
> > > > > > > > > > >         (nil)))
> > > > > > > > > > >
> > > > > > > > > > > with
> > > > > > > > > > >
> > > > > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > > > > > > >         (const_vector:V8QI [
> > > > > > > > > > >                 (const_int -1 [0xffffffffffffffff]) repeated x8
> > > > > > > > > > >             ])) -1
> > > > > > > > > > >      (nil))
> > > > > > > > > > > ...
> > > > > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > > > > > > >         (subreg:V2SF (reg:V8QI 112) 0)) 2031 {*movv2sf_internal}
> > > > > > > > > > >      (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > > > > > > >                 (const_double:SF -QNaN [-QNaN]) repeated x2
> > > > > > > > > > >             ])
> > > > > > > > > > >         (nil)))
> > > > > > > > > > >
> > > > > > > > > > > which leads to
> > > > > > > > > > >
> > > > > > > > > > > pr121015.c: In function ‘render_result_from_bake_h’:
> > > > > > > > > > > pr121015.c:34:1: error: unrecognizable insn:
> > > > > > > > > > >    34 | }
> > > > > > > > > > >       | ^
> > > > > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > > > > > > >         (const_vector:V8QI [
> > > > > > > > > > >                 (const_int -1 [0xffffffffffffffff]) repeated x8
> > > > > > > > > > >             ])) -1
> > > > > > > > > > >      (expr_list:REG_EQUIV (const_vector:V8QI [
> > > > > > > > > > >                 (const_int -1 [0xffffffffffffffff]) repeated x8
> > > > > > > > > > >             ])
> > > > > > > > > > >         (nil)))
> > > > > > > > > > > during RTL pass: ira
> > > > > > > > > > >
> > > > > > > > > > > 1. Add vector_const0_or_m1_operand for vector 0 or integer vector -1.
> > > > > > > > > > > 2. Add nonimm_or_vector_const0_or_m1_operand for nonimmediate, vector 0
> > > > > > > > > > > or integer vector -1 operand.
> > > > > > > > > > > 3. Add BX constraint for MMX vector constant all 0s/1s operand.
> > > > > > > > > > > 4. Update MMXMODE:*mov<mode>_internal to support integer all 1s vectors.
> > > > > > > > > > > Replace <v,C> with <v,BX> to generate
> > > > > > > > > > >
> > > > > > > > > > > pcmpeqd %xmm0, %xmm0
> > > > > > > > > > >
> > > > > > > > > > > for
> > > > > > > > > > >
> > > > > > > > > > > (set (reg/i:V8QI 20 xmm0)
> > > > > > > > > > >      (const_vector:V8QI [(const_int -1 [0xffffffffffffffff]) repeated x8]))
> > > > > > > > > > >
> > > > > > > > > > > NB: The upper 64 bits in XMM0 are all 1s, instead of all 0s.
> > > > > > > > > >
> > > > > > > > > > Actually, we don't want this, we should keep the top 64 bits zero,
> > > > > > > > > > especially for floating point, where the pattern represents NaN.
> > > > > > > > > >
> > > > > > > > > > So, I think the correct way is to avoid the transformation for
> > > > > > > > > > narrower modes in the first place.
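For reference, this is why the all 1s pattern matters for floating point: each 32-bit lane of 0xffffffff has an all-ones exponent and a non-zero mantissa, so it is a quiet NaN with the sign bit set, i.e. the `-QNaN` printed in the REG_EQUAL notes above. A small, portable C sketch that checks this by reinterpreting the bits; the helper names are made up for illustration and are not GCC code:

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as an IEEE-754
   single-precision float without breaking strict-aliasing rules.  */
static float
sf_from_bits (uint32_t bits)
{
  float f;
  _Static_assert (sizeof (float) == sizeof (uint32_t),
                  "float is assumed to be 32 bits");
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* An all-ones SF lane has an all-ones exponent and a non-zero mantissa,
   so it is a NaN; the set sign bit makes it the -QNaN seen in RTL dumps.  */
static int
all_ones_sf_is_negative_nan (void)
{
  float f = sf_from_bits (UINT32_C (0xffffffff));
  return isnan (f) && signbit (f);
}
```

By contrast, lanes whose bits are all zero read back as `+0.0f` (`sf_from_bits (0)`), which is what keeping the top 64 bits zero preserves.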
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > How does your latest patch handle this?
> > > > > > > > >
> > > > > > > > > typedef char __v8qi __attribute__ ((__vector_size__ (8)));
> > > > > > > > >
> > > > > > > > > __v8qi
> > > > > > > > > m1 (void)
> > > > > > > > > {
> > > > > > > > >   return __extension__(__v8qi){-1, -1, -1, -1, -1, -1, -1, -1};
> > > > > > > > > }
> > > > > > > >
> > > > > > > > No, my patch is also not appropriate, because it also introduces
> > > > > > > > "pcmpeq %xmm, %xmm". We should not generate 8-byte all-ones load using
> > > > > > > > pcmpeq, because upper 64 bits are also all 1s.
> > > > > > > >
> > > > > > > > The correct way is to avoid generating 64 bit all-ones, because this
> > > > > > > > constant is not supported and standard_sse_constant_p () correctly
> > > > > > > > reports this.
> > > > > > >
> > > > > > > We can generate
> > > > > > >
> > > > > > > pcmpeqd %xmm0, %xmm0
> > > > > > > movq %xmm0, %xmm0
> > > > > > >
> > > > > > > for V8QI and
> > > > > > >
> > > > > > > pcmpeqd %xmm0, %xmm0
> > > > > > > movd %xmm0, %xmm0
> > > > > > >
> > > > > > > for V4QI.
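A rough way to compare the options discussed above is to model only the register contents. The sketch below assumes a 128-bit XMM register represented as two 64-bit halves; the `xmm_model` type and `model_*` functions are hypothetical. `pcmpeqd` of a register with itself sets all 128 bits, `movq` between XMM registers keeps the low 64 bits and clears the rest, and an 8-byte load such as `movq mem, %xmm0` zero-extends, which keeps the top 64 bits zero as required.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit XMM register as two 64-bit halves.  */
typedef struct
{
  uint64_t lo;
  uint64_t hi;
} xmm_model;

/* pcmpeqd %xmmN, %xmmN: every 32-bit lane compares equal to itself,
   so all 128 bits become 1, including the upper 64 bits.  */
static xmm_model
model_pcmpeqd_self (void)
{
  xmm_model r = { UINT64_MAX, UINT64_MAX };
  return r;
}

/* movq %xmmM, %xmmN: copy the low 64 bits and zero the upper 64 bits.  */
static xmm_model
model_movq_xmm (xmm_model src)
{
  xmm_model r = { src.lo, 0 };
  return r;
}

/* 8-byte load into an XMM register: the value is zero-extended to
   128 bits.  */
static xmm_model
model_movq_load (uint64_t mem)
{
  xmm_model r = { mem, 0 };
  return r;
}
```

In this model `model_movq_xmm (model_pcmpeqd_self ())` equals `model_movq_load (UINT64_MAX)`, while `model_pcmpeqd_self ()` alone leaves the upper half all 1s; the extra `movq` is the cost of the two-instruction sequence.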
> > > > > >
> > > > > > I don't think this is better than skipping the transformation for
> > > > > > instructions that we in fact emulate altogether. While loading
> > > > > > all-zero is OK in any mode, loading all-one is not OK for narrow
> > > > > > modes. So, this transformation should simply be skipped for all-one in
> > > > > > narrow modes.
> > > > >
> > > > > Here is the v3 patch, which allows 4-byte/8-byte all 1s in mmx.md
> > > > > and split to load from memory if the destination is an XMM register.
> > > >
> > > > Why don't we just skip the generation of narrow-mode all-ones vector
> > > > constants in the new pass altogether? It is not worth complicating
> > > > move patterns for a very seldom used feature and for very small (if at
> > > > all) gain.
> > > >
> > > > Please just change the pass to not generate vector all-ones in 64bit or
> > > > narrower modes.
> > >
> > > I'm not familiar with the pass, but IMO the attached patch should be a
> > > good starting point. We don't want to CSE narrow all-ones with their
> > > wide counterparts, because we want zeros in top bytes of the narrow
> > > all-ones operands.
> > >
> > > Uros.
> >
> > I am testing this.
>
> +      /* Skip if vector size is less than 16 bytes since all 1s SSE
> +     constants must be at least 16 bytes.  */
> +      if (GET_MODE_SIZE (mode) < 16)
> +    return nullptr;
>
> This is functionally exactly the same as my proposed part that uses
> standard_sse_constant_p. I think using the predicate makes the
> decision more robust and documents what we really want to do.
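To make the equivalence concrete, here is a simplified, hypothetical model of how standard_sse_constant_p treats all-0s/all-1s vector constants, together with a switch that mirrors the one in the v5 patch's ix86_broadcast_inner. It is written from the behaviour discussed in this thread, not copied from GCC; the ISA requirements (SSE2 for 16 bytes, AVX2 for 32 bytes, AVX512F for 64 bytes) and all type and function names are assumptions of the sketch. All zeros classify as 1 at any width; all ones classify as 2 only at 16 bytes or more, so 8-byte and narrower all-ones vectors get 0, the same cases the `GET_MODE_SIZE (mode) < 16` check rejects.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ISA flags; GCC itself uses TARGET_* macros.  */
typedef struct
{
  bool sse2;
  bool avx2;
  bool avx512f;
} isa_model;

/* Simplified model: 1 for all 0s, 2 for all 1s that can be materialized
   with one compare instruction, 0 otherwise.  */
static int
model_standard_sse_constant_p (size_t nbytes, bool all_zeros,
                               bool all_ones, isa_model isa)
{
  if (all_zeros)
    return 1;  /* Any width can be cleared with a register xor.  */
  if (!all_ones)
    return 0;
  switch (nbytes)
    {
    case 64:
      return isa.avx512f ? 2 : 0;
    case 32:
      return isa.avx2 ? 2 : 0;
    case 16:
      return isa.sse2 ? 2 : 0;
    default:
      return 0;  /* 8-byte and narrower all-1s vectors are rejected.  */
    }
}

/* Hypothetical CSE kinds, mirroring X86_CSE_CONST0_VECTOR and
   X86_CSE_CONSTM1_VECTOR in the patch.  */
typedef enum
{
  KIND_NONE,
  KIND_CONST0_VECTOR,
  KIND_CONSTM1_VECTOR
} cse_kind_model;

static cse_kind_model
model_cse_kind (size_t nbytes, bool all_zeros, bool all_ones, isa_model isa)
{
  switch (model_standard_sse_constant_p (nbytes, all_zeros, all_ones, isa))
    {
    case 1:
      return KIND_CONST0_VECTOR;
    case 2:
      return KIND_CONSTM1_VECTOR;
    default:
      return KIND_NONE;
    }
}
```

With the ISA of `-march=x86-64-v3` (used by the new test; it has SSE2 and AVX2 but not AVX512F), an 8-byte all-ones constant maps to `KIND_NONE`, so it is not treated as an all-1s CSE candidate, while a 32-byte one still maps to `KIND_CONSTM1_VECTOR`.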
>
> +(define_split
> +  [(set (match_operand:MMXMODE 0 "register_operand")
> +    (match_operand:MMXMODE 1 "memory_operand"))]
> +  "TARGET_64BIT && reload_completed && GENERAL_REG_P (operands[0])"
> +  [(const_int 0)]
>
> IMO, the above is a good optimization, but please leave it for a
> follow-up patch. I think, the above part should be similar to:
>
> ;; 16-bit, 32-bit and 64-bit constant vector stores.  After reload,
> ;; convert them to immediate integer stores.
> (define_insn_and_split "*mov<mode>_imm"
>   [(set (match_operand:V_16_32_64 0 "memory_operand" "=m")
>     (match_operand:V_16_32_64 1 "x86_64_const_vector_operand" "i"))]
>
> involving ix86_convert_const_vector_to_integer, because we are not
> limited to all-ones here.
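The point about `*mov<mode>_imm` can be illustrated with a small sketch: a constant vector of 16, 32 or 64 bits has exactly the bytes of an integer of the same width, so the store can use an integer immediate whatever the element values are. The function below is hypothetical (it is not ix86_convert_const_vector_to_integer) and only handles byte elements, placing element 0 in the least significant byte as on little-endian x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: pack up to 8 byte-sized vector elements into one
   integer, element 0 in the least significant byte (x86 is
   little-endian).  */
static uint64_t
pack_byte_vector (const uint8_t *elts, size_t n)
{
  assert (n <= 8);  /* V_16_32_64 modes are at most 64 bits wide.  */
  uint64_t v = 0;
  for (size_t i = 0; i < n; i++)
    v |= (uint64_t) elts[i] << (8 * i);
  return v;
}
```

An all-ones V8QI thus packs to a 64-bit value with every bit set, and `{1, 2, 3, 4}` as V4QI packs to 0x04030201; nothing in the packing depends on the constant being all ones.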
>
> Thanks,
> Uros.


Like this?

-- 
H.J.

From 9c155cf3af65ce02e1494965e3cc7be609c2555c Mon Sep 17 00:00:00 2001
From: Uros Bizjak <ubiz...@gmail.com>
Date: Mon, 14 Jul 2025 17:16:36 +0800
Subject: [PATCH v5] x86: Check all 0s/1s vectors with standard_sse_constant_p
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

commit 77473a27bae04da99d6979d43e7bd0a8106f4557
Author: H.J. Lu <hjl.to...@gmail.com>
Date:   Thu Jun 26 06:08:51 2025 +0800

    x86: Also handle all 1s float vector constant

replaces

(insn 29 28 30 5 (set (reg:V2SF 107)
        (mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S8 A64])) 2031 {*movv2sf_internal}
     (expr_list:REG_EQUAL (const_vector:V2SF [
                (const_double:SF -QNaN [-QNaN]) repeated x2
            ])
        (nil)))

with

(insn 98 13 14 3 (set (reg:V8QI 112)
        (const_vector:V8QI [
                (const_int -1 [0xffffffffffffffff]) repeated x8
            ])) -1
     (nil))
...
(insn 29 28 30 5 (set (reg:V2SF 107)
        (subreg:V2SF (reg:V8QI 112) 0)) 2031 {*movv2sf_internal}
     (expr_list:REG_EQUAL (const_vector:V2SF [
                (const_double:SF -QNaN [-QNaN]) repeated x2
            ])
        (nil)))

which leads to

pr121015.c: In function ‘render_result_from_bake_h’:
pr121015.c:34:1: error: unrecognizable insn:
   34 | }
      | ^
(insn 98 13 14 3 (set (reg:V8QI 112)
        (const_vector:V8QI [
                (const_int -1 [0xffffffffffffffff]) repeated x8
            ])) -1
     (expr_list:REG_EQUIV (const_vector:V8QI [
                (const_int -1 [0xffffffffffffffff]) repeated x8
            ])
        (nil)))
during RTL pass: ira

Check all 0s/1s vectors with standard_sse_constant_p to avoid generating
all 1s vectors that it does not support, such as 64-bit and narrower ones.

gcc/

	PR target/121015
	* config/i386/i386-features.cc (ix86_broadcast_inner): Check all 0s/1s
	vectors with standard_sse_constant_p.

gcc/testsuite/

	PR target/121015
	* gcc.target/i386/pr121015.c: New test.

Co-Developed-by: H.J. Lu <hjl.to...@gmail.com>
---
 gcc/config/i386/i386-features.cc         | 12 ++++-----
 gcc/testsuite/gcc.target/i386/pr121015.c | 32 ++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121015.c

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 054f8d5ddc8..734ab70c108 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -3534,22 +3534,20 @@ ix86_broadcast_inner (rtx op, machine_mode mode,
 		      machine_mode *scalar_mode_p,
 		      x86_cse_kind *kind_p, rtx_insn **insn_p)
 {
-  if (op == const0_rtx || op == CONST0_RTX (mode))
+  switch (standard_sse_constant_p (op, mode))
     {
+    case 1:
       *scalar_mode_p = QImode;
       *kind_p = X86_CSE_CONST0_VECTOR;
       *insn_p = nullptr;
       return const0_rtx;
-    }
-  else if ((GET_MODE_CLASS (mode) == MODE_VECTOR_INT
-	    && (op == constm1_rtx || op == CONSTM1_RTX (mode)))
-	    || (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT
-		&& float_vector_all_ones_operand (op, mode)))
-    {
+    case 2:
       *scalar_mode_p = QImode;
       *kind_p = X86_CSE_CONSTM1_VECTOR;
       *insn_p = nullptr;
       return constm1_rtx;
+    default:
+      break;
     }
 
   mode = GET_MODE (op);
diff --git a/gcc/testsuite/gcc.target/i386/pr121015.c b/gcc/testsuite/gcc.target/i386/pr121015.c
new file mode 100644
index 00000000000..57c8bff14ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121015.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64-v3" } */
+
+extern union {
+  int i;
+  float f;
+} int_as_float_u;
+
+extern int render_result_from_bake_w;
+extern int render_result_from_bake_h_seed_pass;
+extern float *render_result_from_bake_h_primitive;
+extern float *render_result_from_bake_h_seed;
+
+float
+int_as_float(int i)
+{
+  int_as_float_u.i = i;
+  return int_as_float_u.f;
+}
+
+void
+render_result_from_bake_h(int tx)
+{
+  while (render_result_from_bake_w) {
+    for (; tx < render_result_from_bake_w; tx++)
+      render_result_from_bake_h_primitive[1] =
+          render_result_from_bake_h_primitive[2] = int_as_float(-1);
+    if (render_result_from_bake_h_seed_pass) {
+      *render_result_from_bake_h_seed = 0;
+    }
+  }
+}
-- 
2.50.1
