On Mon, Jul 14, 2025 at 4:06 PM Uros Bizjak <ubiz...@gmail.com> wrote: > > On Mon, Jul 14, 2025 at 9:37 AM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > On Mon, Jul 14, 2025 at 3:11 PM Uros Bizjak <ubiz...@gmail.com> wrote: > > > > > > On Mon, Jul 14, 2025 at 5:32 AM Uros Bizjak <ubiz...@gmail.com> wrote: > > > > > > > > On Mon, Jul 14, 2025 at 2:14 AM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > > > > > > > On Sat, Jul 12, 2025 at 7:51 PM Uros Bizjak <ubiz...@gmail.com> wrote: > > > > > > > > > > > > On Sat, Jul 12, 2025 at 1:41 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > > > > > > > > > > > On Sat, Jul 12, 2025 at 5:58 PM Uros Bizjak <ubiz...@gmail.com> > > > > > > > wrote: > > > > > > > > > > > > > > > > On Sat, Jul 12, 2025 at 11:52 AM H.J. Lu <hjl.to...@gmail.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > On Sat, Jul 12, 2025 at 5:03 PM Uros Bizjak > > > > > > > > > <ubiz...@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > On Fri, Jul 11, 2025 at 6:05 AM H.J. Lu > > > > > > > > > > <hjl.to...@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > > > commit 77473a27bae04da99d6979d43e7bd0a8106f4557 > > > > > > > > > > > Author: H.J. Lu <hjl.to...@gmail.com> > > > > > > > > > > > Date: Thu Jun 26 06:08:51 2025 +0800 > > > > > > > > > > > > > > > > > > > > > > x86: Also handle all 1s float vector constant > > > > > > > > > > > > > > > > > > > > > > replaces > > > > > > > > > > > > > > > > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107) > > > > > > > > > > > (mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags > > > > > > > > > > > 0x2]) [0 S8 A64])) 2031 > > > > > > > > > > > {*movv2sf_internal} > > > > > > > > > > > (expr_list:REG_EQUAL (const_vector:V2SF [ > > > > > > > > > > > (const_double:SF -QNaN [-QNaN]) repeated > > > > > > > > > > > x2 > > > > > > > > > > > ]) > > > > > > > > > > > (nil))) > > > > > > > > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112) > > > > > > > > > > > (const_vector:V8QI [ > > > > > > > > > > > (const_int -1 [0xffffffffffffffff]) > > > > > > > > > > > repeated x8 > > > > > > > > > > > ])) -1 > > > > > > > > > > > (nil)) > > > > > > > > > > > ... > > > > > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107) > > > > > > > > > > > (subreg:V2SF (reg:V8QI 112) 0)) 2031 > > > > > > > > > > > {*movv2sf_internal} > > > > > > > > > > > (expr_list:REG_EQUAL (const_vector:V2SF [ > > > > > > > > > > > (const_double:SF -QNaN [-QNaN]) repeated > > > > > > > > > > > x2 > > > > > > > > > > > ]) > > > > > > > > > > > (nil))) > > > > > > > > > > > > > > > > > > > > > > which leads to > > > > > > > > > > > > > > > > > > > > > > pr121015.c: In function ‘render_result_from_bake_h’: > > > > > > > > > > > pr121015.c:34:1: error: unrecognizable insn: > > > > > > > > > > > 34 | } > > > > > > > > > > > | ^ > > > > > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112) > > > > > > > > > > > (const_vector:V8QI [ > > > > > > > > > > > (const_int -1 [0xffffffffffffffff]) > > > > > > > > > > > repeated x8 > > > > > > > > > > > ])) -1 > > > > > > > > > > > (expr_list:REG_EQUIV (const_vector:V8QI [ > > > > > > > > > > > (const_int -1 [0xffffffffffffffff]) > > > > > > > > > > > repeated x8 > > > > > > > > > > > ]) > > > > > > > > > > > (nil))) > > > > > > > > > > > during RTL pass: ira > > > > > > > > > > > > > > > > > > > > > > 1. Add vector_const0_or_m1_operand for vector 0 or > > > > > > > > > > > integer vector -1. > > > > > > > > > > > 2. Add nonimm_or_vector_const0_or_m1_operand for > > > > > > > > > > > nonimmediate, vector 0 > > > > > > > > > > > or integer vector -1 operand. > > > > > > > > > > > 3. Add BX constraint for MMX vector constant all 0s/1s > > > > > > > > > > > operand. > > > > > > > > > > > 4. Update MMXMODE:*mov<mode>_internal to support integer > > > > > > > > > > > all 1s vectors. > > > > > > > > > > > Replace <v,C> with <v,BX> to generate > > > > > > > > > > > > > > > > > > > > > > pcmpeqd %xmm0, %xmm0 > > > > > > > > > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > > > > > > > (set (reg/i:V8QI 20 xmm0) > > > > > > > > > > > (const_vector:V8QI [(const_int -1 > > > > > > > > > > > [0xffffffffffffffff]) repeated x8])) > > > > > > > > > > > > > > > > > > > > > > NB: The upper 64 bits in XMM0 are all 1s, instead of all > > > > > > > > > > > 0s. > > > > > > > > > > > > > > > > > > > > Actually, we don't want this, we should keep the top 64 > > > > > > > > > > bits zero, > > > > > > > > > > especially for floating point, where the pattern represents > > > > > > > > > > NaN. > > > > > > > > > > > > > > > > > > > > So, I think the correct way is to avoid the transformation > > > > > > > > > > for > > > > > > > > > > narrower modes in the first place. > > > > > > > > > > > > > > > > > > > > > > > > > > > > How does your latest patch handle this? > > > > > > > > > > > > > > > > > > typedef char __v8qi __attribute__ ((__vector_size__ (8))); > > > > > > > > > > > > > > > > > > __v8qi > > > > > > > > > m1 (void) > > > > > > > > > { > > > > > > > > > return __extension__(__v8qi){-1, -1, -1, -1, -1, -1, -1, > > > > > > > > > -1}; > > > > > > > > > } > > > > > > > > > > > > > > > > No, my patch is also not appropriate, because it also introduces > > > > > > > > "pcmpeq %xmm, %xmm". We should not generate 8-byte all-ones > > > > > > > > load using > > > > > > > > pcmpeq, because upper 64 bits are also all 1s. > > > > > > > > > > > > > > > > The correct way is to avoid generating 64 bit all-ones, because > > > > > > > > this > > > > > > > > constant is not supported and standard_sse_constant_p () > > > > > > > > correctly > > > > > > > > reports this. > > > > > > > > > > > > > > We can generate > > > > > > > > > > > > > > pcmpeqd %xmm0, %xmm0 > > > > > > > movq %xmm0, %xmm0 > > > > > > > > > > > > > > for V8QI and > > > > > > > > > > > > > > pcmpeqd %xmm0, %xmm0 > > > > > > > movd %xmm0, %xmm0 > > > > > > > > > > > > > > for V4QI. > > > > > > > > > > > > I don't think this is better than skipping the transformation for > > > > > > instructions that we in fact emulate altogether. While loading > > > > > > all-zero is OK in any mode, loading all-one is not OK for narrow > > > > > > modes. So, this transformation should simply be skipped for all-one > > > > > > in > > > > > > narrow modes. > > > > > > > > > > Here is the v3 patch, which allows 4-byte/8-byte all 1s in mmx.md > > > > > and split to load from memory if the destination is an XMM register. > > > > > > > > Why don't we just skip the generation of narrow-mode all-ones vector > > > > constants in the new pass altogether? It is not worth complicating > > > > move patterns for a very seldom used feature and for very small (if at > > > > all) gain. > > > > > > > > Please just change the pass to not generate vetro all-ones in 64bit or > > > > narrower modes. > > > > > > I'm not familiar with the pass, but IMO the attached patch should be a > > > good starting point. We don't want to CSE narrow all-ones with their > > > wide counterparts, because we want zeros in top bytes of the narrow > > > all-ones operands. > > > > > > Uros. > > > > I am testing this. > > + /* Skip if vector size is less than 16 bytes since all 1s SSE > + constants must be at leas 16 bytes. */ > + if (GET_MODE_SIZE (mode) < 16) > + return nullptr; > > This is functionally exactly the same as my proposed part that uses > standard_sse_constant_p. I think using the predicate makes the > decision more robust and documents what we really want to do. > > +(define_split > + [(set (match_operand:MMXMODE 0 "register_operand") > + (match_operand:MMXMODE 1 "memory_operand"))] > + "TARGET_64BIT && reload_completed && GENERAL_REG_P (operands[0])" > + [(const_int 0)] > > IMO, the above is a good optimization, but please leave it for a > follow-up patch. I think, the above part should be similar to: > > ;; 16-bit, 32-bit and 64-bit constant vector stores. After reload, > ;; convert them to immediate integer stores. > (define_insn_and_split "*mov<mode>_imm" > [(set (match_operand:V_16_32_64 0 "memory_operand" "=m") > (match_operand:V_16_32_64 1 "x86_64_const_vector_operand" "i"))] > > involving ix86_convert_const_vector_to_integer, because we are not > limited to all-ones here. > > Thanks, > Uros.
Like this? commit 77473a27bae04da99d6979d43e7bd0a8106f4557 Author: H.J. Lu <hjl.to...@gmail.com> Date: Thu Jun 26 06:08:51 2025 +0800 x86: Also handle all 1s float vector constant replaces (insn 29 28 30 5 (set (reg:V2SF 107) (mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S8 A64])) 2031 {*movv2sf_internal} (expr_list:REG_EQUAL (const_vector:V2SF [ (const_double:SF -QNaN [-QNaN]) repeated x2 ]) (nil))) with (insn 98 13 14 3 (set (reg:V8QI 112) (const_vector:V8QI [ (const_int -1 [0xffffffffffffffff]) repeated x8 ])) -1 (nil)) ... (insn 29 28 30 5 (set (reg:V2SF 107) (subreg:V2SF (reg:V8QI 112) 0)) 2031 {*movv2sf_internal} (expr_list:REG_EQUAL (const_vector:V2SF [ (const_double:SF -QNaN [-QNaN]) repeated x2 ]) (nil))) which leads to pr121015.c: In function ‘render_result_from_bake_h’: pr121015.c:34:1: error: unrecognizable insn: 34 | } | ^ (insn 98 13 14 3 (set (reg:V8QI 112) (const_vector:V8QI [ (const_int -1 [0xffffffffffffffff]) repeated x8 ])) -1 (expr_list:REG_EQUIV (const_vector:V8QI [ (const_int -1 [0xffffffffffffffff]) repeated x8 ]) (nil))) during RTL pass: ira Check all 0s/1s vectors with standard_sse_constant_p to avoid unsupported all 1s vectors. gcc/ PR target/121015 * config/i386/i386.cc (ix86_broadcast_inner): Check all 0s/1s vectors with standard_sse_constant_p. gcc/testsuite/ PR target/121015 * gcc.target/i386/pr121015.c: New test. Co-Developed-by: H.J. Lu <hjl.to...@gmail.com> -- H.J.
From 9c155cf3af65ce02e1494965e3cc7be609c2555c Mon Sep 17 00:00:00 2001 From: Uros Bizjak <ubiz...@gmail.com> Date: Mon, 14 Jul 2025 17:16:36 +0800 Subject: [PATCH v5] x86: Check all 0s/1s vectors with standard_sse_constant_p MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 77473a27bae04da99d6979d43e7bd0a8106f4557 Author: H.J. Lu <hjl.to...@gmail.com> Date: Thu Jun 26 06:08:51 2025 +0800 x86: Also handle all 1s float vector constant replaces (insn 29 28 30 5 (set (reg:V2SF 107) (mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S8 A64])) 2031 {*movv2sf_internal} (expr_list:REG_EQUAL (const_vector:V2SF [ (const_double:SF -QNaN [-QNaN]) repeated x2 ]) (nil))) with (insn 98 13 14 3 (set (reg:V8QI 112) (const_vector:V8QI [ (const_int -1 [0xffffffffffffffff]) repeated x8 ])) -1 (nil)) ... (insn 29 28 30 5 (set (reg:V2SF 107) (subreg:V2SF (reg:V8QI 112) 0)) 2031 {*movv2sf_internal} (expr_list:REG_EQUAL (const_vector:V2SF [ (const_double:SF -QNaN [-QNaN]) repeated x2 ]) (nil))) which leads to pr121015.c: In function ‘render_result_from_bake_h’: pr121015.c:34:1: error: unrecognizable insn: 34 | } | ^ (insn 98 13 14 3 (set (reg:V8QI 112) (const_vector:V8QI [ (const_int -1 [0xffffffffffffffff]) repeated x8 ])) -1 (expr_list:REG_EQUIV (const_vector:V8QI [ (const_int -1 [0xffffffffffffffff]) repeated x8 ]) (nil))) during RTL pass: ira Check all 0s/1s vectors with standard_sse_constant_p to avoid unsupported all 1s vectors. gcc/ PR target/121015 * config/i386/i386.cc (ix86_broadcast_inner): Check all 0s/1s vectors with standard_sse_constant_p. gcc/testsuite/ PR target/121015 * gcc.target/i386/pr121015.c: New test. Co-Developed-by: H.J. Lu <hjl.to...@gmail.com> --- gcc/config/i386/i386-features.cc | 12 ++++----- gcc/testsuite/gcc.target/i386/pr121015.c | 32 ++++++++++++++++++++++++ 2 files changed, 37 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr121015.c diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc index 054f8d5ddc8..734ab70c108 100644 --- a/gcc/config/i386/i386-features.cc +++ b/gcc/config/i386/i386-features.cc @@ -3534,22 +3534,20 @@ ix86_broadcast_inner (rtx op, machine_mode mode, machine_mode *scalar_mode_p, x86_cse_kind *kind_p, rtx_insn **insn_p) { - if (op == const0_rtx || op == CONST0_RTX (mode)) + switch (standard_sse_constant_p (op, mode)) { + case 1: *scalar_mode_p = QImode; *kind_p = X86_CSE_CONST0_VECTOR; *insn_p = nullptr; return const0_rtx; - } - else if ((GET_MODE_CLASS (mode) == MODE_VECTOR_INT - && (op == constm1_rtx || op == CONSTM1_RTX (mode))) - || (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT - && float_vector_all_ones_operand (op, mode))) - { + case 2: *scalar_mode_p = QImode; *kind_p = X86_CSE_CONSTM1_VECTOR; *insn_p = nullptr; return constm1_rtx; + default: + break; } mode = GET_MODE (op); diff --git a/gcc/testsuite/gcc.target/i386/pr121015.c b/gcc/testsuite/gcc.target/i386/pr121015.c new file mode 100644 index 00000000000..57c8bff14ef --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr121015.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64-v3" } */ + +extern union { + int i; + float f; +} int_as_float_u; + +extern int render_result_from_bake_w; +extern int render_result_from_bake_h_seed_pass; +extern float *render_result_from_bake_h_primitive; +extern float *render_result_from_bake_h_seed; + +float +int_as_float(int i) +{ + int_as_float_u.i = i; + return int_as_float_u.f; +} + +void +render_result_from_bake_h(int tx) +{ + while (render_result_from_bake_w) { + for (; tx < render_result_from_bake_w; tx++) + render_result_from_bake_h_primitive[1] = + render_result_from_bake_h_primitive[2] = int_as_float(-1); + if (render_result_from_bake_h_seed_pass) { + *render_result_from_bake_h_seed = 0; + } + } +} -- 2.50.1