On Mon, Aug 5, 2024 at 5:50 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > Hi Uros, > Very many thanks for the quick review and approval. Here's another. > > This patch implements two improvements/refinements to the i386 backend's > Scalar-To-Vector (STV) pass. The first is to support memory destinations > in binary logic operations, and the second is to provide more accurate > costs/gains for (wide) immediate constants in binary logic operations.
Please do not mix together changes made for different reasons, as advised in "Contributing to GCC" [1], section "Submitting Patches". [1] https://gcc.gnu.org/contribute.html Uros. > > A motivating example is gcc.target/i386/movti-2.c: > > __int128 m; > void foo() > { > m &= ((__int128)0x0123456789abcdefULL<<64) | 0x0123456789abcdefULL; > } > > for which STV1 currently generates a warning/error: > > r100 has non convertible use in insn 6 > > (insn 5 2 6 2 (set (reg:TI 100) > (const_wide_int 0x123456789abcdef0123456789abcdef)) "movti-2.c":7:7 > 87 { > *movti_internal} > (nil)) > (insn 6 5 0 2 (parallel [ > (set (mem/c:TI (symbol_ref:DI ("m") [flags 0x2] <var_decl > 0x7f36d1c > 27c60 m>) [1 m+0 S16 A128]) > (and:TI (mem/c:TI (symbol_ref:DI ("m") [flags 0x2] > <var_decl 0x > 7f36d1c27c60 m>) [1 m+0 S16 A128]) > (reg:TI 100))) > (clobber (reg:CC 17 flags)) > ]) "movti-2.c":7:7 645 {*andti3_doubleword} > (expr_list:REG_DEAD (reg:TI 100) > (expr_list:REG_UNUSED (reg:CC 17 flags) > (nil)))) > > and therefore generates the following scalar code with -O2 -mavx > > foo: movabsq $81985529216486895, %rax > andq %rax, m(%rip) > andq %rax, m+8(%rip) > ret > > with this patch we now support read-modify-write instructions (as STV > candidates), splitting them into explicit read-modify instructions > followed by an explicit write instruction. Hence, we now produce > (when not optimizing for size): > > foo: movabsq $81985529216486895, %rax > vmovq %rax, %xmm0 > vpunpcklqdq %xmm0, %xmm0, %xmm0 > vpand m(%rip), %xmm0, %xmm0 > vmovdqa %xmm0, m(%rip) > ret > > This code also handles the const_wide_int in example above, correcting > the costs/gains when the hi/lo words are the same. One minor complication > is that the middle-end assumes (when generating memset) that SSE constants > will be shared/amortized across multiple consecutive writes. Hence to > avoid testsuite regressions, we add a heuristic that considers an immediate > constant to be very cheap, if that same immediate value occurs in the > previous instruction or in the instruction after. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures. Ok for mainline? > > > 2024-08-05 Roger Sayle <ro...@nextmovesoftware.com> > > gcc/ChangeLog > * config/i386/i386-features.cc (timode_immed_const_gain): New > function to determine the gain/cost on a CONST_WIDE_INT. > (local_duplicate_constant_p): Helper function to see if the > same immediate constant appears in the previous or next insn. > (timode_scalar_chain::compute_convert_gain): Fix whitespace. > <case CONST_WIDE_INT>: Provide more accurate estimates using > timode_immed_const_gain and local_duplicate_constant_p. > <case AND>: Handle MEM_P (dst) and CONSTANT_SCALAR_INT_P (src). > (timode_scalar_to_vector_candidate_p): Support the first operand > of AND, IOR and XOR being MEM_P (i.e. a read-modify-write insn). > > gcc/testsuite/ChangeLog > * gcc.target/i386/movti-2.c: Change dg-options to -Os. > * gcc.target/i386/movti-4.c: Expected output of original movti-2.c. > > > Thanks again, > Roger > -- >