On Mon, Aug 5, 2024 at 5:50 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
> Very many thanks for the quick review and approval.  Here's another.
>
> This patch implements two improvements/refinements to the i386 backend's
> Scalar-To-Vector (STV) pass.  The first is to support memory destinations
> in binary logic operations, and the second is to provide more accurate
> costs/gains for (wide) immediate constants in binary logic operations.

Please do not mix together changes made for different reasons, as
advised in "Contributing to GCC" [1], section "Submitting Patches".

[1] https://gcc.gnu.org/contribute.html

Uros.

>
> A motivating example is gcc.target/i386/movti-2.c:
>
> __int128 m;
> void foo()
> {
>     m &= ((__int128)0x0123456789abcdefULL<<64) | 0x0123456789abcdefULL;
> }
>
> for which STV1 currently generates a warning/error:
> > r100 has non convertible use in insn 6
>
> (insn 5 2 6 2 (set (reg:TI 100)
>         (const_wide_int 0x123456789abcdef0123456789abcdef)) "movti-2.c":7:7
> 87 {
> *movti_internal}
>      (nil))
> (insn 6 5 0 2 (parallel [
>             (set (mem/c:TI (symbol_ref:DI ("m") [flags 0x2]  <var_decl
> 0x7f36d1c
> 27c60 m>) [1 m+0 S16 A128])
>                 (and:TI (mem/c:TI (symbol_ref:DI ("m") [flags 0x2]
> <var_decl 0x
> 7f36d1c27c60 m>) [1 m+0 S16 A128])
>                     (reg:TI 100)))
>             (clobber (reg:CC 17 flags))
>         ]) "movti-2.c":7:7 645 {*andti3_doubleword}
>      (expr_list:REG_DEAD (reg:TI 100)
>         (expr_list:REG_UNUSED (reg:CC 17 flags)
>             (nil))))
>
> and therefore generates the following scalar code with -O2 -mavx
>
> foo:    movabsq $81985529216486895, %rax
>         andq    %rax, m(%rip)
>         andq    %rax, m+8(%rip)
>         ret
>
> with this patch we now support read-modify-write instructions (as STV
> candidates), splitting them into explicit read-modify instructions
> followed by an explicit write instruction.  Hence, we now produce
> (when not optimizing for size):
>
> foo:    movabsq $81985529216486895, %rax
>         vmovq   %rax, %xmm0
>         vpunpcklqdq     %xmm0, %xmm0, %xmm0
>         vpand   m(%rip), %xmm0, %xmm0
>         vmovdqa %xmm0, m(%rip)
>         ret
>
> This code also handles the const_wide_int in example above, correcting
> the costs/gains when the hi/lo words are the same.  One minor complication
> is that the middle-end assumes (when generating memset) that SSE constants
> will be shared/amortized across multiple consecutive writes.  Hence to
> avoid testsuite regressions, we add a heuristic that considers an immediate
> constant to be very cheap, if that same immediate value occurs in the
> previous instruction or in the instruction after.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-08-05  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386-features.cc (timode_immed_const_gain): New
>         function to determine the gain/cost on a CONST_WIDE_INT.
>         (local_duplicate_constant_p): Helper function to see if the
>         same immediate constant appears in the previous or next insn.
>         (timode_scalar_chain::compute_convert_gain): Fix whitespace.
>         <case CONST_WIDE_INT>: Provide more accurate estimates using
>         timode_immed_const_gain and local_duplicate_constant_p.
>         <case AND>: Handle MEM_P (dst) and CONSTANT_SCALAR_INT_P (src).
>         (timode_scalar_to_vector_candidate_p): Support the first operand
>         of AND, IOR and XOR being MEM_P (i.e. a read-modify-write insn).
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/movti-2.c: Change dg-options to -Os.
>         * gcc.target/i386/movti-4.c: Expected output of original movti-2.c.
>
>
> Thanks again,
> Roger
> --
>

Reply via email to