On Fri, Dec 22, 2023 at 11:14 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch resolves the failure of pr43644-2.c in the testsuite, a code
> quality test I added back in July, that started failing as the code GCC
> generates for 128-bit values (and their parameter passing) has been in
> flux.  After a few attempts at tweaking pattern constraints in the hope
> of convincing reload to produce a more aggressive (but potentially
> unsafe) register allocation, I think the best solution is to use a
> peephole2 to catch/clean-up this specific case.
>
> Specifically, the function:
>
> unsigned __int128 foo(unsigned __int128 x, unsigned long long y) {
>   return x+y;
> }
>
> currently generates:
>
> foo:    movq    %rdx, %rcx
>         movq    %rdi, %rax
>         movq    %rsi, %rdx
>         addq    %rcx, %rax
>         adcq    $0, %rdx
>         ret
>
> and with this patch/peephole2 now generates:
>
> foo:    movq    %rdx, %rax
>         movq    %rsi, %rdx
>         addq    %rdi, %rax
>         adcq    $0, %rdx
>         ret
>
> which I believe is optimal.

How about simply moving the assignment to the MSB in the split pattern
after the LSB calculation:

  [(set (match_dup 0) (match_dup 4))
-   (set (match_dup 5) (match_dup 2))
   (parallel [(set (reg:CCC FLAGS_REG)
                  (compare:CCC
                    (plus:DWIH (match_dup 0) (match_dup 1))
                    (match_dup 0)))
             (set (match_dup 0)
                  (plus:DWIH (match_dup 0) (match_dup 1)))])
+   (set (match_dup 5) (match_dup 2))
   (parallel [(set (match_dup 5)
                  (plus:DWIH
                    (plus:DWIH

There is an earlyclobber on the output operand, so we are sure that
assignments to (op 0) and (op 5) won't clobber anything.
cprop_hardreg pass will then do the cleanup for us, resulting in:

foo: movq    %rdi, %rax
       addq    %rdx, %rax
       movq    %rsi, %rdx
       adcq    $0, %rdx

Uros.

>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-12-21  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/43644
>         * config/i386/i386.md (define_peephole2): Tweak register allocation
>         of *add<dwi>3_doubleword_concat_zext.
>
> gcc/testsuite/ChangeLog
>         PR target/43644
>         * gcc.target/i386/pr43644-2.c: Expect 2 movq instructions.
>
>
> Thanks in advance, and for your patience with this testsuite noise.
> Roger
> --
>
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 4c6368bf3b7..9f97d407975 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6411,13 +6411,13 @@ (define_insn_and_split 
"*add<dwi>3_doubleword_concat_zext"
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (match_dup 4))
-   (set (match_dup 5) (match_dup 2))
    (parallel [(set (reg:CCC FLAGS_REG)
                   (compare:CCC
                     (plus:DWIH (match_dup 0) (match_dup 1))
                     (match_dup 0)))
              (set (match_dup 0)
                   (plus:DWIH (match_dup 0) (match_dup 1)))])
+   (set (match_dup 5) (match_dup 2))
    (parallel [(set (match_dup 5)
                   (plus:DWIH
                     (plus:DWIH

Reply via email to