On Tue, Jan 30, 2024 at 8:33 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch resolves PR113560, a code quality regression from GCC12
> affecting x86_64, by enhancing the middle-end's tree-ssa-math-opts.cc
> to recognize more instances of widening multiplications.
>
> The widening multiplication perception code identifies cases like:
>
>         _1 = (unsigned __int128) x;
>         __res = _1 * 100;
>
> but in the reported test case, the original input looks like:
>
>         _1 = (unsigned long long) x;
>         _2 = (unsigned __int128) _1;
>         __res = _2 * 100;
>
> which gets optimized by constant folding during tree-ssa to:
>
>         _2 = x & 18446744073709551615;  // x & 0xffffffffffffffff
>         __res = _2 * 100;
>
> where the BIT_AND_EXPR hides (has consumed) the extension operation.
> This reveals the more general deficiency (missed optimization
> opportunity) in widening multiplication perception that additionally
> both
>
> __int128 foo(__int128 x, __int128 y) {
>   return (x & 1000) * (y & 1000)
> }
>
> and
>
> unsigned __int128 bar(unsigned __int128 x, unsigned __int128) {
>   return (x >> 80) * (y >> 80);
> }
>
> should be recognized as widening multiplications.  Hence rather than
> test explicitly for BIT_AND_EXPR (as in the first version of this patch)
> the more general solution is to make use of range information, as
> provided by tree_non_zero_bits.
>
> As a demonstration of the observed improvements, function foo above
> currently with -O2 compiles on x86_64 to:
>
> foo:    movq    %rdi, %rsi
>         movq    %rdx, %r8
>         xorl    %edi, %edi
>         xorl    %r9d, %r9d
>         andl    $1000, %esi
>         andl    $1000, %r8d
>         movq    %rdi, %rcx
>         movq    %r9, %rdx
>         imulq   %rsi, %rdx
>         movq    %rsi, %rax
>         imulq   %r8, %rcx
>         addq    %rdx, %rcx
>         mulq    %r8
>         addq    %rdx, %rcx
>         movq    %rcx, %rdx
>         ret
>
> with this patch, GCC recognizes the *w and instead generates:
>
> foo:    movq    %rdi, %rsi
>         movq    %rdx, %r8
>         andl    $1000, %esi
>         andl    $1000, %r8d
>         movq    %rsi, %rax
>         imulq   %r8
>         ret
>
> which is perhaps easier to understand at the tree-level where
>
> __int128 foo (__int128 x, __int128 y)
> {
>   __int128 _1;
>   __int128 _2;
>   __int128 _5;
>
>   <bb 2> [local count: 1073741824]:
>   _1 = x_3(D) & 1000;
>   _2 = y_4(D) & 1000;
>   _5 = _1 * _2;
>   return _5;
> }
>
> gets transformed to:
>
> __int128 foo (__int128 x, __int128 y)
> {
>   __int128 _1;
>   __int128 _2;
>   __int128 _5;
>   signed long _7;
>   signed long _8;
>
>   <bb 2> [local count: 1073741824]:
>   _1 = x_3(D) & 1000;
>   _2 = y_4(D) & 1000;
>   _7 = (signed long) _1;
>   _8 = (signed long) _2;
>   _5 = _7 w* _8;
>   return _5;
> }
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?

Nice.  I'll note that the range check works on non-assign defs ('stmt')
as well, so can you put this outside of

       stmt = SSA_NAME_DEF_STMT (rhs);
       if (is_gimple_assign (stmt))
        {

and then of course, for

+                 /* X & MODE_MASK can be simplified to (T)X.  */
+                 if (gimple_assign_rhs_code (stmt) == BIT_AND_EXPR
+                     && TREE_CODE (gimple_assign_rhs2 (stmt)) == INTEGER_CST
+                     && wi::to_wide (gimple_assign_rhs2 (stmt))
+                        == wi::mask (hprec, false, prec))

add is_gimple_assign (stmt) in the condition?

In particular this might help to detect cases where the operand is defined
by a PHI node (aka a conditional).

OK with that change.

Thanks,
Richard.

>
> 2023-01-30  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/113560
>         * tree-ssa-math-opts.cc (is_widening_mult_rhs_p): Use range
>         information via tree_non_zero_bits to check if this operand
>         is suitably extended for a widening (or highpart) multiplication.
>         (convert_mult_to_widen): Insert explicit casts if the RHS or LHS
>         isn't already of the claimed type.
>
> gcc/testsuite/ChangeLog
>         PR target/113560
>         * g++.target/i386/pr113560.C: New test case.
>         * gcc.target/i386/pr113560.c: Likewise.
>
>
> Thanks in advance,
> Roger
> --
>

Reply via email to